meilisearch / heed

A fully typed LMDB wrapper with minimum overhead 🐦
https://docs.rs/heed
MIT License
569 stars 52 forks source link

[Stale] Compare identical environments by using a `same_file::Handle` #179

Open darnuria opened 1 year ago

darnuria commented 1 year ago

This PR implements this suggestion https://github.com/meilisearch/heed/issues/145#issuecomment-1384539350 and try to fix #180.

Stalled because the current implementation with same_file open files since windows needs it, but lmdb itself use posix locks with fnctl syscall on unixes.

We use the same_file::Handle struct that compares dev_t and ino_t instead of raw paths internally to make the comparison more accurate. On Windows, you should read how it works 😅.

EDITS: Edited to add the caveat found during implementing the idea.

Caveat of this PR

Tl;DR Keeping FileDescriptor open, may trouble flock/fcntl works inside lmdb and screw multi-process lock with lmdb.

From comment: https://github.com/meilisearch/heed/pull/179#discussion_r1258312633

Unix ideal: just check (dev_t, ino_t)

Unix way

stat kind of function clean nice no File Descriptor keeped. Note: that same file keep open a File Descriptor open!

https://github.com/BurntSushi/same-file/blob/master/src/unix.rs#L10

Windows

An other same-file rely on winapi32, involving Filehandles.

Lmdb limitation

For sure it may break the flock()ing (not the best system api..), and the multi-process features of lmdb is really handy.

Checked-out lmdb sources:

About flocking in lmdb.h the sentense you pointed changed with:

commit 77845345ca9bf3854fd9da60a3e3b0527fa9c76a
Author: Hallvard Furuseth <hallvard@openldap.org>
Date:   Tue Sep 27 07:03:42 2016 +0200

    ITS#8505 Clarify fork() caveat, mdb_env_get_fd(), flock->fcntl.

diff --git a/libraries/liblmdb/lmdb.h b/libraries/liblmdb/lmdb.h
index 30d5862..319fcf6 100644
--- a/libraries/liblmdb/lmdb.h
+++ b/libraries/liblmdb/lmdb.h
@@ -97,11 +97,12 @@
  *    transactions.  Each transaction belongs to one thread.  See below.
  *    The #MDB_NOTLS flag changes this for read-only transactions.
  *
- *  - Use an MDB_env* in the process which opened it, without fork()ing.
+ *  - Use an MDB_env* in the process which opened it, not after fork().
  *
  *  - Do not have open an LMDB database twice in the same process at
  *    the same time.  Not even from a plain open() call - close()ing it
- *    breaks flock() advisory locking.
+ *    breaks fcntl() advisory locking.  (It is OK to reopen it after
+ *    fork() - exec*(), since the lockfile has FD_CLOEXEC set.)
  *
  *  - Avoid long-lived transactions.  Read transactions prevent
  *    reuse of pages freed by newer write transactions, thus the
Kerollmops commented 1 year ago

Thank you @darnuria,

It looks like the same_file::Handle implements Eq + Hash, which makes it a good candidate for replacing the current PathBuf key in the OPENED_ENV hashmap. What do you think?

Kerollmops commented 1 year ago

I read somewhere that hard links on Windows require Administrator privileges...

darnuria commented 1 year ago

(not directly related to hardlink) Yeah windows CI had troble with theses two: https://github.com/meilisearch/heed/actions/runs/5508749200/jobs/10041079210?pr=179#step:4:140

Kerollmops commented 1 year ago

Do you know if we can run a certain pipeline in the CI as admin or Windows admin?

darnuria commented 1 year ago

Do you know if we can run a certain pipeline in the CI as admin or Windows admin?

Don't know, also discovering that windows way of doing is really complicated. :clown_face:, like moving dir or getting file information.

Kerollmops commented 1 year ago

Don't know, also discovering that windows way of doing is really complicated. 🤡, like moving dir or getting file information.

I would like to skip the tests that require Admin rights on Windows as I don't have time to make them work in the CI for now 😬 What do you think?