Closed ranlu closed 1 year ago
This is definitely very useful. Introducing the locks in the same folder like this though as default behavior is likely not backwards compatible though as some people will be depending on operations like listing files not being dirty. Others, like @supersergiy are depending on reads being as fast as possible. Creating new lock files can also create resource contention if the number of lock files are very large or can be problematic if the directory is read-only (I do see you have a try-catch for PermissionError though, so I guess it's only a performance consideration?)
I think several of these issues can can be pretty easily resolved by putting the locks somewhere in .cloudfiles
(maybe .cloudfiles/locks
?). The performance issue can be solved with a flag to enable or disable this behavior.
Thanks for the suggestion @william-silversmith, I added an environment variable CLOUD_FILES_LOCK_DIR
to store the lock in a unified location. It also serves as a global switch to turn on locking for local files: if it is unset, cloud-files will not try to lock the files before accessing them. I think adding a option of the interface will be more invasive, as we need to expose the option in all the packages using cloud-files as a backend as well.
To avoid name collision, I convert the abspath of the target file to the lock filename by replace '/' with '=+', hopefully few people name their files with those characters.
That approach is reasonable to me. Generally, I like to provide a non-environment variable interface as well that would override the environment variable on an instance basis. Would that be possible?
Maybe locking=None
(default, no locking, defer to env variable), locking=True
(disregard env variable and lock in the default location of CLOUD_FILES_DIR/locks
), locking=False
(disregard env variable, and don't lock)? Often programs will have differing requirements depending on the phase of the computation.
I added the locking option you suggested. I changed the behavior when locking=True
a little bit: The env CLOUD_FILES_LOCK_DIR
will be used for the locks and only in the case it is unset, CF fall back to CLOUD_FILES_DIR/locks
. Otherwise the user lose the ability to use a customized location when they explicitly want locking, which I feel is a rather weird behavior.
@william-silversmith, what is your suggestion to fix this test, should I add dummy locking options to S3, or a **kwargs
to match the other interfaces? The latter feels like belonging to another PR.
Another potential way to handle that is allow True to use the default (your idea makes sense) and if it's a string, use that as the location.
On Tue, Jul 4, 2023, 2:39 PM ranlu @.***> wrote:
I added the locking option you suggested. I changed the behavior when locking=True a little bit: The env CLOUD_FILES_LOCK_DIR will be used for the locks and only in the case it is unset, CF fall back to CLOUD_FILES_DIR/locks. Otherwise the user lose the ability to use a customized location when they explicitly want locking, which I feel is a rather weird behavior.
— Reply to this email directly, view it on GitHub https://github.com/seung-lab/cloud-files/pull/91#issuecomment-1620614688, or unsubscribe https://github.com/notifications/unsubscribe-auth/AATGQSN7OT2BOXKKEN6SH3TXORPPXANCNFSM6AAAAAAZ5E33MY . You are receiving this because you were mentioned.Message ID: @.***>
Another potential way to handle that is allow True to use the default (your idea makes sense) and if it's a string, use that as the location.
If we want to extend the interface we probably should add another lock_dir
option, locking
representing a path does not sounds right.
I see what you're saying about the semantics of it. I like to keep the number of arguments as small as possible, but a second argument would be acceptable. I'll take a look at the test tomorrow.
I added a lock_dir option, the behavior now is:
lock_dir
or CLOUD_FILES_LOCK_DIR
is set. lock_dir
takes prioritylock_dir
or CLOUD_FILES_LOCK_DIR
is set, use CLOUD_FILES_DIR/locks
.The tests work now after #94
@william-silversmith, Thanks for the review, I factored out the locking code into a io_with_lock
function, also append the crc32 of the abspath to the filename to make it hopefully unique.
nice solution!
This mainly helps when using multiple cloud-volume instance with local cache, without locking the cache maybe read when incomplete. Use fastener to put a shared lock when reading files and an exclusive lock when writing files. The patch creates a
xxx.lock
file when cloud-files try to access filexxx
, in the same folder. If it does not have write permit, fallback to the no lock path, though I am not sure if it make sense to still try put files in that case. I did not try cleanup the files, since it is tricky to find out when all the locks are resolved. An alternative approach is to create lock files somewhere else, maybe some folder underCLOUD_FILES_DIR
Only tested the patch under linux.