Closed kordano closed 7 years ago
Can you inspect the files in the folder before and after the assoc-in? There should be exactly one file for the key :a
containing the value 1
in fressian encoding, probably two bytes long or so. Also is there some error in the server-error buffer? konserve internally does not use the supervisor, so it can be used with plain core.async
. Maybe some error slipped in there (but it shouldn't)?
Also note that the current version is 0.4.8.
Btw. is your path valid? On Windows you have to use a drive letter, I think. It is still a bug that konserve doesn't throw an error then.
I ran into this issue on Windows 8 as well and did some digging; looks like File.renameTo
fails, leaving the store file's filename with a trailing '.new'. In the example above, if I manually rename the file before running (<?? S (k/get-in store [:a]))
, it properly returns 1
.
@oahner Thanks a lot. I have pushed a new version to master, can you try whether the Files.move()
fix works?
That was quick! Unfortunately no, Files.move
is also unable to move an open file on Windows. Looks like the FileOutputStream.close
call will need to happen first, but then we lose the last fsync.
Would closing, moving, re-opening, fsyncing and re-closing work? Or is the ATOMIC_MOVE
flag on Files/move
equivalent to fsync?
Probably. I will try that hopefully at the weekend. The code should be straightforward to hack though, if you can spare some time ;).
Well, I took a crack at it and, while the code is certainly straightforward, figuring out what APIs provide durability guarantees in a cross-platform way really is quite a headache.
As far as I can tell from the openjdk source, Files.move
eventually calls MoveFileEx
but there's no way to pass it the MOVEFILE_WRITE_THROUGH
flag that would make it flush to disk before returning. I haven't found another Java API that allows this but I suppose it could be done directly through JNI if there are no other alternatives.
Unfortunately, the above research uncovered a few issues with fsync, namely, that fsync'ing the file is not enough and the folder needs to be fsync'd as well (fact that konserve seems aware of since there's a commented out fsync to the folder in the update-in
implementation).
I'm still trying to make sense of this but I may not be able to continue until next week, so here's a few interesting links for future reference (fair warning, the rabbit hole is deep):
Yes, I haven't come to do the fsync properly for the directory yet, as you mention it is far from being clear what you need to do to properly ensure durability. Yes the filesystem rabbit hole is deep and I think they are a mess as an interface.
I have no experience with JNI yet and don't have a windows machine, but @kordano might be able to check. @oahner If you have ideas beyond file moving for atomicity, then we could also change the implementation.
@oahner And thanks for diving into this! I hope we can nail it down somehow.
Will add this back in in case it is actually relevant: I attempted to use this from a Ubuntu VM, which was accessing the Windows Host's drive through a mount. I thought that was clever, but I had exactly the same issue with the .new
not being removed and get-in
returning nil
.
I'm unable to test an NTFS drive directly from Linux without being in a VM at the moment, so I'm not sure if this means it's an NTFS issue or still just a Windows issue (virtualbox being a Windows software after all). The specificity of hardware interactions from within a VM is a bit beyond me but I thought I'd let you know. Perhaps someone can test with an NTFS partition?
@oahner @eslachance I have added fsync'ing for the directory and close the file before moving now. @kordano has verified that it works on Windows now and it is as 0.4.10
on clojars. Please point out any remaining issues. The file-store also does non-blocking reading now finally, so you also get more efficiency :).
I'll close the issue for now. Please open a new one, if you are still affected!
Currently testing on Windows 10 a simple fs-store using
[io.replikativ/konserve "0.4.6"]
: