Closed Vaelatern closed 4 years ago
Further edits: I was on 0.4.1, using the file backend, and have no fewer than 4 separate file-backed duratoms in the same app. My users are now on 0.5.1 while I figure out this migration.
Hi there,
This shouldn't happen! Can you confirm which version you're seeing this under, and perhaps provide some example code? Thanks in advance...
Kind regards
The duratom that lost state was configured with :local-file :file-path "./..." :init ....
Oh wow! We literally pressed comment at the same time!!!
Ok, so there was a time when util/pr-str-fully wasn't fully printing, but it definitely should now... Are you seeing the same data loss with 0.5.1?
I'm going to find out. Would that function have truncated the print or not printed all values in the atom?
It would have truncated, due to *print-length* and *print-level*. See here:
https://github.com/jimpil/duratom/blob/816362836452a181c84f2b248d8133092e2e6111/src/clojure/duratom/utils.clj#L41
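For anyone following along, here is a minimal sketch of the kind of truncation described above, using only clojure.core. The `sample` data and the `pr-str-all` helper are hypothetical names made up for illustration; this is not duratom's own code, just the general effect of `*print-length*` and `*print-level*` on `pr-str`.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])

;; Hypothetical stand-in for application state.
(def sample {:users  (vec (range 10))
             :nested {:a {:b {:c 1}}}})

(defn pr-str-all
  "Hypothetical helper: prints x with no length or depth limits."
  [x]
  (binding [*print-length* nil
            *print-level*  nil]
    (pr-str x)))

;; With limits bound, long collections are cut off with `...`
;; and deep nesting is replaced by `#`.
(def truncated
  (binding [*print-length* 3
            *print-level*  2]
    (pr-str sample)))

;; With both limits unbound, the printed form round-trips through the EDN reader.
(def full (pr-str-all sample))

(println truncated)
(println (= sample (edn/read-string full)))
```

If a REPL or tool sets either var globally, a plain `pr-str` would silently write the truncated form, which is why binding both to nil around the write matters.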
Ok, this was not the corruption I was seeing. I saw user data seemingly never written to disk, so that after a restart users' passwords reverted to older ones.
I don't fully follow...are you saying that duratom can't write to a file at all, beyond a particular size?
I'm saying that given my large state, yes, Duratom seemed to stop writing to the file.
Are you by any chance using a custom :rw map?
I am not
Ok, this is bizarre - 620kb is by no means a large file... I'll try to reproduce later tonight when I come back from work. In the meantime, would it be easy for you to try your use-case against a different storage-backend (e.g. postgres)? That will provide a good indication about where the problem lies (i.e. the duratom object itself VS the backend object).
Many thanks in advance...
Something else that could be happening is swallowed exceptions on the agent's dispatch thread (for async commits). There was no way to specify an :error-handler on 0.4.1, but there is on 0.5.1, and I would advise you to use it for (at least) logging the errors, and potentially re-committing (third arg of the error-handler).
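To illustrate the swallowed-exception scenario in general terms, here is a small sketch with a plain Clojure agent. It does not use duratom at all; `commit-errors`, `committer` and `failing-commit` are hypothetical names, and the failure is simulated. It only shows that an exception thrown inside an agent action never reaches the calling thread unless an :error-handler records it.

```clojure
;; Collects error messages so they can be logged or inspected later.
(def commit-errors (atom []))

;; An agent with an :error-handler; supplying a handler makes the
;; default error mode :continue, so the agent keeps accepting actions.
(def committer
  (agent nil
         :error-handler (fn [_agent ex]
                          (swap! commit-errors conj (.getMessage ex)))))

;; Simulated failing write. The exception is thrown on the agent's
;; thread, not on the thread that called send-off.
(defn failing-commit [_state]
  (throw (ex-info "simulated write failure" {})))

(send-off committer failing-commit)
(await committer)

(println @commit-errors)
```

Without the handler, the only sign of trouble would be the agent entering a failed state, which is easy to miss when nothing ever derefs it or checks `agent-error`.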
I don't seem to be able to set :error-handler without setting a custom reader or writer.
I'll be working on this in many hours, thank you for your help.
(assoc default-file-rw
       :error-handler
       (fn ([agent exception] ...)             ;; typical agent error-handler
           ([agent exception recommit!] ...))) ;; persistence error-handler
Caused by: java.lang.NullPointerException
at duratom.backends.FileBackend.snapshot(backends.clj:68)
... 110 more
Now duratom can't start, same as before
I appreciate both your efforts here, and look forward to discovering the root cause of this. We have gigabytes of duratoms and have never encountered what you're hitting here.
Good luck, and keep it up until you find it.
Thanks!
Some more stack-traces would be helpful...I will try this in a few hours anyway.
That's the only stack trace I have, and it comes from not having a working :rw map.
Ok, now able to log my attempts. Will see what happens!
Any news on this? I wasn't able to reproduce any data loss using a 12MB file...
Hanging out waiting for it to fail, it hasn't yet. I don't know when it happens, but I have logging set up to catch exceptions. And I'm glad I'm on the latest version now.
I think we've established that persisting values greater than 620kb (or a lot larger for that matter) does work as expected. Can I close this? If you happen to find some further problem, it will most likely be unrelated to this, so it will warrant its own ticket.
Yes, I know it's big. I have a big hash map. I'm moving to a database backend, it's probably time, but it's worth knowing this is an issue.
I'd love to help debug to fix this if I can.