jimpil / duratom

A durable atom type for Clojure
Eclipse Public License 1.0

Data loss for file atom after atom hits at least 620kb #18

Closed Vaelatern closed 4 years ago

Vaelatern commented 4 years ago

Yes, I know it's big. I have a big hash map. I'm moving to a database backend (it's probably time), but it's worth knowing that this is an issue.

I'd love to help debug to fix this if I can.

Vaelatern commented 4 years ago

Further edits: I was on 0.4.1, using the file backend, and have no fewer than 4 separate file-backed duratoms in the same app. My users are now on 0.5.1 while I figure out this migration.

jimpil commented 4 years ago

Hi there,

This shouldn't happen! Can you confirm which version you're seeing this under, and perhaps provide some example code? Thanks in advance...

Kind regards

Vaelatern commented 4 years ago

The duratom that lost state was configured :local-file :file-path "./..." :init ....
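For reference, a minimal construction along those lines looks like the following (the file path and initial value here are hypothetical placeholders, not the ones from my app):

```clojure
(require '[duratom.core :refer [duratom]])

;; hypothetical path/init for illustration; the real ones are app-specific
(def state
  (duratom :local-file
           :file-path "/tmp/duratom-state.edn"
           :init {}))
```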

jimpil commented 4 years ago

Oh wow! We literally pressed comment at the same time!!!

Ok, so there was a time when util/pr-str-fully wasn't fully printing, but it definitely should by now... Are you seeing the same data loss with 0.5.1?

Vaelatern commented 4 years ago

I'm going to find out. Would that function have truncated the print or not printed all values in the atom?

jimpil commented 4 years ago

It would have truncated, due to *print-length* and *print-level*. See here: https://github.com/jimpil/duratom/blob/816362836452a181c84f2b248d8133092e2e6111/src/clojure/duratom/utils.clj#L41
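The failure mode is easy to demonstrate with plain Clojure (nothing duratom-specific here):

```clojure
;; With *print-length* bound, pr-str silently truncates collections:
(binding [*print-length* 3]
  (pr-str (range 10)))
;; => "(0 1 2 ...)"

;; Binding both vars to nil (what a "fully printing" pr-str must do)
;; guarantees the whole value is serialized:
(binding [*print-length* nil
          *print-level*  nil]
  (pr-str (range 5)))
;; => "(0 1 2 3 4)"
```

Reading back a truncated string like `"(0 1 2 ...)"` would fail (or lose data), which is why the serializer has to bind both vars to nil.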

Vaelatern commented 4 years ago

Ok, this was not the corruption I was seeing. I saw user data that was seemingly never written to disk, so that after a restart users' passwords reverted to ones from their past.

jimpil commented 4 years ago

I don't fully follow...are you saying that duratom can't write to a file at all, beyond a particular size?

Vaelatern commented 4 years ago

I'm saying that given my large state, yes, Duratom seemed to stop writing to the file.

jimpil commented 4 years ago

Are you by any chance using a custom :rw map?

Vaelatern commented 4 years ago

I am not.

jimpil commented 4 years ago

Ok, this is bizarre - 620kb is by no means a large file... I'll try to reproduce later tonight when I come back from work. In the meantime, would it be easy for you to try your use-case against a different storage backend (e.g. postgres)? That would give a good indication of where the problem lies (i.e. in the duratom object itself vs. the backend object).

Many thanks in advance...

jimpil commented 4 years ago

Something else that could be happening is swallowed exceptions on the agent's dispatch thread (for async commits). There was no way to specify an :error-handler on 0.4.1, but there is on 0.5.1, and I would advise you to use it for (at least) logging the errors, and potentially re-committing (via the second arg on the error-handler).

Vaelatern commented 4 years ago

I don't seem to be able to set :error-handler without setting a custom reader or writer.

Vaelatern commented 4 years ago

I'll be working on this in many hours, thank you for your help.

jimpil commented 4 years ago

```clojure
(assoc default-file-rw
  :error-handler
  (fn ([agent exception] ...)             ;; typical agent error-handler
      ([agent exception recommit!] ...))) ;; persistence error-handler
```
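Concretely, that map would be passed in via the :rw option when constructing the duratom. A sketch (assuming `default-file-rw` lives in `duratom.core`, as the snippet above suggests; the path and handler bodies are hypothetical):

```clojure
(require '[duratom.core :as d])

(def state
  (d/duratom :local-file
             :file-path "/tmp/duratom-state.edn" ;; hypothetical path
             :init {}
             :rw (assoc d/default-file-rw
                        :error-handler
                        (fn
                          ([a ex] (println "agent error:" ex))
                          ([a ex recommit!]
                           (println "persistence error:" ex)
                           (recommit!))))))
```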
Vaelatern commented 4 years ago

```
Caused by: java.lang.NullPointerException
        at duratom.backends.FileBackend.snapshot(backends.clj:68)
        ... 110 more
```

Now duratom can't start, same as before.

harold commented 4 years ago

I appreciate both your efforts here, and look forward to discovering the root cause of this. We have gigabytes of duratoms and have never encountered what you're hitting here.

Good luck, and keep it up until you find it.

Thanks!

jimpil commented 4 years ago

Some more stack-traces would be helpful...I will try this in a few hours anyway.

Vaelatern commented 4 years ago

That's the only stack trace I have, and it's from not having a working :rw map.

Vaelatern commented 4 years ago

Ok, I'm now able to log my attempts. Will see what happens!

jimpil commented 4 years ago

Any news on this? I wasn't able to reproduce any data loss using a 12MB file...

Vaelatern commented 4 years ago

Hanging out waiting for it to fail, it hasn't yet. I don't know when it happens, but I have logging set up to catch exceptions. And I'm glad I'm on the latest version now.

jimpil commented 4 years ago

I think we've established that persisting values greater than 620kb (or a lot larger for that matter) does work as expected. Can I close this? If you happen to find some further problem, it will most likely be unrelated to this, so it will warrant its own ticket.