hannorein / rebound

💫 An open-source multi-purpose N-body code.
https://rebound.readthedocs.io/
GNU General Public License v3.0

file corrupted and only readable with rebound <3.17.4 #565

Closed: Findus23 closed 3 years ago

Findus23 commented 3 years ago

I wish this was an easily reproducible issue, but unfortunately I don't know the exact source of the error.

I recently updated rebound to the latest version (3.17.4, more precisely the simulation log shows 8e66393c3eea393106429b70eb87468c7b0770c1 was used) and started a new large simulation that ran a few days. Unfortunately, in the last interval (when integrating from t=199990000.0 to t=200000000.0) my custom heartbeat function went weird and decided to delete all bodies due to them seemingly coming too close to the sun.

When I now try to analyse the result only one snapshot (of the 20000) can be read.

/home/lukas/.virtualenvs/rebound/lib/python3.9/site-packages/rebound/simulationarchive.py:99: RuntimeWarning: The binary file seems to be corrupted. An attempt has been made to read the uncorrupted parts of it.
  warnings.warn(message, RuntimeWarning)
Meta(initcon_file='initcon/conditions_many1.input', tmax=200000000.0, num_savesteps=20000, per_savestep=10000.0, initial_N=293, initial_N_planetesimal=250, initial_N_embryo=40, walltime=831794.3237063419, cputime=832083.567119166, current_time=200010000.0, hash_counter=400, git_hash='c1043b06131cf85678c0b4608cf4fe9954f4bdcb\n', rebound_hash='8e66393c3eea393106429b70eb87468c7b0770c1', massloss_method='rbf', no_merging=False)
1 Snapshots found

Before losing hope and trying to cut off bytes at the end of the file, I checked with rebound 3.17.3 (as I saw that a lot of things in the handling of corrupted files changed afterwards) and, weirdly enough, the file is perfectly readable (without even a warning about corruption).

I unfortunately can't help much more, but maybe with the file you can find out what broke in 3.17.4. The file can be found at https://cloud.lw1.at/s/sKe32emzEaMGrnN and the code used to run the simulation is here (if it helps): https://git.lw1.at/lw1/rebound-collisions/-/blob/c1043b06131cf85678c0b4608cf4fe9954f4bdcb/water_sim.py

Findus23 commented 3 years ago

This might help narrow it down: I installed all rebound commits between the releases and https://github.com/hannorein/rebound/commit/1cbcfd5e72c0c1e271022ee3f982863c71f15b55 works, https://github.com/hannorein/rebound/commit/8020f5af91daef816e4d38cdce9c20e8fc9c5b8c works (despite the message :slightly_smiling_face:) and https://github.com/hannorein/rebound/commit/8e66393c3eea393106429b70eb87468c7b0770c1 does not work (as expected as it is the released version)

hannorein commented 3 years ago

Thanks. That is helpful. We tried to help the binary reading routines handle corrupt files better. It might have had the opposite effect in your case. I'll try to look into it when I find some time.

Findus23 commented 3 years ago

Thanks. I'll just stay with 3.17.3 until then (as it should work the same in every other case)

hannorein commented 3 years ago

I think I have some idea where this is coming from. Still working on this (on the bintest branch). Thanks for the binary file. That was very helpful.

hannorein commented 3 years ago

I use an int16_t to store the offsets in the file. The issue is that your individual snapshots are bigger than 32k. Clearly using int16_t was a mistake on my part. I should have thought about this more, but I always tested it with a few particles and therefore never encountered the issue. I'll need to think how to fix this without breaking backwards compatibility.
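To illustrate the overflow described above, here is a minimal Python sketch. It is not REBOUND's actual code, and the two snapshot sizes are made-up values chosen only to sit on either side of the int16_t limit of 32767.

```python
import ctypes

INT16_MAX = 2**15 - 1  # 32767, the largest value an int16_t can hold


def as_int16(value):
    """Return value as it would read back after being stored in a C int16_t."""
    # ctypes integer types do no overflow checking, so this wraps like C does.
    return ctypes.c_int16(value).value


small_snapshot = 2_000   # hypothetical snapshot size in bytes (few particles)
large_snapshot = 40_000  # hypothetical snapshot size in bytes (many particles)

print(as_int16(small_snapshot))  # fits, so the value is preserved
print(as_int16(large_snapshot))  # exceeds INT16_MAX, so it wraps to a negative number
```

With only a few particles every offset stays below the limit, which is consistent with the problem never showing up in small tests.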

hannorein commented 3 years ago

I finally found some time to work on this. The latest commit to the bintest branch increases the bit size. This should fix the problem. I call this "SimulationArchive Version 3" in the code. I've tried to maintain the ability to read and append old files ("SimulationArchive Version 2"). I will maintain this for the time being but it results in quite a bit of duplicate code, so I will probably remove any support for "version 2" in a future release.

Note: If a "version 2" file already has this issue (i.e. the SimulationArchive of a simulation with a large number of particles is corrupted), then it will still not work. All the data is in principle there, but I don't think recovering it is worth implementing. However, if your life depends on being able to recover some old files, let me know 😉
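As a rough illustration of widening a field while still reading the old one, the following Python sketch uses a toy layout. The version-to-format mapping and the helper names (`pack_offset`, `unpack_offset`, `fits`) are assumptions made for this example and do not reflect REBOUND's real on-disk format or API.

```python
import struct

# Toy layout, NOT REBOUND's real format: each entry is a single signed offset,
# 16 bits wide in the toy "version 2" and 32 bits wide in the toy "version 3".
OFFSET_FORMAT = {2: "<h", 3: "<i"}


def pack_offset(version, offset):
    """Serialize one offset using the field width of the given toy version."""
    return struct.pack(OFFSET_FORMAT[version], offset)


def unpack_offset(version, data):
    """Read one offset back, choosing the field width from the version."""
    return struct.unpack(OFFSET_FORMAT[version], data)[0]


def fits(version, offset):
    """Return True if the offset can be stored without overflow in this version."""
    try:
        pack_offset(version, offset)
        return True
    except struct.error:
        return False


print(fits(2, 40_000))  # False: 40000 does not fit into 16 signed bits
print(fits(3, 40_000))  # True: 32 bits are enough
```

Keeping both widths behind a version switch is what produces the duplicated code paths mentioned above: every reader and writer needs a branch per version.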

hannorein commented 3 years ago

Merged. Hopefully this fixed the issue for everyone.

Findus23 commented 3 years ago

The only question that remains for me: Why does it work when I try to open the file with 3.17.3? Or does it open incorrect data? In other words: Will I encounter issues if I continue to use 3.17.3 for the set of simulations I have recently started?

hannorein commented 3 years ago

Small number of particles:

Large number of particles:

Findus23 commented 3 years ago

Okay, the files created with 3.17.3 (such as the one linked above) seem to work fine when read with 3.17.3 but, as kind of expected, still don't work with 3.18.0. I assume that if I re-ran the simulations with 3.18.0, the resulting files would be readable (but I want to avoid this, so I'll stay with the old version as I am mostly finished with the project).

hannorein commented 3 years ago

You're right. My previous comment was not accurate. Updated it.