Restartable models - Githubissues

HDembinski commented 1 year ago

An attempt to close #106

This branch is based on #112, which should be merged first.

This uses two approaches. For most models, it is sufficient to run some initialization code exactly once, which I ensure with the new private method _once. Concrete models should not implement __init__ anymore, only _once, which is called by MCRun.__init__ exactly once, after the underlying implementation has been initialized.
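A minimal sketch of the _once idea, to make the contract concrete. It assumes "exactly once" means once per concrete class per process; the names MCRunSketch, ToyModel and the _initialized bookkeeping are hypothetical and are not taken from the actual implementation.

```python
class MCRunSketch:
    # Hypothetical bookkeeping: concrete classes whose _once has already run.
    _initialized = set()

    def __init__(self, seed):
        self.seed = seed
        cls = type(self)
        # Run the one-time setup only for the first instance of each class.
        if cls not in MCRunSketch._initialized:
            self._once()
            MCRunSketch._initialized.add(cls)

    def _once(self):
        # Concrete models override this instead of __init__.
        pass


class ToyModel(MCRunSketch):
    init_count = 0  # counts how often the one-time setup ran

    def _once(self):
        ToyModel.init_count += 1


first = ToyModel(seed=1)
second = ToyModel(seed=2)
```

In this sketch, creating the second instance does not trigger the setup again, which is the property a model with non-reentrant initialization needs.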

For Phojet and DPMJet, this does not work (at least I could not get it to work), so I followed the idea expressed in #106. I wrote a new base class, MCRunRemote, which is a brilliant and terrible piece of engineering. Behind the scenes, it runs every call to cross_section and __call__ through a remote process which is created and dies just for that call. The beauty of it: a model which requires this workaround just has to inherit from MCRunRemote instead of MCRun, no other change required!
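The process-per-call idea can be sketched roughly as follows. This is not the MCRunRemote code; RemoteCallSketch, ToyModel and _worker are hypothetical names, and the sketch prefers the "fork" start method where it is available, which is a simplification.

```python
import multiprocessing as mp


def _worker(conn, factory, method, args):
    # Runs in the child: build a fresh model, call one method, send the result.
    try:
        model = factory()
        conn.send(("ok", getattr(model, method)(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


class RemoteCallSketch:
    def __init__(self, factory):
        self._factory = factory
        methods = mp.get_all_start_methods()
        # Assumption: use fork when the platform offers it, else the default.
        self._ctx = mp.get_context("fork" if "fork" in methods else None)

    def _remote(self, method, *args):
        receiver, sender = self._ctx.Pipe(duplex=False)
        proc = self._ctx.Process(
            target=_worker, args=(sender, self._factory, method, args)
        )
        proc.start()
        sender.close()  # the parent only reads
        status, payload = receiver.recv()
        proc.join()  # the child dies after this single call
        if status == "error":
            raise RuntimeError(payload)
        return payload

    def cross_section(self, kin):
        return self._remote("cross_section", kin)


class ToyModel:
    def __init__(self):
        self.calls = 0

    def cross_section(self, kin):
        self.calls += 1
        return (kin, self.calls)


remote = RemoteCallSketch(ToyModel)
results = [remote.cross_section(10.0), remote.cross_section(10.0)]
```

Both calls report a call count of 1 because each one sees a freshly created model. Any state that must survive between calls therefore has to be carried across explicitly.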

To maintain the illusion of having a single model and not one that is restarted all the time, I rely strongly on the random-state serialization feature that @jncots added. I further improved this so that the PRNG state of the numpy generator is also saved. Since this is so important, I added more tests that check the RNG state serialization and found some issues. @jncots, could you please look into this?
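The save-and-restore round trip that such tests check can be illustrated with a small sketch. It uses Python's standard random module as a stand-in; a numpy Generator exposes the analogous state through its bit_generator.state property. The variable names are illustrative only.

```python
import pickle
import random

rng = random.Random(42)
rng.random()  # advance the generator so the saved state is not the seed state

# Serialize the state, as one would before handing work to a new process.
saved = pickle.dumps(rng.getstate())
expected = [rng.random() for _ in range(3)]

# A fresh generator that receives the saved state continues the same sequence.
restored = random.Random()
restored.setstate(pickle.loads(saved))
actual = [restored.random() for _ in range(3)]
```

If the state round trip is lossless, the restored generator produces exactly the numbers the original would have produced next.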

Other changes

MCRun.get_stable returns the set of particles that were changed to stable
Kinematics is now passed to MCRun.__call__ instead of MCRun.__init__; this makes more sense for models that are restartable. Some models need a maximum energy upfront; this can be passed via init.
The MCRun.kinematics property was removed. The kinematics should not be part of the state of the generator; we pass them explicitly to the two functions which need them, MCRun.__call__ and MCRun.cross_section. Less class state is generally better: it simplifies the design and makes the code easier to reason about. It does not matter that the underlying Fortran stores the kinematics, because we abstracted that away in our high-level API.
MCRun.random_state now includes the state of the numpy PRNG
MCRun._composite_plan now generates heavy elements first, as a workaround for DPMJet
MCRun.set_unstable was renamed to MCRun.maydecay
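A rough sketch of the call pattern described in the kinematics items above: the kinematics object is an argument of the two methods that need it instead of generator state. KinSketch, ModelSketch and the max_ecm parameter are hypothetical stand-ins, and the returned numbers are placeholders, not physics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KinSketch:
    ecm: float  # center-of-mass energy in GeV (illustrative)


class ModelSketch:
    def __init__(self, seed, max_ecm=None):
        self.seed = seed
        self.max_ecm = max_ecm  # optional upfront limit, as some models need

    def _check(self, kin):
        if self.max_ecm is not None and kin.ecm > self.max_ecm:
            raise ValueError("kinematics exceed the configured maximum energy")

    def cross_section(self, kin):
        self._check(kin)
        return 1.0  # placeholder value

    def __call__(self, kin, nevents):
        self._check(kin)
        for i in range(nevents):
            yield i  # placeholder for a generated event


model = ModelSketch(seed=1, max_ecm=100.0)
events = list(model(KinSketch(ecm=50.0), 3))
```

The same model object can be called with different kinematics without being rebuilt, and no stored kinematics can get out of sync with the underlying generator.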

HDembinski commented 1 year ago

@afedynitch You said that you are looking into fixing cross-sections for the models. I added a new test, test_cross_sections.py, which checks the output of all models for various combinations of projectiles and targets. The references for such checks are very rough, of course, because the models seem to disagree a lot, especially on nuclear cross-sections. However, the test still checks the basic sanity of the model response, and I found several bugs this way, too.
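A sanity check against a rough reference could look something like the sketch below. It is not the content of test_cross_sections.py; within_factor and the tolerance of a factor of two are assumptions chosen for illustration.

```python
def within_factor(value, reference, factor=2.0):
    """Return True if value is positive and within `factor` of reference."""
    if value <= 0 or reference <= 0:
        return False
    return reference / factor <= value <= reference * factor
```

A loose criterion like this tolerates model-to-model disagreement but still catches zero, negative, or wildly wrong values.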

HDembinski commented 1 year ago

From my side, this PR is ready for review, but I need your help to fix the remaining things.

Tests are failing largely because of the new tests for rng_state persistence that I added (that is unrelated to my changes).

HDembinski commented 1 year ago

At least one of the tests is stalling, on my own computer and on CI. I haven't figured out which one yet. This means you have to use a keyboard interrupt to complete the tests at the moment. On Windows, the tests abort with a memory error. These things could be related, although I did not notice a large increase in RAM on my computer. Parallel computation works differently on Windows and Unix, so it could well be that there is a mistake which Windows reveals more drastically.

Edit: Stalling is now fixed, the issue is well understood.

HDembinski commented 1 year ago

The test which runs forever is test_to_hepmc3.py, I am investigating.

impy-project / chromo

Restartable models #111
