impy-project / chromo

Hadronic Interaction Model interface in PYthon

Restartable models #111

Open HDembinski opened 1 year ago

HDembinski commented 1 year ago

An attempt to close #106

This branch is based on #112, which should be merged first.

This PR uses two approaches. For most models, it is sufficient to run some initialization code exactly once, which I ensure with the new private method _once. Concrete models should no longer implement __init__, only _once, which MCRun.__init__ calls exactly once, after the underlying implementation has been initialized.
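To illustrate the _once idea, here is a minimal sketch, not the code in this branch. The stripped-down MCRun, the ToyModel class, the _backend_ready flag and the class-level bookkeeping set are all hypothetical stand-ins; how the real base class enforces "exactly once" may differ.

```python
class MCRun:
    """Hypothetical, stripped-down stand-in for the real base class."""

    # Assumption: remember which model classes already ran their one-time setup.
    _initialized = set()

    def __init__(self, seed):
        self.seed = seed
        # Stand-in for initializing the underlying (compiled) implementation.
        self._backend_ready = True
        cls = type(self)
        if cls not in MCRun._initialized:
            # Runs only for the first instance, after the backend is ready.
            self._once()
            MCRun._initialized.add(cls)

    def _once(self):
        """Model-specific one-time setup; concrete models override this."""


class ToyModel(MCRun):
    once_calls = 0

    # No __init__ here: the model only provides _once.
    def _once(self):
        assert self._backend_ready
        ToyModel.once_calls += 1


first = ToyModel(seed=1)
second = ToyModel(seed=2)  # does not trigger _once again
```

In this sketch the second instance reuses the one-time setup, which is what makes creating a model repeatedly in the same process safe.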

For Phojet and DPMJet, this does not work (at least I could not get it to work), so I followed the idea expressed in #106. I wrote a new base class, MCRunRemote, which is a brilliant and terrible piece of engineering. Behind the scenes, it runs every call to cross_section and __call__ in a remote process that is created just for the call and dies afterwards. The beauty of it: a model which requires this workaround just has to inherit from MCRunRemote instead of MCRun; no other change is required!
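The remote-call idea can be sketched roughly as follows. This is an illustration under assumptions, not the MCRunRemote implementation: the helper run_in_fresh_process, the ToyRemoteModel class and its _generate method are hypothetical, and the sketch uses the "fork" start method, which is only available on Unix.

```python
import multiprocessing as mp
import os


def _worker(conn, obj, method_name, args):
    # Runs in the child: do the work, then ship the result and updated state back.
    result = getattr(obj, method_name)(*args)
    conn.send((result, vars(obj)))
    conn.close()


def run_in_fresh_process(obj, method_name, *args):
    """Call obj.method_name(*args) in a process that lives only for this call."""
    ctx = mp.get_context("fork")  # assumption: Unix-only start method
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_worker, args=(send_end, obj, method_name, args))
    proc.start()
    send_end.close()  # the parent only reads; lets recv() fail if the child dies
    result, state = recv_end.recv()
    proc.join()
    # Copy the child's state back so the parent object looks continuous.
    obj.__dict__.update(state)
    return result


class ToyRemoteModel:
    """Hypothetical model whose event generation must not run in the parent."""

    def __init__(self):
        self.n_generated = 0

    def _generate(self, n):
        start = self.n_generated
        self.n_generated += n
        return os.getpid(), list(range(start, self.n_generated))

    def __call__(self, n):
        return run_in_fresh_process(self, "_generate", n)


model = ToyRemoteModel()
pid_a, events_a = model(3)
pid_b, events_b = model(2)  # continues where the first call stopped
```

The second call continues the event numbering of the first because the child's state is copied back into the parent object, which is the same illusion of a single long-lived model described above.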

To maintain the illusion of having a single model and not one that is restarted all the time, I rely strongly on the random-state serialization feature that @jncots added. I further improved this so that the PRNG state of the numpy generator is also saved. Since this is so important, I added more tests that check the RNG state serialization and found some issues. @jncots could you please look into this?
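For the numpy part, saving and restoring the generator state can be sketched like this. It only uses the public bit_generator.state property of numpy.random.Generator; how chromo stores the state internally is not shown here, and the variable names are illustrative.

```python
import pickle

import numpy as np

rng = np.random.default_rng(12345)
rng.random(5)  # advance the generator, as a running model would

# Snapshot the state; it is a plain dict, so it can be pickled.
saved = pickle.dumps(rng.bit_generator.state)

# Numbers the original generator produces after the snapshot.
expected = rng.random(3)

# A fresh generator (as in a restarted process) picks up the saved state.
restored = np.random.default_rng()
restored.bit_generator.state = pickle.loads(saved)
actual = restored.random(3)

# Both generators must yield exactly the same sequence.
np.testing.assert_array_equal(expected, actual)
```

A check of this kind is what makes a restarted model indistinguishable from one that kept running: the random sequence continues without a gap or repetition.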

Other changes

HDembinski commented 1 year ago

@afedynitch You said that you are looking into fixing cross-sections for the models. I added a new test, test_cross_sections.py, which checks the output of all models for various combinations of projectiles and targets. The references for such checks are very rough, of course, because the models seem to disagree a lot, especially on nuclear cross-sections. However, the test still checks basic sanity of the model response, and I found several bugs this way, too.

HDembinski commented 1 year ago

From my side, this PR is ready for review. I need your help to fix the remaining things.

Tests are failing largely because of the new tests for rng_state persistence that I added (the failures themselves are unrelated to my changes).

HDembinski commented 1 year ago

At least one of the tests is stalling, both on my own computer and on CI. I haven't figured out which one yet. This means you have to use a keyboard interrupt to complete the tests at the moment. On Windows, the tests abort with a memory error. These things could be related, although I did not notice a large increase in RAM usage on my computer. Parallel computation works differently on Windows and Unix, so it could well be that there is a mistake which Windows reveals more drastically.

Edit: Stalling is now fixed, the issue is well understood.

HDembinski commented 1 year ago

The test which runs forever is test_to_hepmc3.py; I am investigating.