impy-project / chromo

Hadronic Interaction Model interface in PYthon

Run models in separate processes to solve multiple issues #106

Open HDembinski opened 1 year ago

HDembinski commented 1 year ago

One of the major annoyances when working with impy interactively is that one cannot create several instances of most models without spawning processes. My plan for impy is to make it part of the API that a model is always started in a separate process. This solves a lot of issues.

Here, creating a model spawns a new process, and the user never notices; it is an implementation detail. The model finally behaves like a normal Python class.

m1 = EposLHC()  # starts process 1, which initializes an independent instance of EposLHC
m2 = EposLHC()  # starts process 2, which initializes another independent instance
for event in m1(kin1, 10):
    ...
for event in m2(kin2, 10):
    ...

MCRun.__call__ communicates with the worker process, where the actual event generation runs.
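
A minimal sketch of the pattern using only the standard library; ModelProxy and _worker are hypothetical names, not the actual chromo API:

import multiprocessing as mp

def _worker(conn, model_name):
    # In the real implementation, the Fortran model would be initialized here.
    while True:
        msg = conn.recv()
        if msg is None:  # shutdown sentinel
            break
        kin, n_events = msg
        for i in range(n_events):
            conn.send({"kin": kin, "event": i})  # stand-in for a generated event
        conn.send(None)  # end-of-batch sentinel

class ModelProxy:
    def __init__(self, model_name):
        self._conn, child = mp.Pipe()
        self._proc = mp.Process(target=_worker, args=(child, model_name), daemon=True)
        self._proc.start()

    def __call__(self, kin, n_events):
        # Forward the request and yield events as they arrive from the worker.
        self._conn.send((kin, n_events))
        while (event := self._conn.recv()) is not None:
            yield event

    def close(self):
        self._conn.send(None)
        self._proc.join()

if __name__ == "__main__":
    m1 = ModelProxy("EposLHC")
    m2 = ModelProxy("EposLHC")  # a second, fully independent worker
    for event in m1("kin1", 3):
        print(event)
    m1.close()
    m2.close()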

A caveat of this approach is the overhead of interprocess communication, which may slow impy down, and we want to avoid that (speed is something we care about). I assume that basic interprocess communication (sending input data to the other process, waiting for its output) has a negligible impact for most models, but that has to be measured. For SIBYLL, which is incredibly fast, it may be noticeable.
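
The round-trip cost is easy to measure in isolation; a rough sketch (not a chromo benchmark) that times a Pipe round trip for a payload roughly the size of a pickled event:

import multiprocessing as mp
import time
import numpy as np

def _echo(conn):
    while (msg := conn.recv()) is not None:
        conn.send(msg)

if __name__ == "__main__":
    parent, child = mp.Pipe()
    proc = mp.Process(target=_echo, args=(child,))
    proc.start()
    payload = np.zeros((1000, 4))  # ~1000 particles x four-momentum
    n = 1000
    t0 = time.perf_counter()
    for _ in range(n):
        parent.send(payload)  # pickled, sent, unpickled, echoed back
        parent.recv()
    dt = (time.perf_counter() - t0) / n
    print(f"round trip: {dt * 1e6:.1f} us per event")
    parent.send(None)
    proc.join()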

The main overhead, pickling and unpickling the event object, can be avoided by using shared memory. Shared memory lets two processes read and write the same section of memory, so the event data itself never has to be copied or serialized.
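
A minimal sketch with the standard library's multiprocessing.shared_memory: both processes view the same buffer as a numpy array, so only a tiny "data is ready" message has to cross the pipe:

import multiprocessing as mp
from multiprocessing import shared_memory

import numpy as np

def _worker(conn, shm_name, shape):
    shm = shared_memory.SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=np.float64, buffer=shm.buf)
    arr[:] = 42.0  # writes land directly in the parent's buffer
    conn.send("ready")
    shm.close()

if __name__ == "__main__":
    shape = (1000, 4)
    shm = shared_memory.SharedMemory(create=True, size=8 * 1000 * 4)
    arr = np.ndarray(shape, dtype=np.float64, buffer=shm.buf)
    conn, child = mp.Pipe()
    proc = mp.Process(target=_worker, args=(child, shm.name, shape))
    proc.start()
    conn.recv()  # no event data crossed the pipe
    print(arr[0])  # [42. 42. 42. 42.]
    proc.join()
    shm.close()
    shm.unlink()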

Implementing this requires some changes to EventData, which will be allocated in shared memory, and to the models, because they need to write into the EventData allocated in Python instead of into their own Fortran buffers wrapped by MCEvent. Most models already call a Fortran routine to copy data from their internal buffers into a HepEvt-like buffer. This should be replaced by new routines that copy the data into the EventData object instead. If the original buffers are readable from Python, this can probably be implemented with Numba, requiring no new Fortran code.
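
Assuming the internal buffers are exposed to Python as numpy arrays (e.g. f2py views of the Fortran common blocks), the copy routine could look like this sketch; all names here are hypothetical:

import numba
import numpy as np

@numba.njit
def copy_event(n, src_px, src_py, src_pz, src_en, dst_p):
    # src_* are views of the model's internal buffers; dst_p is an
    # (N, 4) momentum array of EventData, allocated in shared memory.
    for i in range(n):
        dst_p[i, 0] = src_px[i]
        dst_p[i, 1] = src_py[i]
        dst_p[i, 2] = src_pz[i]
        dst_p[i, 3] = src_en[i]

# example usage with dummy buffers
n = 5
px = np.linspace(0, 1, n)
dst = np.empty((n, 4))
copy_event(n, px, px, px, px, dst)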

Since we run the models in other processes, we may even be able to solve the issue that most models produce way too much output: we should be able to redirect the stdout of the worker process into buffers that we control from the main process.
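
A sketch of how the worker could capture output at startup; redirecting the file descriptor also catches text written directly by Fortran, which bypasses sys.stdout (the log file name is hypothetical):

import multiprocessing as mp
import os

def _worker(logfile):
    fd = os.open(logfile, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    os.dup2(fd, 1)  # stdout, including Fortran WRITE(*,*), now goes to the file
    os.dup2(fd, 2)  # same for stderr
    os.close(fd)
    print("model banner and progress spam")  # ends up in the log file

if __name__ == "__main__":
    proc = mp.Process(target=_worker, args=("model_output.log",))
    proc.start()
    proc.join()
    with open("model_output.log") as f:
        print("captured:", f.read().strip())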

HDembinski commented 1 year ago

If the interprocess communication overhead turns out to be too large, the alternative is to make all models singletons, so that the initialization code runs only once. This also allows us to "create" multiple instances of the same model, which are then just the same instance. That effectively solves most of the issues as well, except for the stdout issue, which would have to be solved differently.
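
A sketch of that alternative, caching one instance per model class in __new__ so that repeated construction returns the already-initialized object (names hypothetical):

class SingletonModel:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self):
        if self._initialized:
            return  # skip the expensive one-time initialization on reuse
        self._initialized = True
        # ... run the model initialization here ...

class EposLHC(SingletonModel):
    pass

m1 = EposLHC()
m2 = EposLHC()
assert m1 is m2  # "two instances" are really the same object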