impy-project / chromo

Hadronic Interaction Model interface in PYthon

Alternative interface to produce many events with varying initial states #188

Open HDembinski opened 10 months ago

HDembinski commented 10 months ago

This is an idea that arose out of a lunch discussion with @mohller today, who is using Chromo in CRPROPA, and from the discussion around https://github.com/impy-project/chromo/pull/174, and it is also related to #65

The current Chromo API is designed around the use case where you generate lots of events with a fixed initial state (same particles at the same ecm). However, there are also use cases where the initial state varies a lot, for example, when you simulate interactions in an air shower:

for beam_energy, beam_particle, target_particle in some_state_stack:
    model = Sibyll23d(FixedTarget(beam_energy, beam_particle, target_particle))  # A
    event = next(model(1))  # B
    # process event, C

Running Chromo in this use case is still efficient if the time spent in the model to generate and process one event (B, C) is large compared to the setup time of the model (A). This likely won't be the case, however, if beam_energy is small.

We could technically speed up Chromo for this use case with an alternative API that pushes this for loop into C or Fortran. It is not a simple change, but it is technically possible.

The call would then look like this:

events = Sibyll23d.process(some_state_stack)

where some_state_stack is a numpy array representing a table with the input state beam_energy, pid1, pid2 (we could add a flag to also support inputs in the center-of-mass frame), and events is an awkward array that contains the event data for each input state. We need the awkward array data structure here because we have a variable number of outgoing particles per input state and we want to keep the association between input and output particles. Awkward arrays can represent such a mapping, while normal numpy arrays cannot.
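
For concreteness, some_state_stack could be a structured numpy array along these lines (the field names, dtypes, and particle codes below are only illustrative, not a settled API):

import numpy as np

# Hypothetical layout: one row per interaction to simulate.
state_dtype = np.dtype([
    ("beam_energy", np.float64),  # lab-frame beam energy in GeV
    ("pid1", np.int32),           # PDG ID of the beam particle
    ("pid2", np.int32),           # PDG ID of the target (nucleus)
])

some_state_stack = np.array(
    [
        (1e4, 2212, 1000070140),  # 10 TeV proton on nitrogen-14
        (5e2, 211, 1000080160),   # 500 GeV pi+ on oxygen-16
        (1e2, 2212, 1000070140),  # 100 GeV proton on nitrogen-14
    ],
    dtype=state_dtype,
)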

The potential speed-up here is two-fold. 1) We push the for loop from the first code example down into fast Fortran code, so we don't pay a Python call overhead each time the model setup changes. 2) We provide the output in the form of an efficient data structure that allows us to pass all events at once to the next processing step. Awkward arrays are arbitrarily flexible, but in this case they specialise to a very simple data structure: the events are essentially packed into one contiguous table, while a second internal array tracks the offsets where one event ends and the next begins. This data structure is still almost a numpy array and can be processed very efficiently, for example in numba. The simpler alternative of returning a Python list of numpy arrays would be inefficient, because the memory would then not be contiguous.
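
A minimal sketch of what that output layout could look like, assuming the compiled loop returns flat buffers plus per-event particle counts (all names and numbers here are made up for illustration):

import awkward as ak
import numpy as np

# Flat, contiguous buffers for all particles of all events ...
flat_pid = np.array([211, -211, 2212, 22, 22, 2112, 211], dtype=np.int32)
flat_en = np.array([5.0, 3.2, 90.1, 1.1, 0.7, 40.0, 2.5])
# ... plus the number of outgoing particles per input state.
n_per_event = np.array([3, 2, 2])

# ak.unflatten adds only a small offsets structure on top of the flat
# buffers, so the particle data stays contiguous and numba-friendly.
events = ak.unflatten(ak.zip({"pid": flat_pid, "en": flat_en}), n_per_event)

print(events[0].pid.tolist())  # particles of the first input state: [211, -211, 2212]

Slicing by input state is then just events[i], and the whole structure can be handed to numba or flattened back into plain numpy arrays with ak.flatten.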

CC @jpivarski

afedynitch commented 10 months ago

Cool. We also have a similar use case in a hybrid air shower Monte Carlo. @jncots has spent some time studying the performance profile of chromo in this type of application and may have some educated comments on this.

HDembinski commented 10 months ago

Oh that would be very useful, because my judgement is based on conjecture.

jncots commented 10 months ago

The performance problem, as @HDembinski noticed, occurs when initializing and copying the kinematics class. The time spent in the different functions can be seen in the figures in issue https://github.com/impy-project/chromo/issues/180. As can be seen there, most of the performance drain occurs in _composite_plan due to deep copies of kinematics objects.

Another alternative to Sibyll23d.process(some_state_stack) is to just implement kinematics in C++ or Fortran, like kinematics(some_state_stack), which contains a stack of prepared variables (pdgs, energies) for the generator. It would be a little bit slower because of the Python loop through the stack, but it would not require awkward arrays.
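
Roughly, the usage I have in mind would be something like this (kinematics(...) does not exist yet; the loop body also assumes the model lets you swap in the prepared kinematics without a deep copy):

# Sketch only: kinematics(...) would be the new compiled helper that prepares
# (pdg, energy) pairs for the whole stack up front, outside the hot loop.
prepared = kinematics(some_state_stack)

events = []
for kin in prepared:            # Python loop stays, but each iteration is cheap
    model.kinematics = kin      # assumes a cheap setter, no deep copy on the Python side
    events.append(next(model(1)))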

Actually, as can be seen from the 2nd figure in https://github.com/impy-project/chromo/issues/180, adding beam particles to the stack in Python takes a lot of time. It is a serious performance problem. It could be fixed by writing C++/Fortran code that handles this task.

HDembinski commented 10 months ago

Another alternative to Sibyll23d.process(some_state_stack) is to just implement kinematics in C++ or Fortran, like kinematics(some_state_stack), which contains a stack of prepared variables (pdgs, energies) for the generator. It would be a little bit slower because of the Python loop through the stack, but it would not require awkward arrays.

I think there is a misunderstanding here. You don't need awkward arrays for the input side, but for the output side, because the output is no longer a single event but multiple events concatenated. The awkward arrays make that concatenation efficient (concatenating numpy arrays would be slow).

HDembinski commented 10 months ago

Implementing stuff in C++ or Fortran should always be our last option, because it adds serious maintenance overhead.

Actually, as can be seen from the 2nd figure in https://github.com/impy-project/chromo/issues/180, adding beam particles to the stack in Python takes a lot of time. It is a serious performance problem. It could be fixed by writing C++/Fortran code that handles this task.

I already said when we implemented that feature that it would be more efficient to generate all beam configurations first and then sort them, so that identical beam configurations are consecutive. With such a sorted input stack, one has to switch the initial state in the generator much less frequently. This not only helps us on the Python side, it also helps the model, because switching beam configurations is costly for some models.
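
A minimal sketch of that sorting step, assuming the illustrative structured input stack from the example above (the inverse permutation lets you restore the original event order afterwards):

import numpy as np

# Sort so that identical (pid1, pid2) pairs, and within those similar energies,
# end up consecutive; np.lexsort treats the last key as the primary key.
order = np.lexsort(
    (some_state_stack["beam_energy"],
     some_state_stack["pid2"],
     some_state_stack["pid1"])
)
sorted_stack = some_state_stack[order]

# Inverse permutation to map the generated events back to the original order.
inverse = np.empty_like(order)
inverse[order] = np.arange(len(order))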