Open jtramm opened 7 months ago
Hello! This is Michel Saliba from EPFL. We will be working on this issue as of now. If you are interested in collaborating, reach out via michel.saliba@epfl.ch
This sounds like PTRAC? A comment about this development; I am a big fan of the MCPL format, it allows very nice cross-compatibility/portability between Monte Carlos. In my experience the file size is the same as an HDF5 file. Would be great if it could output to MCPL. I think the MCPL format spans the needed site info (maybe not the specific particle ID from a unique calculation?)
The MCPL format does sound great for this use case. As @yrrepy said, the only tricky bit seems to be the particle ID. MCPL appears to have an arbitrary user flag field for each particle, but it's limited to 32-bits. For larger simulations it's easy to imagine using more than ~4B total particles, so probably can't take advantage of this field for this use. The MCPL format does allow for arbitrary binary data to be appended to the file, so this could contain particle IDs, but it's unclear how hard it is to access/use this data with all the MCPL tools.
You can store whatever binary data (or blobs) you want in the header of an MCPL-file, and it is rather easy to retrieve, in particular if you also store some metadata about the blob next to it. So that's certainly an option. As such the format allows for up to ~10+^21 number of particles, but as you say there's only room for 32B userflags which is allowed to be anything. A dirty hack would perhaps be to pack some data into the polarization vector, which is not used by OpenMC anyway, but I really don't like that idea. I think the best option is to to somehow map the particle ID to the particle number in the file.
Yes, haha, I had a similar idea about re-using the polarization vector, but agree it seems janky.
One idea that comes to mind is that at the end of the simulation, we can sort the internal OpenMC representation of the sites by particle ID. As one intended use case is to collect sites from some small user tally of phase space, likely only a small % of total particle IDs will score. However, once sorted by ID, we can perhaps reindex the particle IDs such that the sites go from 4, 4, 99, 99, 1023, 1023, 1023 etc, to something like 0, 0, 1, 1, 2, 2, 2 ... where the IDs are changed to be contiguous, and then output the reindexed data to file. Once we use over 2^32 IDs we can just start back at 0, with the idea that a user may need to process the indexing data on their end manually if there are more than 2^32 IDs. As my guess is most use cases will involve less than 4 billion unique particles scoring, this will make it as easy as possible to use the data. For use cases with more particles scoring, then it's not too hard I think for users to write a few lines of python to loop through and reindex the data after pulling it out of the MCPL file.
Description
After some discussions at PHYSOR 2024 with Oskari Pakari of EPFL, the idea came about for a new score type that reports several fields of information whenever a particle has an event. In particular, the needed "site" info at each event is at least:
The main use case here is for detector work, where people want to have a bunch of sites within a detector to do a variety of other analyses with. Generating this type of site data is currently available in MCNP, but with the caveat that all events will generate site information (i.e., site generation cannot be restricted with any filtering). Given that people are often only interested in the portion of the geometry that is inside the detector, with MCNP this often in practice results in very slow generation of 100 GB++ sized output files which have to be post-processed down to the few MB that are actually needed. The workflow might also be facilitated in OpenMC via outputting of all particle tracks and then post processing manually, but this is even more impractical due to speed and file size issues.
Adding a score for this in OpenMC would be ideal, as it would allow the generation of these sites with filters. This would likely be much faster computationally and would greatly reduce the file I/O burden. My impression was that this type of analysis is done quite widely in the detector community, so would be of value to a significant number of people.
Implementation Ideas
There are a few complexities expected with the score's implementation. Generally it is quite different that other score types, as the number of bins is unbounded, and the information being generated is to be output raw without accumulating or applying statistics etc. Thus, it may takes some thoughts/brainstorming to come up with the optimal solution.
The first idea that comes to mind is to create a data structure that is outside of the normal tally bin arrays. For instance, it may be possible to have a vector of
EventSite
objects that gets appended to when tallying (with an openmp lock on the vector access), and then we just do an MPI gather at the end of simulation and add the combined sites array to the statepoint for processing in python. This is just the first idea though -- perhaps there is a simpler way of doing it.Alternatives
The only obvious alternative in OpenMC for this workflow is to output all particle tracks, but that is likely too expensive for most use cases (and would be even worse than users get now with MCNP). We could perhaps add the ability for the user to specify a filter to use to restrict generation of tracks, and to allow for combining particle tracks into a single file, but the introduction of a new score seems like a cleaner approach.
Compatibility
This is something new. It would likely result in small additions being made to the API and probably to the statepoint file format.