openmc-dev / openmc

OpenMC Monte Carlo Code
https://docs.openmc.org

Issue with normalization using external sources (feature request / bug report) #2870

Open marquezj opened 6 months ago

marquezj commented 6 months ago

This is something we have been discussing with @zoeprieto, @inti-abbate and @norberto-schmidt. Maybe @ebknudsen has something to comment too.

In the current implementation of file sources (both HDF5 and MCPL), source sites are saved from the source and can later be resampled: the particle list is read from the file and then sampled up to the number of particles set in the settings. This is fine, for instance, for saving a converged source in a criticality calculation and reusing it as a seed in a new calculation to reduce initialization time.
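For reference, this is roughly what the existing resampling workflow looks like from the Python API (a minimal sketch, assuming a recent OpenMC where `openmc.FileSource` takes a file path; the file name is made up):

```python
import openmc

# Reuse a previously saved source file as the starting source of a new
# eigenvalue calculation; the sites are resampled up to the number of
# particles per batch set below.
settings = openmc.Settings()
settings.run_mode = 'eigenvalue'
settings.particles = 10000
settings.batches = 120
settings.inactive = 20
settings.source = openmc.FileSource('converged_source.h5')
settings.export_to_xml()
```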

But there is another use case for external sources, which is the main reason MCPL and KDSource exist: reusing dumped particles to couple different simulations. This could be used to connect two Monte Carlo codes (a cosmic ray shower simulated in Geant4 or PHITS is used as a neutron source in OpenMC, or an MHD simulation of a fusion reactor drives the neutrons of an external source in OpenMC), or as a way to do variance reduction (if particles can be approximated as traveling in one direction, the simulation upstream can be done once and changes can then be made to the simulation downstream). In these cases particles are usually dumped on a boundary and in general have different weights. The particle list needs to be read sequentially instead of being resampled, to preserve the normalization of the original simulation.

One possible solution is to add an option to run using the normalization of the external source. In that case, the source would be read at the beginning of the simulation, and the number of particles and total intensity would be used for normalization. Another possibility would be to accumulate the total weight as the particles are produced and, when the list is exhausted, raise an exception so the tallies are reduced.
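In pseudocode, the two options would look roughly like this (illustrative names only, not actual OpenMC internals):

```python
# Illustrative pseudocode only; names are not actual OpenMC internals.

def normalization_from_file(sites):
    """Option 1: read the whole list up front and use its size and total
    intensity for normalization."""
    total_weight = sum(s.wgt for s in sites)
    return len(sites), total_weight

def run_until_exhausted(sites, transport_one):
    """Option 2: consume the sites in order, accumulating their weight, and
    stop when the list is exhausted."""
    sampled_weight = 0.0
    for s in sites:
        transport_one(s)
        sampled_weight += s.wgt
    return sampled_weight  # used to reduce/normalize the tallies at the end
```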

I would like to discuss here whether there is an argument against introducing this feature in OpenMC, and gather ideas on the least disruptive way to do it. Also, I think this discussion should exist somewhere so that users know they should not expect the same functionality as SSW/SSR[^1] in MCNP or similar features in other Monte Carlo codes.

[^1]: MCNP 6.3 User Manual Sections 5.8.8 and 5.8.9.

marquezj commented 6 months ago

I prepared an implementation of sequential reading of surface sources. It is available here. The main changes are: an index to read the source sequentially, a flag in the source to choose between sequential reading and resampling, and a source strength for proper normalization.

This is a simple example of coupling PHITS with OpenMC using MCPL. Please note that, in order to get proper normalization, the user needs to set the source strength by hand to the ratio of the number of particles stored in the MCPL file to the number of particles originally simulated upstream. This is necessary because that number is not stored in the dump file by PHITS.
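For illustration, computing that ratio with the mcpl Python module and passing it as the source strength could look like the following (a sketch that assumes the strength handling from the branch linked above; the number of upstream histories has to be taken from the PHITS input, and the file name is made up):

```python
import mcpl
import openmc

# Number of histories run in the upstream PHITS simulation; PHITS does not
# store this in the MCPL dump, so the user has to supply it by hand.
n_upstream_histories = 1.0e8

f = mcpl.MCPLFile('phits_dump.mcpl')
strength = f.nparticles / n_upstream_histories  # stored particles per upstream history

settings = openmc.Settings()
settings.run_mode = 'fixed source'
settings.source = openmc.FileSource('phits_dump.mcpl', strength=strength)
```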

gonuke commented 5 months ago

Following an in-person discussion with @pshriwise and @micahgale, we determined that the source file should always be read sequentially. Resampling with replacement as is done in the current implementation is always dangerous and adds no value. All use cases risk there being non-unit weights for particles, so care is necessary in managing the normalization.

Additional weight management is necessary if sampling a non-integer multiple of the number of histories in the SSR file, so there are additional considerations about under- or over-sampling. What seems most logical is to renormalize all results by dividing by the ratio of the total weight ultimately sampled in an SSR simulation to the total weight that exists in the SSW file. There are different strategies for this, and care must be taken to remain robust to the use of tally triggers.

Since tally triggers are only evaluated between batches, the total weight sampled in each batch can be calculated and used for normalization at the end. In addition, it should be considered fair game to use the source sites sequentially in each batch, starting where the previous batch finished and looping over the total set of sites as necessary.
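As a rough sketch of that bookkeeping (illustrative only, not OpenMC code), the per-batch accumulation and the final renormalization factor could look like:

```python
# Illustrative bookkeeping only, not OpenMC code.

def batch_weights(site_weights, particles_per_batch, n_batches):
    """Accumulate the source weight sampled in each batch, with each batch
    starting where the previous one finished and wrapping around the list."""
    n_sites = len(site_weights)
    start = 0
    weights = []
    for _ in range(n_batches):
        w = sum(site_weights[(start + i) % n_sites] for i in range(particles_per_batch))
        weights.append(w)
        start = (start + particles_per_batch) % n_sites
    return weights

def normalization_factor(sampled_batch_weights, site_weights):
    """Ratio of the total weight actually sampled to the total weight in the
    file; at the end of the run, all results are divided by this factor."""
    return sum(sampled_batch_weights) / sum(site_weights)
```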

It is unclear how to make this both safe and efficient with multi-threading - stay tuned.

pshriwise commented 5 months ago

A real attempt at this will tell us more, but here is a possible scheme for particle sourcing without locking:

The FileSource (or an extension of FileSource) will hold a periodic index/iterator (one that will wrap to the first particle automatically) to the first particle to be sourced for a given batch of particles. Based on the number of particles in the current batch, the weighting factor can be computed from the starting index. When the next batch is initialized the index/iterator and weighting factor will be advanced and updated, respectively.

Based on that starting location in the set of particles provided by the FileSource, I think particle IDs can then be used to determine the correct particle to source from the FileSource instance without requiring a lock, since this only involves read-only operations on the class instance.
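A minimal sketch of that indexing idea (in Python for brevity; the real change would live in the C++ FileSource, and all names below are hypothetical):

```python
# Hypothetical sketch of the lock-free scheme; not actual OpenMC code.

class PeriodicFileSource:
    def __init__(self, sites):
        self.sites = sites        # particle list read once from the file
        self.start = 0            # index of the first site for the current batch
        self.weight_factor = 1.0  # normalization factor for the current batch

    def begin_batch(self, n_particles):
        """Compute the batch weighting factor from the starting index, then
        leave the instance read-only for the duration of the batch."""
        n = len(self.sites)
        w_batch = sum(self.sites[(self.start + i) % n].wgt for i in range(n_particles))
        self.weight_factor = w_batch / n_particles

    def end_batch(self, n_particles):
        """Advance the periodic index so the next batch starts where this one ended."""
        self.start = (self.start + n_particles) % len(self.sites)

    def sample(self, particle_index):
        """Read-only lookup: the particle's index within the batch alone
        determines which site it gets, so no lock is needed."""
        return self.sites[(self.start + particle_index) % len(self.sites)]
```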

ebknudsen commented 5 months ago

From left field, we've been thinking about similar things (filtering of source particles). I think @pshriwise's concept could work - at least in terms of the iterator idea. With regard to normalization - wouldn't updating the weighting factor batchwise be inherently "off by one"? If I read the idea correctly, one would end up normalizing the next batch based on what happened in the previous batch. In the limit of many batches this would converge to the correct value, but for a few batches you'd get a bias, wouldn't you? In terms of tallies, I suppose one could trigger a tally renormalization after a batch has been run and tallied. If, however, OpenMC is used to generate particles for some other use, this would also have to be done at the surface-write level. At least I think so.

gonuke commented 5 months ago

The intention was to normalize batch B with the accumulated sum of source weight in batch B, at the end of that batch.

gridley commented 5 months ago

> Resampling with replacement as is done in the current implementation is always dangerous and adds no value.

I do not agree with this. How is it dangerous? Maybe there's a vanishingly small normalization error, but to call it dangerous?

The value it adds is that it's straightforward to parallelize. Without seeing some concrete numbers on the variance or mean error over independent simulations exhibited by each method, I do not believe this.

Similarly

> In these cases particles are usually dumped on a boundary and in general have different weights. The particle list needs to be read sequentially instead of being resampled, to preserve the normalization of the original simulation.

Again, I really am not seeing how this could interfere with normalization. I suppose for a small number of samples, yes, the average simulation will have a nonzero expected error in the normalization. And in the case that the particles have huge variance in weights, the expected normalization error would be larger. But that means you're not using a very good variance reduction method. Any method that adjusts particle weights takes care of splitting and rouletting to prevent this very situation.

It seems to me that even with a modest number of particle histories, say 1e6, the normalization is nearly exact - not perfect on average, true, but the MC noise would outweigh any normalization error. The statistical bias is zero with the current scheme, for sure.

If my claim stands that the bias is in fact zero, even if there is a small mean error (which I claim is small compared to other sources of noise), then revising the source sampling methodology adds unnecessary complexity to the code.