openmc-dev / openmc

OpenMC Monte Carlo Code
https://docs.openmc.org
Other
699 stars 444 forks source link

Allow getting cells, surfaces, materials by ID in constant-ish time #2953

Open MicahGale opened 3 weeks ago

MicahGale commented 3 weeks ago

Description

This problem came about because I am making an openmc model wrapper that wraps an existing model allows easily adding things to it and then running. This relies a lot on loading a lot of surface/cell/material numbers from a yaml file and then creating a lot of pointers to the corresponding objects in the wrapper to allow the model to be quickly modified. Initially this was making hundreds of calls to model.geometry.get_all_cells() and was very slow. So I changed to making this call once and then using that dict repeatedly. However I don't like this solution much due to mutation.

model = openmc.model.Model.from_xml(...)
cells = model.geometry.get_all_cells()
cell = openmc.Cell(1000)
model.geometry.root_universe.add_cell(cell)
cells[1000]

This should lead to a KeyError (haven't verified yet but assuming so).

I dealt with problem for MontePy, and the problem you run into is that cell numbers can mutate outside of the dictionaries control and leads to various mutation issues.

We solved this with our own custom collections that have an internal dictionary cache that isn't trusted

The algorithm used to find an object by number/id quickly is to have an internal __num_cache.

  1. As items are added they are added to the cache:
    def append(self, item):
    self._items.append(item)
    self.__num_cache[item.id] = item
  2. When an item is requested the item from the dictionary with that id. A KeyError is treated as a cache miss.
  3. The cell.id is verified as being the id requested. If it doesn't match this is treated as a cache miss
  4. In case of a cache miss an O(N) search is done over all items, and the entire cache is updated along the way.
  5. The cache is updated anytime there is reason to iterate over the whole collections.

proprosal

  1. Add a new wrapper collection for all numbered objects
  2. introduce to it a __getitem__ and get method with the above caching system
  3. Keep the get_all_cells methods
  4. Alternatives

    Status quo, with user caching misses and paranoia.

In my implementation right after full initialization I make sure to del cells because I know it cannot be trusted anymore.

Compatibility

This would change some data types under the hood which could break some more fancy user scripts.

Further design work is likely needed to determine the full extent of the scope of the possible API changes.

MicahGale commented 3 weeks ago

@pshriwise this is what I was thinking of for statepoint.tallies in #2671.