HDF5 dataset name collision in mgxs_library.py preventing more temperature data points than number of temperatures in min-max range

lewisgross1296 commented 1 year ago

In #2296, I provided an MWE that shows multi-group mode (settings.energy_mode=multi-group) is currently incompatible with settings.temperature[method]=interpolation. For multiphysics simulations (e.g. Cardinal) that need multi-group mode, we are then are restricted to using settings.temperature[method]=nearest. To remedy this, I wanted to use a large number of temperatures in the data library so that nearest mode essentially becomes a "quasi-interpolation." However, I ran into an issue with HDF5 when I attempted to set more temperatures in the data library than there were integer numbers between my min and max. Below is an MWE that causes an error:

import openmc
import openmc.mgxs as mgxs
import numpy as np
import h5py

T_min = 308 # K
T_max = 358 # K
NT = 147 # number of temps. picked so that DT is an irrational number
temps = np.linspace(T_min,T_max,NT)

# generate one group cross section data
groups = mgxs.EnergyGroups()
groups.group_edges = np.array([0.0, 20.0e6]) #  groups in eV

# isotropic angular data
xsdata = openmc.XSdata('slab_xs', energy_groups=groups, temperatures=temps, num_delayed_groups=0)
xsdata.order = 0

Sig_t0 = 3 # total XS at reference temp
for T in temps:
    xsdata.set_total(np.array([Sig_t0/T]),temperature =T)
    xsdata.set_absorption(np.array([Sig_t0/(2*T)]),temperature =T)
    xsdata.set_scatter_matrix(np.array([[[Sig_t0/(2*T)]]]),temperature =T)

# export xsdata
one_g_XS_file = openmc.MGXSLibrary(groups) # initialize the library
one_g_XS_file.add_xsdata(xsdata) # add benchmark XS data
one_g_XS_file.export_to_hdf5('one_gxs.h5') # write to disk

The error:

Traceback (most recent call last):
  File "/home/lgross/1Dslab_multiphysic_analytical_benchmark/openmc_hdf5_temp_issue_MWE.py", line 26, in <module>
    one_g_XS_file.export_to_hdf5('one_gxs.h5') # write to disk
  File "/home/lgross/.pyenv/versions/3.10.5/lib/python3.10/site-packages/openmc/mgxs_library.py", line 2527, in export_to_hdf5
    xsdata.to_hdf5(file)
  File "/home/lgross/.pyenv/versions/3.10.5/lib/python3.10/site-packages/openmc/mgxs_library.py", line 1986, in to_hdf5
    ktg.create_dataset(temp_label, data=kT)
  File "/home/lgross/.pyenv/versions/3.10.5/lib/python3.10/site-packages/h5py/_hl/group.py", line 161, in create_dataset
    dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
  File "/home/lgross/.pyenv/versions/3.10.5/lib/python3.10/site-packages/h5py/_hl/dataset.py", line 156, in make_new_dset
    dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl, dapl=dapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5d.pyx", line 87, in h5py.h5d.create
ValueError: Unable to create dataset (name already exists)

There was some discussion in this slack thread. @pshriwise did some digging and found that this line in mgxs_library.py, among some other places, causes the above.

He suggested a work-around: use a float (with a few places) instead of an int when converting the temperature name to a string. I plan to test this tomorrow to see if it succeeds, but I think, in general, a user should be allowed to have more temperatures in their library than the number of integers between their min and max temp. We are unsure if anywhere else in the code assumes an int format for the name, so I will report back once I try this work around to see if I can create the library / run a simulation.

lewisgross1296 commented 1 year ago

So I was at least able to recompile / create an h5 library by changing the two occurrences of str(int(np.round(temperature))) in lines 1982 -1990 of mgxs_library.py to the following :

        for temperature in self.temperatures:
            temp_label = str(np.round(float(temperature),5)) + "K"
            print(temp_label)
            kT = temperature * openmc.data.K_BOLTZMANN
            ktg.create_dataset(temp_label, data=kT)

        # Create the temperature datasets
        for i, temperature in enumerate(self.temperatures):

            xs_grp = grp.create_group(str(np.round(float(temperature),5)) + "K")

Inspecting the library with h5ls seems to indicate success (only showing the first two temp results)

lgross@ulam:~  $ h5ls -r one_gxs.h5 
/                        Group
/slab_xs                 Group
/slab_xs/308.0K          Group
/slab_xs/308.0K/absorption Dataset {1}
/slab_xs/308.0K/scatter_data Group
/slab_xs/308.0K/scatter_data/g_max Dataset {1}
/slab_xs/308.0K/scatter_data/g_min Dataset {1}
/slab_xs/308.0K/scatter_data/scatter_matrix Dataset {1}
/slab_xs/308.0K/total    Dataset {1}
/slab_xs/308.34247K      Group
/slab_xs/308.34247K/absorption Dataset {1}
/slab_xs/308.34247K/scatter_data Group
/slab_xs/308.34247K/scatter_data/g_max Dataset {1}
/slab_xs/308.34247K/scatter_data/g_min Dataset {1}
/slab_xs/308.34247K/scatter_data/scatter_matrix Dataset {1}
/slab_xs/308.34247K/total Dataset {1}

My next task will be to run a simulation using this library. Instead of using the above, I want to attempt to use one a more meaningful to my application. Likely will give it a go, either over the weekend or next week.

If this seems to work and people are generally okay with my choice of a float with five decimals, then this could be a quick two-line PR. Though maybe we'd want to add some tests as well?

paulromano commented 1 year ago

A few thoughts:

Extending to a float with a few decimal places seems reasonable. However, make sure you don't break existing libraries.
Although I understand the motivation, in what practical situation would the difference between cross sections that are less than 1 K apart ever matter?

lewisgross1296 commented 1 year ago

Although I understand the motivation, in what practical situation would the difference between cross sections that are less than 1 K apart ever matter?

I was also expecting that resolution smaller than 1K wouldn't make much of a difference. However, for this analytical benchmark I've been working on with @aprilnovak, @pshriwise, and @gonuke, we are experiencing error behavior that is not expected and are interested to see if refining the temperature library further helps.

Our geometry uses N=[50,100,250,500,1000] cells, but in each case the temperature library uses 50 temperatures. Each case also uses the same number of particles/batches. The flux's error norm is not decreasing monotonically as we refine the mesh.

Maybe the finer cases need more particles than the coarser ones, but I'm not sure how to scientifically decide how many particles to run in each case. While I do want to see if refining the temperature library further would help, some other options include

Run more active batches than before (use the same batching/particles in each case) and use all temperature libraries with the max allowed (50).
Keep the batching the same and intentionally use a coarser temperature library for the cases with fewer cells. Also not sure how to pick the number of temperatures for each case. For example, if I try to scale proportional to the number elements, then the smaller cases would only have a few temperatures. This would certainly increase their error, but this doesn't feel 100% right to me.

Extending to a float with a few decimal places seems reasonable. However, make sure you don't break existing libraries.

~~Is this something GH Actions would catch? I'm using a patched branch for S2 transport, so the tests wont pass locally for me anyways~~

It seems like this does break other libraries, when I tried to run, I saw

 ERROR: Object "308K" does not exist in object /slab_xs/kTs
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1

This feature probably isn't worth it, especially once #2296 is eventually resolved

openmc-dev / openmc

HDF5 dataset name collision in mgxs_library.py preventing more temperature data points than number of temperatures in min-max range #2473