Remove redundant temperature-independent data from HDF5 Datasets

nelsonag commented 8 years ago

Some continuous-energy nuclear data is independent of material temperature; this includes the nuclide metadata, energy-angle information, and, for some evaluations, fast-energy cross sections for some reaction channels. To significantly reduce the storage size and better set us up for temperature dependence, it would make sense if the data files for multiple temperatures were 1) combined into a single file, and 2) stripped of redundant information.

That is, the current data format of:

/U235.71c
/U235.72c
/U235.73c
...

would be rewritten as:

/U235.endf70
    /300K
    /600K
    /900K
    ...
    /WMP
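
As a rough illustration of the regrouping proposed above, here is a minimal Python sketch that turns a flat list of per-temperature table names into one entry per nuclide with temperature subgroups. The function name `regroup`, the `(name, temperature)` input format, and the temperatures themselves are placeholders chosen for this example, not values from a real library; the `/WMP` group is not covered.

```python
from collections import defaultdict


def regroup(tables, library):
    """Group flat per-temperature table names under one entry per nuclide.

    tables:  iterable of (table_name, temperature_in_K) pairs
    library: label appended to the nuclide name, e.g. "endf70"
    """
    grouped = defaultdict(list)
    for name, temperature in tables:
        nuclide = name.split(".")[0]  # "U235.71c" -> "U235"
        grouped[f"/{nuclide}.{library}"].append(f"/{round(temperature)}K")
    # Sort the temperature groups numerically within each nuclide
    return {k: sorted(v, key=lambda g: int(g[1:-1])) for k, v in grouped.items()}


# Placeholder temperatures for illustration only (hypothetical values)
flat = [("U235.72c", 600.0), ("U235.71c", 300.0), ("U235.73c", 900.0)]
layout = regroup(flat, "endf70")
for parent, children in layout.items():
    print(parent)
    for child in children:
        print("    " + child)
```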

nelsonag commented 8 years ago

I have an interesting finding to share. I made a script (the one which yielded PR #702) to compare libraries of the same isotope/metastable state across all temperatures. So far I have run this only on the MCNP6 endf70 data.

I found that for the most part it behaves exactly as you would expect: the cross sections and the x/s energy grid are T-dependent, and that's mostly it. This holds for all reaction channel x/s, even though BROADR typically doesn't broaden in the fast range, since the ACER thinning algorithm has a chance to change the energy grid. Having said that, we could probably force the grid to be the same in the fast range and it'd be OK.

There were two surprises though:

- Some nuclides (B-10, N-14, N-15, O-16, W-182) have temperature-dependent yields. It looks like this is due to the energy grid changing between temperatures. It shouldn't be physical as far as I can tell, though.
- Some nuclides (N-14, O-16, W-182) have UncorrelatedAngleEnergy data associated with (n,not n) reactions (i.e., n,a, n,d, n,p) and a few (n,2n) and (n,3n) cases wherein the energy grid associated with the angular distributions for some reason goes up to 21 MeV at some temperatures and 20 MeV at others. The angular distributions associated with them (and at the lower energy point in the grid) are the same, though. This means they can be put on the same grid with zero impact, but only with care to ensure there is nothing else going on.

nelsonag commented 8 years ago

I wanted to give a little update on this:

I have revised the pyAPI and openmc-ace-to-hdf5.py to remove the duplicate temperature independent data. Looks pretty good. Conversion of ENDF/B-VII.0 (from MCNP), JEFF-3.2, and NNDC ENDF/B-VII.1 data all seem to work great.
- ENDF/B-VII.0 data from MCNP converted to HDF5 per the develop branch as of yesterday requires something like 2.9GB. The conversion per the code I have working now (see nelsonag/openmc/th5) requires around 1.5GB. This dataset only has 4 temperatures, and I should think the savings would grow as more temperatures are present (I forgot to do a du command on the jeff-3.2 data which has loads of temperatures).
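
To make the expected trend concrete, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simple model that is not from the actual conversion code: every per-temperature file has the same size, and a fixed fraction `f_indep` of each file is temperature independent and is stored only once after combining. Under that assumption the combined library is `(1 - f_indep) + f_indep / n_temps` times the size of the flat one, and `f_indep` can be backed out of the two sizes quoted above.

```python
def combined_fraction(f_indep, n_temps):
    """Combined library size relative to the flat (one file per temperature) size.

    Simple model (an assumption): all files are the same size and a fraction
    f_indep of each is temperature independent, stored once when combined.
    """
    return (1.0 - f_indep) + f_indep / n_temps


# Sizes quoted in this thread: about 2.9 GB flat, about 1.5 GB combined, 4 temperatures
observed = 1.5 / 2.9
n_temps = 4

# Invert the model to estimate the temperature-independent fraction
f_indep = (1.0 - observed) / (1.0 - 1.0 / n_temps)
print(f"estimated temperature-independent fraction: {f_indep:.2f}")

# Under the same model, the relative size keeps shrinking with more temperatures
for n in (4, 10, 20):
    print(n, round(combined_fraction(f_indep, n), 3))
```

In this model the relative size approaches `1 - f_indep` as the number of temperatures grows, which is the sense in which the savings improve for libraries with many temperatures.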
I also modified the Fortran code to be able to read in the data.
- The examples/basic data runs just fine, produces the exact same results.
- The code changes basically included replacing default_xs with default_temperature and the xs attribute of a nuclide/element/macroscopic/sab with a temperature attribute on a material (I did the same on the pyapi side too)
I ran the examples/basic problem and it works 'swimmingly' - i.e., the answers don't change. There are a few issues I still have to work out before getting the PR together though. The only one worth discussing here is:
- Only 4 of the 16 or 18 or so S(a,b) tables in the NNDC dataset have a temperature that matches the rest of the library (293.6K). That means that with a default temperature or a material-specific temperature assigned, the code will fail if you use a material with one of the other tables (as some of our tests do).
- I need to find a solution to this. Part of me says the onus is on the users to make sure they have the right temperatures in their model, but that still won't help us pass our tests. Anyway, the options as I see them follow; let me know your thoughts, @paulromano, @smharper:
  1. Somehow fake a temperature set of 293.6K in the NNDC data that is lacking it by having the closest temperature be used as a stand-in. This would be in the hdf5 library only, no code changes to OpenMC itself.
  2. Make OpenMC use the closest temperature available in the library by default. I would have to print out a warning or general message about this though.
  3. Allow temperatures to be set per nuclide/macro/element/sab similar to how develop does this now with xs. I dislike the idea as it makes the least sense (they should all be the same!), but it will solve this problem.
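
For reference, option 2 could look roughly like the following Python sketch. It is only an illustration of the idea (pick the nearest available temperature and warn when it is not an exact match); the function name `closest_temperature`, the `tolerance` parameter, and the example temperatures are hypothetical and are not part of OpenMC.

```python
import warnings


def closest_temperature(available, requested, tolerance=0.1):
    """Return the library temperature (K) nearest to the requested one.

    Emits a warning when the nearest value differs from the request by
    more than `tolerance` kelvin, so the substitution is never silent.
    """
    if not available:
        raise ValueError("no temperatures available in the library")
    nearest = min(available, key=lambda t: abs(t - requested))
    if abs(nearest - requested) > tolerance:
        warnings.warn(
            f"Requested {requested} K is not in the library; "
            f"using the closest available temperature, {nearest} K."
        )
    return nearest


# Hypothetical S(a,b) table that lacks a 293.6 K evaluation
sab_temperatures = [296.0, 400.0, 500.0]
chosen = closest_temperature(sab_temperatures, 293.6)
print(chosen)
```

Option 1 would apply the same nearest-match rule once, at library generation time, instead of at run time.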

Ok, so otherwise, expect this issue to be resolved with the following set of PRs:

1. PR providing the CE HDF5 library reorganization
   - At this point this would only save disk space, not memory, but after this point the users would not have to re-generate libraries
2. PR providing the MG HDF5 library reorganization
   - This would do the same as above, but for the MG library.
   - I'd prefer to keep 2 and 1 separate just to minimize the size of each PR, but I may find that the changes needed to make the MG mode work with the new temperature attribute/element instead of xs aren't worth it.
3. PR reorganizing the CE portion of nuclide_header.F90 and associated code so the data representation within OpenMC also removes redundant temperature independent data.
   - Unlike 1, this is where the big benefit is for most of us: it saves memory.
   - I want this separate just to keep the size of PR#1 manageable.

A PR analogous to #3 but for MG mode shouldn't be necessary since all the data is temperature-dependent, and therefore there isn't much of a driver to separate it out. If we find a good enough reason, or just to make MG look like CE (even though the MG data isn't in the nuclide class like it is for CE...), expect a future PR.

nelsonag commented 8 years ago

Oh, I should have added what the data format looks like for early digestion:

Within the file U235.h5:

\U235
  \energy   # Every temperature has an independent energy grid due to NJOY thinning
    \294K
    \600K
  \kTs     
    \294K
    \600K
  \reactions
    \reaction_002
      \294K
        \threshold_idx
        \xs
      \600K
        \threshold_idx
        \xs
      \Q_value
      \center_of_mass
      \products
      \...

A few notes:

- The energy grids can be made common across all temperatures, at the expense of storing more cross sections than necessary (NJOY's thinning algorithm removed energies from some temperatures because they aren't needed there). That would make temperature interpolation simpler, should it be necessary in the future.
- The kTs group is only needed because the temperature group names ('294K') are rounded to the nearest integer, which is not precise enough to recover a correct kT. So I store the temperature-dependent kT value here in units of MeV.
- Notice how the reaction structure is kept as is, except that within it there are different xs (and threshold_idx, since the energy grids are different) for each temperature. The products (and thus energy/angle distributions) and other data are temperature independent, though.
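
The point about kTs can be illustrated with a small Python sketch. It uses a plain dict as a stand-in for the HDF5 group, and the names `group_name` and `kT_mev` as well as the Boltzmann constant value (CODATA 2018, converted to MeV/K) are assumptions made for this example rather than what the conversion script actually uses.

```python
K_BOLTZMANN = 8.617333262e-11  # Boltzmann constant in MeV/K (assumed CODATA 2018 value)


def group_name(temperature):
    """Name of the temperature group, rounded to the nearest integer kelvin."""
    return f"{round(temperature)}K"


def kT_mev(temperature):
    """kT in MeV for a temperature given in kelvin."""
    return K_BOLTZMANN * temperature


# Temperatures need not arrive in increasing order; keying by group name
# means no separate index or search is needed to find the matching kT.
temperatures = [600.0, 293.6]
kTs = {group_name(T): kT_mev(T) for T in temperatures}

exact = kTs["294K"]          # kT stored from the true temperature, 293.6 K
from_name = kT_mev(294.0)    # kT reconstructed from the rounded group name
rel_err = abs(from_name - exact) / exact
print(f"relative error from using the group name: {rel_err:.2e}")
```

Reconstructing kT from the name alone is off by the same relative amount as the rounding of the temperature, which is why the exact value is stored alongside the data.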

paulromano commented 8 years ago

@nelsonag Sorry I've been mute so far. This all sounds great. Regarding the S(a,b) table issue, I think your option (b) sounds pretty sensible (having it pick the closest temperature and putting out a warning).

Two suggestions:

- For storing the kTs, you could just have a single dataset that is an array of kTs.
- Store threshold_idx as an attribute on the xs dataset.

nelsonag commented 8 years ago

I agree with option (b).

This kT storage thing is a PITA. I started with a dataset that is an array of kTs, but then ran into the problem that I technically have no guarantee that the temperature sets will be added to the h5 library in order of increasing temperature. With that constraint I need something else that ties the index of this kTs array to the temperature itself, or I have to do a search. At this point I figure I might as well keep the kTs data the same as all the other data in the library and have it be accessed by the '300K' string.
threshold_idx may already be an attribute actually... good point on threshold_idx.

paulromano commented 8 years ago

Re: kTs -- On second thought, it's easy to add a new group/dataset whereas I'm not sure what happens if you try to modify an existing dataset, so I think you're right on this one.

nelsonag commented 8 years ago

Closed by #712. Yay!

openmc-dev / openmc

Remove redundant temperature-independent data from HDF5 Datasets #697