mit-crpg / opendeplete

A depletion framework for OpenMC
MIT License

Various improvements + documentation #7

Closed · paulromano closed this 7 years ago

paulromano commented 7 years ago

This pull request makes a number of improvements to OpenDeplete.

cjosey commented 7 years ago

I've started testing this and have found a few issues. The first one is due to OpenMC. In https://github.com/mit-crpg/openmc/commit/e9479aeb0db295a0be6c222d412d64e8f6101ec0, a check for a set density is performed, but if only `mat.set_density(units='sum')` is called, `mat._density` is still `None`. After modifying `openmc/material.py::765` to `if self._density is not None or self._density_units == 'sum':`, I was successful in getting the code to run again.
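To illustrate the failure mode, here is a minimal dependency-free sketch of the corrected guard. The class and method names are simplified stand-ins, not the actual OpenMC source; the point is that the `units='sum'` case must be accepted even when no explicit density value was stored (note `==` rather than `is` for the string comparison):

```python
# Hypothetical minimal sketch; names simplified, not actual OpenMC code.
class Material:
    def __init__(self):
        self._density = None
        self._density_units = None

    def set_density(self, units, density=None):
        # With units='sum', the density is computed from the nuclide
        # densities, so no explicit value is required.
        self._density_units = units
        self._density = density

    def check_density(self):
        # Original check rejected a None density outright; the fix also
        # accepts the 'sum' case (the condition below is the De Morgan
        # inverse of the guard quoted above).
        if self._density is None and self._density_units != 'sum':
            raise ValueError("No density specified")


m = Material()
m.set_density(units='sum')
m.check_density()  # passes: 'sum' needs no explicit density value
```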

The second issue is that loading of results no longer works: pickle complains that it cannot find the module `results` due to the new layout. The files are still openable with the older version of the code, so I was able to do comparisons of results.

Gd157 concentrations are bitwise identical up through timestep 4 in the example script. Timestep 5 diverges by a few bits; after that, statistical cascading yields an entirely different result. I would chalk that up to a slightly different order of evaluating the equations yielding a different answer, so I'd call that a pass.
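The few-bit divergence is the expected signature of reordered floating-point arithmetic: addition is not associative in IEEE 754, so evaluating the same equations in a different order can flip the last bits, which then cascade through a stochastic simulation. A tiny illustration:

```python
# Floating-point addition is not associative: the same three terms
# summed in a different order give different results.
a, b, c = 1e16, -1e16, 1.0

sum_lr = (a + b) + c   # (0.0) + 1.0  -> 1.0
sum_rl = a + (b + c)   # b + c rounds back to -1e16, so the 1.0 is lost
print(sum_lr, sum_rl)  # 1.0 0.0
```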

PyLint has a lot to complain about. The overall opendeplete module, as well as many of the `@property` objects, are missing docstrings. I'm not sure whether we need `@property` docstrings; opinions? It also complains about the use of variables such as `y` and `dt`, which is a bit excessive; I should write a settings file to ignore them. Finally, it makes a nifty dependency graph:

    numpy (opendeplete.utilities,opendeplete.depletion_chain,opendeplete.openmc_wrapper,opendeplete.concentrations,opendeplete.reaction_rates,opendeplete.integrator)
    opendeplete 
      \-concentrations (opendeplete.results)
      \-depletion_chain (opendeplete.openmc_wrapper,opendeplete.function)
      \-nuclide (opendeplete.depletion_chain)
      \-openmc_wrapper (opendeplete.function)
      \-reaction_rates (opendeplete.openmc_wrapper,opendeplete.results)
      \-results (opendeplete.utilities,opendeplete.integrator)
    openmc (opendeplete.openmc_wrapper)
      \-stats (opendeplete.openmc_wrapper)
    scipy (opendeplete.utilities)
      \-sparse (opendeplete.depletion_chain,opendeplete.integrator)
      | \-linalg (opendeplete.integrator)
      \-stats (opendeplete.utilities)
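The settings file mentioned above could be as small as a `good-names` entry in a pylintrc; a sketch (the specific names listed are illustrative, not a definitive set):

```ini
# .pylintrc (fragment): whitelist short math-style variable names
[BASIC]
good-names=y,dt,i,j,k,n,x0
```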

Overall, the PR looks good. I'm going to go line-by-line and see if there's anything that sticks out, but I haven't yet seen anything in my cursory scans.

paulromano commented 7 years ago

Re: the pickle issue, unfortunately there is no way around that. Pickle is sensitive to how objects are imported, and if you change that, you may break the ability to read data that was written previously, as has happened here. You/we may want to consider an alternative serialization solution that is a little more robust. Happy to help brainstorm if you're interested.
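One known stopgap for reading old pickles after a module move is a custom `Unpickler` that remaps module paths in `find_class` at load time. A self-contained sketch (the module names `results` and `opendeplete.results` are stand-ins built on the fly for the demo):

```python
import io
import pickle
import sys
import types

# Fake "old" layout: a top-level module named 'results'.
old_mod = types.ModuleType('results')

class Results:
    def __init__(self, value):
        self.value = value

Results.__module__ = 'results'   # pickle records this module path
old_mod.Results = Results
sys.modules['results'] = old_mod

data = pickle.dumps(Results(42))

# Simulate the refactor: the old module path no longer exists.
del sys.modules['results']

class RenamingUnpickler(pickle.Unpickler):
    """Redirect classes pickled under old module paths to new ones."""
    renames = {'results': 'opendeplete.results'}

    def find_class(self, module, name):
        module = self.renames.get(module, module)
        return super().find_class(module, name)

# Fake "new" layout providing opendeplete.results.Results.
sys.modules['opendeplete'] = types.ModuleType('opendeplete')
new_mod = types.ModuleType('opendeplete.results')
new_mod.Results = Results
sys.modules['opendeplete.results'] = new_mod

obj = RenamingUnpickler(io.BytesIO(data)).load()
print(obj.value)  # 42
```

This only papers over a rename, though; it does nothing for the deeper fragility paulromano describes, which is why a format-based solution is discussed below.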

Regarding property docstrings, my own take is that as long as a description is listed in the class docstring, you are good. For OpenMC documentation (and the stubs I've added here), if you use numpydoc-style docstrings, they'll show up in the autogenerated HTML.

cjosey commented 7 years ago

I did some research on a few possibilities:

  1. Use dill instead of pickle. Pros: almost no code change; just grep for pickle and replace with dill. Cons: same general limitations as pickle.
  2. Use HDF5. Pros: much more robust, and supported everywhere else in the CRPG toolchain. Cons: dealing with dictionaries and the like requires JSON serialization. In addition, output is limited to gzip or bzip2 compression unless alternate filters are installed.
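For the dictionary caveat in option 2, the JSON detour amounts to flattening the dict to a string that can then be stored as an HDF5 attribute or variable-length string dataset. The round-trip itself needs only the standard library (the h5py write call is left as a comment so the sketch stays dependency-free):

```python
import json

# A mapping like opendeplete's name -> index tables.
index_map = {'Gd157': 0, 'U235': 1, 'Xe135': 2}

# Serialize: this string is what would be written to HDF5, e.g.
#   group.attrs['index_map'] = payload   # (h5py, not run here)
payload = json.dumps(index_map, sort_keys=True)

# Deserialize on read-back. Note JSON only preserves str keys and
# basic scalar/list/dict values, hence "dealing with dictionaries
# and the like requires JSON serialization".
restored = json.loads(payload)
print(restored == index_map)  # True
```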

I've implemented and tested the first in my improvements branch. If you think it's suitable, feel free to merge it into yours and I'll merge this PR. If not, we can keep working on a better scheme.
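Option 1 really is a one-line change, since dill deliberately mirrors pickle's `dump`/`load`/`dumps`/`loads` API. Sketched here with stdlib pickle so it runs anywhere; with dill installed, the import would read `import dill as pickle`:

```python
# With dill: "import dill as pickle" -- the rest is unchanged.
import pickle

state = {'step': 4, 'time': 1.2e5, 'nuclides': ['Gd157', 'U235']}

blob = pickle.dumps(state)       # serialize to bytes
restored = pickle.loads(blob)    # deserialize
print(restored == state)         # True
```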

paulromano commented 7 years ago

I thought it might be good to discuss other options for posterity:

  1. JSON. Pros: human-readable, simple. Cons: doesn't handle numeric data well, and can't handle arbitrary objects without defining new encoders/decoders. I gather you have implicitly rejected this option (which I agree with).
  2. hickle, an HDF5-based pickle replacement.
  3. cloudpickle, which sounds similar to dill.

I think on balance I prefer HDF5 over the other options. I like pickle/dill for its simplicity, but it is Python-specific, and it doesn't necessarily work across different versions of Python. I view HDF5 as a better interchange format that handles binary data well. If you'd like, I can take a shot at `to_hdf5()` and `from_hdf5()` methods that would allow us to load/store instances of the `Results` class as part of this PR.

cjosey commented 7 years ago

I do agree that HDF5 would be preferable. I was thinking in the narrow box of "if someone is using this program, they'd probably prefer to use Python to work with it".

Do you think there is a suitable alternative to dictionaries that can be converted to HDF5 without some complicated read-in routine on the non-Python side?

paulromano commented 7 years ago

Well, HDF5 does support compound datatypes (essentially structs), but any arrays within them would have to be statically sized. In my opinion, they are more complicated than just having read/write routines handle multiple datasets.
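For context, an HDF5 compound datatype corresponds to a NumPy structured dtype, which h5py accepts directly as dataset data. A sketch (the field names are illustrative), showing the static-sizing limitation: every field, including the string, must have a fixed width:

```python
import numpy as np

# An HDF5 compound datatype maps onto a NumPy structured dtype.
# Every field is statically sized -- here a 6-byte string and a
# float64 -- which is the limitation mentioned above.
dtype = np.dtype([('nuclide', 'S6'), ('rate', 'f8')])

table = np.array([(b'Gd157', 1.5e-4), (b'U235', 3.2e-3)], dtype=dtype)

# h5py would accept this array directly, e.g.
#   group.create_dataset('rates', data=table)   # (not run here)
print(table['nuclide'], table['rate'])
```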

Other than compound datatypes, the only other way I could think of handling a dictionary is to do something like what hickle does. If you're thinking about writing a dictionary to HDF5 in the context of a class being essentially a dictionary (whose attributes are (key, value) pairs), one possibility is to create a mixin that adds to_hdf5() and from_hdf5() methods that just iterate over the attributes of the class and write each one to a separate group/dataset. Then, any class that inherits the mixin could call these methods recursively. This is a somewhat complex solution, though, and in the case of opendeplete I don't personally think the classes are complex enough to warrant it; i.e., we're probably better off just writing the read/write methods ourselves.
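The mixin idea can be sketched with a plain dict standing in for an h5py group, so the example stays dependency-free; the class names are hypothetical simplifications of opendeplete's classes:

```python
# Sketch of the attribute-iterating mixin. A real version would
# replace the dict with h5py calls: store.create_group(key) for
# nested objects and store[key] = value for leaf datasets.
class HDF5Mixin:
    def to_store(self, store):
        for key, value in vars(self).items():
            if isinstance(value, HDF5Mixin):
                sub = {}              # real code: store.create_group(key)
                value.to_store(sub)   # recurse into the nested object
                store[key] = sub
            else:
                store[key] = value    # real code: write a dataset


class ReactionRates(HDF5Mixin):      # hypothetical leaf class
    def __init__(self):
        self.n_nuc = 2


class Results(HDF5Mixin):            # hypothetical container class
    def __init__(self):
        self.k = 1.17
        self.rates = ReactionRates()


store = {}
Results().to_store(store)
print(store)
```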

cjosey commented 7 years ago

I thought about this more. The dictionaries have rather simple functionality: each maps a string to an index in an array. We guarantee that all indices are unique and sequential, so the dictionary can be stored as just an array of strings, with each string's position in the array corresponding to its index.
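The scheme above in a few lines: since the indices are unique and sequential, the {name: index} dict collapses to a list of names ordered by index, which HDF5 can store as a fixed-length string array with no read-in logic beyond enumeration:

```python
# Write side: position in the array encodes the index.
index_map = {'Gd157': 0, 'U235': 1, 'Xe135': 2}

names = [None] * len(index_map)
for name, i in index_map.items():
    names[i] = name
# 'names' is what would be stored as an HDF5 array of strings.

# Read side: rebuild the dict from positions alone.
rebuilt = {name: i for i, name in enumerate(names)}
print(names, rebuilt == index_map)
```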

I just derived a really cool capability (substep interpolation) which can be extended to the "classic" integrators in the current repo. However, it requires completely changing how results are stored. I'll implement it on top of this PR, add HDF5 support, then PR it and accept both at the same time.