spencerahill / aospy

Python package for automated analysis and management of gridded climate data
Apache License 2.0

Towards encoding all possible metadata into output of aospy calculations #203

Open spencerahill opened 7 years ago

spencerahill commented 7 years ago

(Jotting down some quick notes on this idea for now, not necessarily well thought out)

inspect is an amazing piece of the standard library that enables all sorts of cool introspection into objects. I think we could use it to encode even more metadata into the results of aospy computations (for now, namely the results of aospy.Calc), akin to how the NCO operators append a history attribute to their output showing the command that generated that file.

To give a quick sense of what's possible:

In [72]: import inspect

In [73]: import aospy_user as au

In [74]: var = au.variables.mse

In [76]: inspect.getdoc(var.func)
Out[76]: 'Moist static energy, in Joules per kilogram.'

In [77]: inspect.getfile(var.func)
Out[77]: '/Users/shill/Dropbox/py/aospy_user/aospy_user/calcs/thermo.py'

In [78]: inspect.getmodule(var.func)
Out[78]: <module 'aospy_user.calcs.thermo' from '/Users/shill/Dropbox/py/aospy_user/aospy_user/calcs/thermo.py'>

In [79]: inspect.getsource(var.func)
Out[79]: 'def mse(temp, hght, sphum):\n    """Moist static energy, in Joules per kilogram."""\n    return dse(temp, hght) + L_v.value*sphum\n'

In [80]: import pprint

In [81]: pprint.pprint(inspect.getsource(var.func))
('def mse(temp, hght, sphum):\n'
 '    """Moist static energy, in Joules per kilogram."""\n'
 '    return dse(temp, hght) + L_v.value*sphum\n')

The end goal is to encode as much metadata as possible about a calculation -- what the input data was, what the aospy processing pipeline was -- so that, e.g., given a netCDF file of aospy output from a collaborator, you have all the information there is to have about how it was created. This would enable you to, e.g., check for bugs in the function they wrote.
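As a rough sketch of what that could look like, the inspect calls shown above can be gathered into a flat dict of strings, which is the shape netCDF attributes (and an xarray object's attrs dict) can hold. The helper name describe_function, the attribute keys, and the stand-in dse function with its approximate constants are all assumptions made for illustration; none of this is existing aospy API.

```python
import inspect


def describe_function(func):
    """Collect string-valued metadata about ``func`` using ``inspect``.

    Hypothetical helper: the key names are illustrative, not an aospy convention.
    """
    module = inspect.getmodule(func)
    try:
        source = inspect.getsource(func)
        source_file = inspect.getfile(func)
    except (OSError, TypeError):
        # Source is unavailable, e.g. for built-ins or code run without a file.
        source = ""
        source_file = ""
    return {
        "function_name": func.__name__,
        "function_module": module.__name__ if module is not None else "",
        "function_doc": inspect.getdoc(func) or "",
        "function_file": source_file,
        "function_source": source,
    }


def dse(temp, hght):
    """Dry static energy, in Joules per kilogram."""
    # Stand-in function with approximate constants, for illustration only.
    return 1003.5 * temp + 9.81 * hght


attrs = describe_function(dse)
```

Every value is a plain string, so a dict like attrs could be merged into the attributes of the output before it is written to disk.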

But the mse example above highlights that there would be kinks to work through. Notice the call to dse within mse; what's really needed is a full trace of every function called within the top-level one, along with the source of each. That might have to wait until a proper treatment of #3. And then how to embed all that into netCDF attributes may also be tricky. Perhaps we could make use of Dataset coordinates?
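One possible intermediate step for the nested-function problem, sketched here under stated assumptions: walk the global names a function's bytecode references (func.__code__.co_names), keep those that resolve to plain Python functions in func.__globals__, and recurse. The helper name called_functions and the stand-in mse/dse definitions are hypothetical. The approach is only a heuristic: co_names also lists attribute names, so it can pick up functions that are referenced but never called, and it misses methods and functions reached through attribute access (e.g. module.func).

```python
import inspect


def called_functions(func, _seen=None):
    """Map qualified names to ``func`` and the functions it references, recursively.

    Heuristic sketch: only finds plain functions reachable by global name.
    """
    seen = {} if _seen is None else _seen
    key = f"{func.__module__}.{func.__qualname__}"
    if key in seen:
        # Already visited; this also guards against recursion cycles.
        return seen
    seen[key] = func
    for name in func.__code__.co_names:
        candidate = func.__globals__.get(name)
        if inspect.isfunction(candidate):
            called_functions(candidate, seen)
    return seen


# Stand-in definitions mirroring the example above; constants are approximate.
L_V = 2.5e6


def dse(temp, hght):
    """Dry static energy, in Joules per kilogram."""
    return 1003.5 * temp + 9.81 * hght


def mse(temp, hght, sphum):
    """Moist static energy, in Joules per kilogram."""
    return dse(temp, hght) + L_V * sphum


found = called_functions(mse)
names = sorted(f.__name__ for f in found.values())
```

Each function in found could then be passed to inspect.getsource, so that the source of dse is recorded alongside that of mse.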

There are probably intermediate steps we could implement.

chuaxr commented 7 years ago

This is related to the subject of this thread, but not to the inspect library: it would be nice if the description attribute from the definition of Var objects were available in the output file, and/or if functions were able to add descriptive attributes.

While admittedly not as systematic as what's described above, this would at least allow users to flag some important assumptions and realize how they may/may not apply to the output file they generated some time ago.
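A minimal sketch of that suggestion, assuming nothing about aospy's internals: VarLike is a hypothetical stand-in for a Var carrying a description, the described decorator and the output_attrs attribute it sets are invented here as one way a function might declare its own descriptive attributes, and build_attrs merges the two.

```python
from dataclasses import dataclass


@dataclass
class VarLike:
    """Hypothetical stand-in for an aospy Var; only the fields used here."""
    name: str
    description: str = ""


def described(**extra_attrs):
    """Hypothetical decorator letting a function declare descriptive attributes."""
    def wrap(func):
        func.output_attrs = dict(extra_attrs)
        return func
    return wrap


def build_attrs(var, func):
    """Merge the variable's description with any attributes the function declares."""
    attrs = {"description": var.description}
    attrs.update(getattr(func, "output_attrs", {}))
    return attrs


@described(assumptions="Latent heat of vaporization taken as constant.")
def mse(temp, hght, sphum):
    """Moist static energy, in Joules per kilogram."""
    return 1003.5 * temp + 9.81 * hght + 2.5e6 * sphum


var = VarLike(name="mse", description="Moist static energy.")
attrs = build_attrs(var, mse)
```

The resulting dict could be written as attributes of the output file, so that an assumption flagged at definition time travels with the data.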

spencerahill commented 7 years ago

@chuaxr I totally agree. And that will be easy to do -- easier than the above stuff that requires inspect.