spencerahill / aospy

Python package for automated analysis and management of gridded climate data
Apache License 2.0

YAML-based specification of aospy objects #333

Open spencerahill opened 4 years ago

spencerahill commented 4 years ago

In starting to play around with intake (motivated by #318), which uses YAML files to specify data catalogs, it occurred to me that we could reduce boilerplate, and thus make aospy more user-friendly, by letting users define aospy objects in YAML files rather than in Python modules.

For example, I believe it would be much less intimidating for a new user to be able to create a "my-aospy-lib.yaml" file comprising

my_aospy_proj:
  type: proj
  name: my-cool-proj
  description: My very first aospy project
  direc_out: /path/to/my/files

[...]
precip:
  type: var
  name: precip
  [...]

to create their aospy objects instead of a "my_aospy_lib.py" file comprising

from aospy import Proj, Var

my_aospy_proj = Proj(
    "my-cool-proj",
    description="My very first aospy project",
    direc_out="/path/to/my/files",
)
[...]

Then they don't have to worry about getting the Python syntax right, dealing with imports, and so on.

The devil may well be in the details, but I think implementing the core of this would actually be quite straightforward using just pyyaml: yaml.safe_load(open("my-aospy-lib.yaml")) gives a nested dictionary that we could then turn into aospy objects.
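A minimal sketch of that core step, under some loud assumptions: the `Proj` and `Var` classes below are stand-in dataclasses (not the real aospy classes) so that the snippet runs without aospy installed, and the `CONSTRUCTORS` table and `load_objects` helper are hypothetical names invented for illustration. The YAML is passed as a string rather than read from a file to keep the example self-contained.

```python
from dataclasses import dataclass

import yaml  # PyYAML


# Stand-ins for aospy.Proj and aospy.Var (assumption: the real classes
# accept these keyword arguments, as in the Python example above).
@dataclass
class Proj:
    name: str
    description: str = ""
    direc_out: str = ""


@dataclass
class Var:
    name: str


# Hypothetical mapping from the YAML "type" field to a constructor.
CONSTRUCTORS = {"proj": Proj, "var": Var}

YAML_TEXT = """
my_aospy_proj:
  type: proj
  name: my-cool-proj
  description: My very first aospy project
  direc_out: /path/to/my/files
precip:
  type: var
  name: precip
"""


def load_objects(text):
    """Parse YAML text and build one object per top-level entry."""
    specs = yaml.safe_load(text)  # nested dict of plain Python types
    objects = {}
    for key, spec in specs.items():
        kwargs = dict(spec)  # copy so the parsed spec is left untouched
        constructor = CONSTRUCTORS[kwargs.pop("type")]
        objects[key] = constructor(**kwargs)
    return objects


objects = load_objects(YAML_TEXT)
print(objects["my_aospy_proj"])
print(objects["precip"])
```

Dispatching on the `type` field keeps the YAML flat and readable, but it only covers arguments that are plain strings, numbers, lists, or mappings.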

I may take a stab at this, although I'd be keen to know if folks see any obvious pitfalls or drawbacks.

spencerahill commented 4 years ago

There's also the benefit that YAML is a standard format, so other applications can read it, unlike our aospy Python modules.

spencerahill commented 4 years ago

I suppose this applies equally to JSON, and potentially to any other popular standard format.

spencerkclark commented 4 years ago

I like YAML a lot for its compactness. One tricky thing to sort out in a scheme like this is that a fundamental part of aospy is that these objects often take other user-defined objects as arguments. E.g. Var objects take user-defined functions and other Var objects as arguments; Proj objects take Model objects; Model objects take Run objects; Run objects take DataLoader objects. Relying on Python imports/scoping to specify where these arguments live is a natural (and reasonably well-documented) way of giving users some flexibility in how they organize their object libraries. How might we imagine specifying such arguments in a YAML scheme?

spencerahill commented 4 years ago

> How might we imagine specifying such arguments in a YAML scheme?

Thanks @spencerkclark. I fully agree --- this occurred to me too, just after the fog of my initial excitement subsided!

I don't have an immediate solution. YAML supports "anchors" (see e.g. here), which are basically its way of defining variables and subsequently referencing them. This will probably help with referencing other objects, but it's still at the YAML level rather than the Python level.
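To make the anchor mechanism concrete, here is a small sketch using pyyaml. The layout (`runs`, `models`, the `type` fields, and the object names) is made up for illustration and is not an agreed-upon aospy schema. An anchor is declared with `&` and referenced with `*`.

```python
import yaml  # PyYAML

# Hypothetical layout: a run defined once and referenced from a model.
YAML_TEXT = """
runs:
  # "&control_run" defines an anchor on this mapping
  control: &control_run
    type: run
    name: control
models:
  my_model:
    type: model
    name: my-model
    runs:
      # "*control_run" is an alias pointing back at the anchor
      - *control_run
"""

data = yaml.safe_load(YAML_TEXT)
anchored = data["runs"]["control"]
aliased = data["models"]["my_model"]["runs"][0]

# pyyaml resolves the alias to the same dict object, not a copy.
print(aliased is anchored)
# Both are still plain dicts: turning them into aospy objects would be
# a separate, Python-level step.
print(type(aliased).__name__)
```

This shows both halves of the point above: the cross-reference is preserved on loading, but what comes out is a dict rather than a Run object, and anchors cannot point at Python functions.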

Here is a potential solution for embedding actual Python code. Note that this need comes up even outside of cross-object referencing: start and end dates are often defined as e.g. start_date=datetime.datetime(1980, 1, 1).
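Two pieces of this can be sketched with pyyaml alone, and this is one possible approach rather than necessarily the one linked above. First, YAML timestamps are parsed into `datetime` objects by `yaml.safe_load` without any extra work. Second, a custom tag can be registered on a `SafeLoader` subclass to resolve a dotted path into a Python object. The tag name `!py_ref`, the `RefLoader` class, and the `construct_py_ref` helper are hypothetical names invented for this example.

```python
import datetime
import importlib

import yaml  # PyYAML


class RefLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the custom tag stays out of yaml.safe_load."""


def construct_py_ref(loader, node):
    """Resolve a dotted path such as 'math.sqrt' to the object it names."""
    dotted_path = loader.construct_scalar(node)
    module_name, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# "!py_ref" is a hypothetical tag name chosen for this sketch.
RefLoader.add_constructor("!py_ref", construct_py_ref)

YAML_TEXT = """
start_date: 1980-01-01 00:00:00
end_date: 1980-12-31
func: !py_ref math.sqrt
"""

spec = yaml.load(YAML_TEXT, Loader=RefLoader)

print(repr(spec["start_date"]))  # a datetime.datetime
print(repr(spec["end_date"]))    # a datetime.date (no time component given)
print(spec["func"](9.0))         # the resolved function is callable
```

Note that importing whatever module a file names means loading the YAML can execute import-time code, so a scheme like this would only be appropriate for files the user trusts.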

I'm not sure whether either of these, or their combination, suffices, but they're probably a good starting place. I'll scan through the YAML specification for more.