unifhy-org / unifhy

A Unified Framework for Hydrology
https://unifhy-org.github.io/unifhy
BSD 3-Clause "New" or "Revised" License

filename not saved in YAML configuration file #80

Closed ThibHlln closed 2 years ago

ThibHlln commented 2 years ago

There seems to be a bug in unifhy when the files contained in a unifhy.DataSet are small enough to fit in memory. This appears to be linked to the documented behaviour of cf.Field.get_filenames:

The file names in normalised, absolute form. If all of the data are in memory then an empty set is returned.

As a result, the filenames attribute of a given unifhy.Variable is an empty set, and an empty sequence of filenames is ultimately saved in the YAML file, so a to_yaml > from_yaml workflow fails.
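To make the failure mode concrete, here is a minimal sketch. It deliberately does not import cf or unifhy: StubField and to_config are hypothetical stand-ins, and the only behaviour borrowed from cf is the documented one quoted above (an empty set when all of the data are in memory). The file name is made up for illustration.

```python
import os


class StubField:
    """Hypothetical stand-in for a field; this is NOT the cf-python class."""

    def __init__(self, filenames, in_memory):
        self._filenames = {os.path.abspath(f) for f in filenames}
        self._in_memory = in_memory

    def get_filenames(self):
        # Mirrors the documented behaviour quoted above: an empty set
        # is returned when all of the data are in memory.
        return set() if self._in_memory else set(self._filenames)


def to_config(field):
    # Hypothetical serialisation step: the mapping that would be
    # written to the YAML configuration file.
    return {"files": sorted(field.get_filenames())}


# Same (made-up) file, once small enough to be in memory, once not.
small = StubField(["rainfall.nc"], in_memory=True)
large = StubField(["rainfall.nc"], in_memory=False)

small_config = to_config(small)  # nothing left to reload from
large_config = to_config(large)  # absolute path is preserved
```

With the in-memory field, the configuration carries no file names at all, so anything reading it back has no data source to open.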

It would be good to check with cf-python whether there is other functionality that keeps track of filenames, or whether it makes sense for their package to offer it. If not, it will be up to unifhy to keep track of them.

ThibHlln commented 2 years ago

As per David's suggestion (https://github.com/NCAS-CMS/cf-python/issues/365), it makes more sense for unifhy to manually store the filenames right after the call to cf.read.

It turns out this was already the case: https://github.com/unifhy-org/unifhy/blob/c4e235cf923778d8ea78f9155f8a9d6b03bf1414/unifhy/data.py#L172-L178

However, Component manipulates the fields contained in the DataSet in such a way that cf drops the filenames along the way. In a couple of places later in the workflow, filenames are retrieved from the field directly (i.e. using the Field.get_filenames() method) rather than from the variable (i.e. using the Variable.filenames attribute): https://github.com/unifhy-org/unifhy/blob/c4e235cf923778d8ea78f9155f8a9d6b03bf1414/unifhy/component.py#L714 https://github.com/unifhy-org/unifhy/blob/c4e235cf923778d8ea78f9155f8a9d6b03bf1414/unifhy/component.py#L744

This needs to be fixed by using the Variable attribute instead of the Field method.
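For illustration only, the intended pattern could look roughly like the sketch below. None of this is the actual unifhy code: StubField, Variable, read_variable, process and to_config are hypothetical names, and the stub field simply assumes that a derived field no longer reports any file names. The point is that the names recorded at read time travel with the variable, whatever happens to the field.

```python
from dataclasses import dataclass


class StubField:
    """Hypothetical stand-in for a field; this is NOT the cf-python class."""

    def __init__(self, filenames):
        self._filenames = set(filenames)

    def get_filenames(self):
        return set(self._filenames)

    def derived(self):
        # Assumption for this sketch: a field produced by manipulating
        # this one no longer knows which files it came from.
        return StubField(())


@dataclass
class Variable:
    field: StubField
    filenames: tuple  # recorded once, right after reading


def read_variable(paths):
    # Store the file names immediately after the (stubbed) read.
    return Variable(field=StubField(paths), filenames=tuple(sorted(paths)))


def process(variable):
    # Component-like manipulation: the field changes, the record does not.
    return Variable(field=variable.field.derived(), filenames=variable.filenames)


def to_config(variable):
    # Use the Variable attribute, not the Field method.
    return {"files": list(variable.filenames)}


variable = process(read_variable(["b.nc", "a.nc"]))
config = to_config(variable)
```

Here variable.field.get_filenames() is empty after processing, while the configuration built from variable.filenames still lists both files.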