rigoudyg / climaf

CliMAF - a Climate Model Analysis Framework - doc at : http://climaf.readthedocs.org/

Automatically apply .check method to avoid getting a dataset that doesn't cover the requested period #259

Open jservonnat opened 1 month ago

jservonnat commented 1 month ago

So far (if I'm correct), ds() returns a dataset even if the files listed by baseFiles() do not cover the requested period. This has caused errors in my own work, and I'm sure it has affected other users.

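Two options to deal with this behavior: the built-in method .check(), or the external function check_time_consistency_CMIP (both are applied to ds.explore('resolve') in the example below).

Both return True if the files cover the requested period, and False otherwise.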

Example:

req_dict = dict(project='CMIP6',
                variable='rsds',
                model='AWI-CM-1-1-MR',
                experiment='ssp245',
                table='day',
                realization='r1i1p1f1')

# -- Period covers available files
ok_ds = ds(period='2021-2040', **req_dict)

# -- Period start earlier than available files
not_ok_ds = ds(period='2001-2040', **req_dict)

print('Built in method: .check()')
print('OK ds ==> ',ok_ds, ok_ds.explore('resolve').check())
print('Not OK ds ==> ',not_ok_ds, not_ok_ds.explore('resolve').check())

print('External function: check_time_consistency_CMIP')
print('OK ds ==> ',ok_ds, check_time_consistency_CMIP(ok_ds.explore('resolve')))
print('Not OK ds ==> ',not_ok_ds, check_time_consistency_CMIP(not_ok_ds.explore('resolve')))

Output:

Built in method: .check()
OK ds ==>  ds('CMIP6%%rsds%2021-2040%global%/bdd%AWI-CM-1-1-MR%*%*%day%ssp245%r1i1p1f1%%g*%latest') True
Not OK ds ==>  ds('CMIP6%%rsds%2001-2040%global%/bdd%AWI-CM-1-1-MR%*%*%day%ssp245%r1i1p1f1%%g*%latest') False
External function: check_time_consistency_CMIP
OK ds ==>  ds('CMIP6%%rsds%2021-2040%global%/bdd%AWI-CM-1-1-MR%*%*%day%ssp245%r1i1p1f1%%g*%latest') True
Not OK ds ==>  ds('CMIP6%%rsds%2001-2040%global%/bdd%AWI-CM-1-1-MR%*%*%day%ssp245%r1i1p1f1%%g*%latest') False

Should we change the default behavior of ds so that it raises an error (or at least emits a warning) when the period is not covered by the files?

senesis commented 1 month ago

Changing a default behaviour is always risky for users, and would mean signalling the change by bumping the major version number, since it breaks backward compatibility.

Further, the function ds was intended to be a light-weight one, without systematic access to data files; it also underpins the methods 'explore', 'light_check' and 'glob', which make it possible to report on existing and missing data files.

A way forward could be to introduce a global boolean variable that triggers the requested behaviour, leaving each user free to set it in order to activate the check.
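
As a rough illustration of that opt-in idea, here is a minimal sketch. Nothing in it is existing CliMAF code: SETTINGS, checked_ds, PeriodNotCoveredError and the strict argument are hypothetical names, and FakeDataset is only a stand-in that lets the sketch run without CliMAF. The one thing taken from the thread is the call chain .explore('resolve').check().

```python
import warnings

# Hypothetical global switch (not an existing CliMAF setting).
# Off by default, so the current light-weight behaviour is unchanged.
SETTINGS = {'check_period_coverage': False}


class PeriodNotCoveredError(ValueError):
    """Hypothetical error for a period not covered by the data files."""


def checked_ds(ds_func, strict=False, **kwargs):
    """Build a dataset with ds_func and check its period only if opted in.

    ds_func stands for CliMAF's ds; it is passed in so that this sketch
    does not depend on CliMAF being installed.
    """
    dataset = ds_func(**kwargs)
    if not SETTINGS['check_period_coverage']:
        return dataset  # no access to data files on this path
    if not dataset.explore('resolve').check():
        message = 'files do not cover period %r' % kwargs.get('period')
        if strict:
            raise PeriodNotCoveredError(message)
        warnings.warn(message)
    return dataset


class FakeDataset:
    """Stand-in for a CliMAF dataset whose files span 2015 to 2100."""

    def __init__(self, period, available=(2015, 2100)):
        self.period = period
        self.available = available

    def explore(self, option):
        # The real method resolves wildcards; the stand-in returns itself.
        return self

    def check(self):
        start, end = (int(year) for year in self.period.split('-'))
        return self.available[0] <= start and end <= self.available[1]


# Default: no check, a dataset is returned whatever the period.
not_ok_ds = checked_ds(FakeDataset, period='2001-2040')

# Opt in: the same request now triggers a warning (or an error if strict).
SETTINGS['check_period_coverage'] = True
ok_ds = checked_ds(FakeDataset, period='2021-2040')
```

With the switch left off, existing scripts would behave exactly as before, which avoids the backward-compatibility break; users who want the safety net pay the cost of the file access only after turning it on.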