jejjohnson / oceanbench

OceanBench - SSH edition
https://jejjohnson.github.io/oceanbench/
MIT License

Compute metrics for new interpolation model #70

Open nilsleh opened 4 months ago

nilsleh commented 4 months ago

I am attempting to evaluate my own model on the challenge dataset. I am using this notebook as inspiration and can generate a predictions.nc file with my model; however, I am unable to get the actual evaluation scores because of missing configs, and I am quite confused about where they are coming from:

cfg.results.outputs.eval.methods

This seems to hold the available methods of evaluation, but I don't know how it should be configured across all those nested config files so that I can evaluate my method. Currently the only available keys there are ["4dvarnet", "musti"].

So I guess my question is what do I need to change here to compute metrics for my method? Cheers.

quentinf00 commented 2 months ago

Hi @nilsleh,

Thanks for the interest and sorry for the delayed response. Indeed, managing the nested config is not trivial. What you can do is add a config entry for your method (say, newmethod) to the results with:

from omegaconf import OmegaConf

OmegaConf.set_struct(cfg.results.outputs.eval.methods, False)  # allow adding new fields
cfg.results.outputs.eval.methods.newmethod = dict(_target_='newmethod_loader', _partial_=True)  # the new config points to a newmethod_loader function

The newmethod_loader needs to be a function that returns a DataArray with lat, lon, time dimensions (with units), on a daily, 1/20° grid with the correct dates, coordinates, etc. You can reuse the steps and functions specified in the 4dvarnet method config (which you can see with print(OmegaConf.to_yaml(cfg.results.outputs.eval.methods['4dvarnet']))):

The steps are (more or less):

xarray.open_dataset
ocn_tools._src.geoprocessing.validation.decode_cf_time
ocn_tools._src.geoprocessing.validation.validate_latlon
ocn_tools._src.geoprocessing.validation.validate_time
ocn_tools._src.geoprocessing.validation.validate_ssh
xarray.DataArray.sel
xarray.DataArray.resample
xarray.core.resample.DataArrayResample.mean
ocn_tools._src.geoprocessing.gridding.grid_to_regular_grid
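A loader mirroring these steps might look like the following sketch. The function name, its signature, and operating on an already-opened Dataset are assumptions for illustration; the real 4dvarnet loader chains the ocn_tools helpers listed above, and the final regridding step is only indicated in a comment:

```python
import xarray as xr

def newmethod_loader(ds, time_slice, lat_slice, lon_slice):
    """Hypothetical loader sketch: decode times, tag units, subset, daily-average.

    The real pipeline additionally regrids to the task's 1/20 degree grid via
    ocn_tools._src.geoprocessing.gridding.grid_to_regular_grid (omitted here).
    """
    ds = xr.decode_cf(ds)                        # decode_cf_time step
    da = ds["ssh"]
    da["lat"].attrs["units"] = "degrees_north"   # validate_latlon expects units
    da["lon"].attrs["units"] = "degrees_east"
    da = da.sel(time=time_slice, lat=lat_slice, lon=lon_slice)
    da = da.resample(time="1D").mean()           # daily averaging
    return da
```

The slices would come from the task config (the evaluation period and domain), not from the method itself.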

Then you run the actual metrics:

cfg.method='newmethod'
cfg.outputs.summary.method='newmethod'
lb = hydra.utils.call(cfg.outputs)
eval_ds = lb.build_eval_ds()
score = pd.Series({k: v(eval_ds) if callable(v) else v for k,v in lb.summary.items()})

NB: what the notebook does is reuse the 4dvarnet loader and simply switch the path the data is read from; you can do the same if you ensure your data has the same variable names as the 4dvarnet.nc file.

I hope this gives you some first pointers; I'll try to be more responsive on this thread from now on.

Best, Quentin

nilsleh commented 1 month ago

Hi @quentinf00,

Thank you for your reply. I am still running into several errors while juggling the different configuration scripts, since there are several places with hard-coded variables that do not match a new method. Furthermore, I am encountering other errors like ValueError: ESMC_FieldRegridStore failed with rc = 506. Please check the log files (named "*ESMF_LogFile")., whose meaning I cannot work out offhand.

However, I am at the point where I generate a predictions.nc file based on the patcher.reconstruct() method; in the notebook it is called rec_ds. Is there a way to compute the metrics based on just that predictions.nc file? These nested configs make it quite difficult to understand what is actually happening and where the computation of metrics takes place. I guess this additional package is called somewhere?

Intuitively, I would expect to be able to call some compute_metrics(targets, predictions), where targets is the xarray with targets before it went through the patcher that creates the patches for the model. I could try to write a function like that myself based on the metrics scripts linked above; however, I would not be sure whether I am forgetting something that happens in all that hydra config logic. Cheers.
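For reference, one score in the spirit of such a compute_metrics function could be sketched as follows. The formula (one minus the error RMS over the reference RMS, so higher is better) is a common reading of nRMSE-style leaderboard scores, not necessarily the exact ocn_tools implementation, and the function name is made up here:

```python
import numpy as np

def nrmse_score(targets, predictions):
    """Hypothetical nRMSE-style score: 1 - RMS(error) / RMS(targets).

    Returns 1.0 for a perfect reconstruction; lower values mean larger errors.
    Assumes both inputs are plain arrays on the same grid (no regridding here).
    """
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    rms_err = np.sqrt(np.nanmean((predictions - targets) ** 2))
    rms_ref = np.sqrt(np.nanmean(targets ** 2))
    return 1.0 - rms_err / rms_ref
```

A full compute_metrics(targets, predictions) would also need the PSD-based scores, which require a regular grid, so the regridding question remains relevant either way.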

nilsleh commented 1 month ago

My current attempt is the following:

  1. Generate predictions:
predictions = trainer.predict(module, test_loader, ckpt_path=ckpt_path)
m, s = datamodule.mean, datamodule.std
pred_ds = test_loader.dataset.reconstruct_from_batches(predictions, dims_labels=('time', 'lat', 'lon')).pipe(lambda da: (da * s) + m ).to_dataset(name='ssh')
pred_ds['obs'] = xr.DataArray(test_loader.dataset.patcher.da.variable.data[0], dims=('time', 'lat', 'lon'), coords={'time': pred_ds.time, 'lat': datamodule.test_dataset.patcher.da.lat, 'lon': datamodule.test_dataset.patcher.da.lon})

pred_ds = grid_to_regular_grid(pred_ds, test_loader.dataset.patcher.da)

And then compute the metrics with a config files I found in the repo:

psd_isotropic_score:
  _target_: "ocn_tools._src.metrics.power_spectrum.psd_isotropic_score"
  _partial_: True
  psd_dims: ["lon", "lat"]
  avg_dims: ["time"]
  detrend: "constant"
  window: "tukey"
  nfactor: 2
  window_correction: True
  true_amplitude: True
  truncate: True

psd_spacetime_score:
  _target_: "ocn_tools._src.metrics.power_spectrum.psd_spacetime_score"
  _partial_: True
  psd_dims: ["time", "lon"]
  avg_dims: ["lat"]
  detrend: "constant"
  window: "tukey"
  nfactor: 2
  window_correction: True
  true_amplitude: True
  truncate: True

nrmse:
  _target_: "ocn_tools._src.metrics.stats.nrmse_ds"
  _partial_: True
  target: "obs"
  reference: "ssh"
  dim: ["lat", "lon", "time"]

And evaluate:

metric_config = OmegaConf.load(metric_config)
psd_isotropic = instantiate(metric_config.psd_isotropic_score, ds=pred_ds, ref_variable="obs", study_variable="ssh")()
psd_spacetime_score = instantiate(metric_config.psd_spacetime_score, ds=pred_ds, ref_variable="obs", study_variable="ssh")()
nrmse = instantiate(metric_config.nrmse, ds=pred_ds)()

However, the regridding fails with the above error, and without regridding the Fourier-transform-based scores do not work.

quentinf00 commented 1 month ago

Hi @nilsleh ,

First, a few pointers:

If you have a predictions.nc file with an 'ssh' variable, you should be able to run the following cell and get the metrics:

with hydra.initialize('oceanbench/config', version_base='1.3'):
    cfg = hydra.compose('main', overrides=[
        'task=[osse_gf_nadir/task,osse_gf_nadir/results,osse_gf_nadir/leaderboard]',
            'task.data.base_dir=/<path-to>/oceanbench-data-registry/osse_natl60/grid', # We need to update the path
            'results.base_dir=/<path-to>/oceanbench-data-registry/results/osse_gf_nadir', # We need to update the path
        ])
cfg.method='4dvarnet'
cfg.outputs.summary.method='4dvarnet'
cfg.results.outputs.eval.methods['4dvarnet'].inp = 'predictions.nc'
lb = hydra.utils.call(cfg.outputs)
eval_ds = lb.build_eval_ds()  # build the evaluation dataset used by the metrics
pd.Series({k: v(eval_ds) if callable(v) else v for k,v in lb.summary.items()})

About your error ValueError: ESMC_FieldRegridStore failed with rc = 506. Please check the log files (named "*ESMF_LogFile"): it comes from the ESMF package. You may want to check that your input xarray has units on the lat and lon variables, and that the environment variables ESMFMKFILE and PROJ_DATA are set correctly in your environment. Otherwise, I would suggest reinstalling the package.
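A small pre-flight check along those lines might look like this; the function and its return format are illustrative, not part of oceanbench or ocn_tools:

```python
import os

def check_regrid_prereqs(coord_units, env=None):
    """List likely causes of the ESMF rc=506 failure discussed above.

    coord_units: mapping like {"lat": "degrees_north", "lon": "degrees_east"},
    e.g. built from da["lat"].attrs.get("units"); env defaults to os.environ.
    """
    env = os.environ if env is None else env
    problems = []
    expected = {"lat": "degrees_north", "lon": "degrees_east"}
    for name, want in expected.items():
        got = coord_units.get(name)
        if got != want:
            problems.append(f"{name} units is {got!r}, expected {want!r}")
    for var in ("ESMFMKFILE", "PROJ_DATA"):
        if not env.get(var):
            problems.append(f"environment variable {var} is not set")
    return problems
```

Running it before the regridding call gives a human-readable list of what to fix instead of the opaque rc=506.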

I completely agree that the current config management is less than optimal for what you're trying to do. The original idea was to let contributors define a config describing where to fetch and how to format the output, instead of directly asking for a formatted output. Another aspect of the philosophy behind the config management is that the different tasks impose the reference data, to make sure that the targets are not processed differently between evaluations. But it ended up becoming quite difficult to find one's way around the configuration and to reuse it to compute metrics outside the predefined cases. So I think your idea of writing a compute_metrics method is good, and I'm happy to assist you in the process.

The three functions you showed are exactly the ones used, so the main obstacle indeed seems to be debugging the regridding to the same 1/20° grid (see the hints above); after that, you can look up in the task config which period and domain to use for the test.

In case it helps, I'd be happy to plan a video call if you want me to take a closer look at your issue.

Hope this helps, let me know !

nilsleh commented 1 month ago

Hi @quentinf00,

Thank you for your reply. While debugging the ESMF error, it appears that the reconstruct_from_batches() method that generates the predictions xarray creates one with degenerate elements:

20240916 181436.228 ERROR            PET0 ~~~~~~~~~~~~~~~~~~~~ Degenerate Element Detected ~~~~~~~~~~~~~~~~~~~~
20240916 181436.228 ERROR            PET0   degenerate elem. id=1
20240916 181436.228 ERROR            PET0 
20240916 181436.228 ERROR            PET0   degenerate elem. coords (lon [-180 to 180], lat [-90 to 90]) (x,y,z)
20240916 181436.228 ERROR            PET0   -----------------------------------------------------------------
20240916 181436.228 ERROR            PET0     0  (-64.950000,  33.000000)  (0.355101, -0.759784, 0.544639)
20240916 181436.228 ERROR            PET0     1  (-64.900000,  33.000000)  (0.355764, -0.759474, 0.544639)
20240916 181436.228 ERROR            PET0     2  (-64.900000,  33.000000)  (0.355764, -0.759474, 0.544639)
20240916 181436.228 ERROR            PET0     3  (-64.950000,  33.000000)  (0.355101, -0.759784, 0.544639)

Off the top of my head I am not sure how that would happen, and I have to debug further.
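One way to confirm the degenerate-element hypothesis would be to look for repeated values in the lat/lon coordinates of the reconstructed array, since a repeated coordinate yields a zero-width grid cell, which matches the duplicated lon = -64.90 point in the ESMF log above. This helper is only an illustration, not part of any of the packages involved:

```python
import numpy as np

def find_duplicate_coords(values):
    """Return the coordinate values that occur more than once in a 1-D axis.

    A duplicated lat or lon value produces a zero-width grid cell, which ESMF
    reports as a degenerate element during regridding.
    """
    vals = np.asarray(values)
    uniq, counts = np.unique(vals, return_counts=True)
    return uniq[counts > 1]
```

For example, find_duplicate_coords(pred_ds.lon.values) should come back empty for a well-formed grid.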

This is what I am running at the moment:

with hydra.initialize('../../oceanbench/config', version_base='1.3'):
    cfg = hydra.compose('main', overrides=[
        'task=[osse_gf_nadir/task,osse_gf_nadir/results,osse_gf_nadir/leaderboard]',
            'task.data.domain=gf', # We downloaded the reduced dataset,
            'task.data.base_dir=../../oceanbench-data-registry/osse_natl60/grid', # We need to update the path
            'results.base_dir=./', # We need to update the path
        ])

predictions = trainer.predict(module, test_loader, ckpt_path=ckpt_path)
m, s = datamodule.mean, datamodule.std
pred_ds = test_loader.dataset.reconstruct_from_batches(predictions, dims_labels=('time', 'lat', 'lon')).pipe(lambda da: (da * s) + m ).to_dataset(name='ssh')

pred_ds = validate_latlon(pred_ds)
pred_ds = validate_time(pred_ds)
pred_ds = validate_ssh(pred_ds)

pred_ds.to_netcdf("predictions.nc")

cfg.method='4dvarnet'
cfg.outputs.summary.method='4dvarnet'
cfg.results.outputs.eval.methods['4dvarnet'].inp = 'predictions.nc'
lb = hydra.utils.call(cfg.outputs)
eval_ds = lb.build_eval_ds() 
score = pd.Series({k: v(eval_ds) if callable(v) else v for k,v in lb.summary.items()})
print(score.to_frame(name='Summary').T.to_markdown())

For the implementation I am using the docker image that I contributed to the repo, but perhaps it would be easier to have a call about this if you are available? My email is n.lehmann@tum.de.