m2lines / gz21_ocean_momentum

Stochastic-Deep Learning Parameterization of Ocean Momentum Forcing
MIT License

Catch subdomain configuration errors between training data generation and model training #77

Open raehik opened 1 year ago

raehik commented 1 year ago

The data processing step generates forcings from the CM2.6 dataset for the given spatial domain and time resolution. The training step then works on subdomains of this forcing dataset. These subdomains are configured in the training_subdomains.yaml file (or, as of #97, an arbitrary YAML file with similar syntax). xarray doesn't care if a subdomain isn't fully located in the given forcing domain; it simply continues with whatever overlap is present. If this overlap is too small, we may get a runtime error stating that the input size is too small for the neural net kernel (5x5). See #42, #75.

Going backwards from that error message to the reason is not obvious. We should catch this sort of misconfiguration and warn the user if they might see such an issue. A couple of options:
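
For illustration only, here is a minimal sketch of the kind of up-front check described above. It is not code from this repository: `BoundingBox`, `contains`, `count_points` and `validate_subdomain` are hypothetical names, the coordinates are plain Python lists rather than xarray objects, and the minimum size of 5 is taken from the 5x5 kernel mentioned in the error.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical lat/long bounding box, in degrees."""
    lat_min: float
    lat_max: float
    long_min: float
    long_max: float


def contains(outer: BoundingBox, inner: BoundingBox) -> bool:
    """Return True if `inner` lies fully inside `outer`."""
    return (
        outer.lat_min <= inner.lat_min
        and inner.lat_max <= outer.lat_max
        and outer.long_min <= inner.long_min
        and inner.long_max <= outer.long_max
    )


def count_points(coords: Sequence[float], lo: float, hi: float) -> int:
    """Count the grid coordinates that fall inside [lo, hi]."""
    return sum(1 for c in coords if lo <= c <= hi)


def validate_subdomain(
    name: str,
    sub: BoundingBox,
    forcing: BoundingBox,
    lats: Sequence[float],
    longs: Sequence[float],
    min_size: int = 5,  # assumed from the 5x5 kernel in the error message
) -> List[str]:
    """Return human-readable problems for one subdomain (empty if none)."""
    problems = []
    if not contains(forcing, sub):
        problems.append(
            f"subdomain {name!r} is not fully inside the forcing domain"
        )
    # Only points that exist in the forcing data survive the selection,
    # so count the overlap on the actual grid coordinates.
    n_lat = count_points(lats, sub.lat_min, sub.lat_max)
    n_long = count_points(longs, sub.long_min, sub.long_max)
    if n_lat < min_size or n_long < min_size:
        problems.append(
            f"subdomain {name!r} overlaps the forcing data on only "
            f"{n_lat}x{n_long} points, fewer than {min_size}x{min_size}"
        )
    return problems


# Example: a forcing domain on a 0.5 degree grid (made-up numbers).
forcing = BoundingBox(-10.0, 10.0, -50.0, -20.0)
lats = [-10.0 + 0.5 * i for i in range(41)]
longs = [-50.0 + 0.5 * i for i in range(61)]

ok = validate_subdomain("inside", BoundingBox(0.0, 5.0, -40.0, -30.0),
                        forcing, lats, longs)
bad = validate_subdomain("spills_north", BoundingBox(9.0, 15.0, -40.0, -30.0),
                         forcing, lats, longs)
for message in bad:
    print("WARNING:", message)
```

Returning a list of messages rather than raising lets the caller decide whether a partial overlap is a warning or a hard error; only the too-small overlap is certain to fail later in training.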

raehik commented 11 months ago

#85 includes some related code for validating bounding boxes, which I intend to use in the training step too (it'd help with some misconfigurations).