pangeo-data / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License

Fix for allowing a weight file to be written when it does not exist when using filename argument. #234

Open jr3cermak opened 1 year ago

jr3cermak commented 1 year ago

This might address issues #153 and #202. I had been using xESMF version 0.3.0 and am now updating my code to use the latest version, 0.7.0. In version 0.3.0, with Regridder(reuse_weights=True, filename="...") and a filename that does not exist, the weights file is created on the first pass and reused on subsequent passes. In version 0.7.0, the same call raises the error below when the weights file does not exist. We do not pre-generate a weights file.

File "/home/cermak/src/xESMF/xesmf/smm.py", line 74, in _parse_coords_and_values
    raise IOError(f'Weights file not found on disk.\n{indata}')
OSError: Weights file not found on disk.

This patch restores the previous behavior from 0.3.0.

The logic is just a little complex.

If reuse_weights is False, compute new weights each time unless the weights argument is specified. The weights argument, if a filename, MUST exist.

If reuse_weights is True, use the weights argument if provided; if it is a filename, it MUST exist. If a filename argument is also provided, it overrides the weights argument. If the provided filename exists, use it for the weights; otherwise compute new weights and save them to the filename on the first pass. Subsequent calls will then use the existing file.
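The rules above can be sketched as a small decision function. This is only an illustration of the proposed logic, not code from xESMF: the name resolve_weights, the action labels, and the injectable exists check are hypothetical, and the sketch only handles weights given as a file path. The ValueError for reuse_weights=True with neither argument is an assumption, since the description above does not cover that case, and the sketch does not model whether filename is written when reuse_weights is False.

```python
import os
from typing import Callable, Optional, Tuple


def resolve_weights(
    reuse_weights: bool,
    weights: Optional[str] = None,
    filename: Optional[str] = None,
    exists: Callable[[str], bool] = os.path.isfile,
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper mirroring the rules described above (not xESMF code).

    Returns (action, path), where action is one of
    "compute", "load", or "compute_and_save".
    """
    if not reuse_weights:
        # Compute new weights each time unless weights is specified.
        if weights is None:
            return ("compute", None)
        if not exists(weights):
            raise OSError(f"Weights file not found on disk.\n{weights}")
        return ("load", weights)

    # reuse_weights is True: filename, if given, overrides weights.
    if filename is not None:
        if exists(filename):
            return ("load", filename)
        # First pass: compute weights and save them to filename.
        return ("compute_and_save", filename)

    if weights is not None:
        # A weights file passed via the weights argument MUST exist.
        if not exists(weights):
            raise OSError(f"Weights file not found on disk.\n{weights}")
        return ("load", weights)

    # Assumption: reusing weights with no source given is an error.
    raise ValueError("To reuse weights, provide either filename or weights.")
```

With this sketch, resolve_weights(True, filename="new.nc") on a missing file yields the "compute_and_save" action, while resolve_weights(True, weights="new.nc") raises OSError.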

Maybe add some wording to the function arguments to further clarify use.

jr3cermak commented 1 year ago

If this PR is approved, the CI test would have to be changed.

The logic would permit the weight file to be created in the case where reuse_weights=True, filename='fakewgts.nc'; it would then be an error if 'fakewgts.nc' was not created. If the arguments were reuse_weights=True, weights='fakewgts.nc', that should raise an OSError.

        # check fails on non-existent file
        with pytest.raises(OSError):
>           regridder_reuse = xe.Regridder(
                ds_in, ds_out, method, reuse_weights=True, filename='fakewgts.nc'
            )
E           Failed: DID NOT RAISE <class 'OSError'>

raphaeldussin commented 1 year ago

@jr3cermak IMHO setting reuse_weights=True to compute and save weights is confusing. The current behavior allows saving weights with reuse_weights=False and filename="something.nc"; one can then reload the weights with reuse_weights=True and weights="something.nc".

jr3cermak commented 1 year ago

OK, I see how the reuse_weights option is worded now. We have arrived at a solution that seems to satisfy both 0.3.0 and 0.7.0. For those who need it:

grd = xesmf.Regridder(...,
  reuse_weights=os.path.isfile(desired_weight_file),
  filename=desired_weight_file
)

instead of needing to do something like:

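if not os.path.isfile("something.nc"):
    # No weights file yet: compute the weights and save them.
    grd = xesmf.Regridder(..., reuse_weights=False, filename="something.nc")
else:
    # Weights file exists: reload it.
    grd = xesmf.Regridder(..., reuse_weights=True, weights="something.nc")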
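For anyone who prefers the explicit two-branch form, it can be wrapped in a small helper that only chooses the keyword arguments. This is a sketch: weight_kwargs is a hypothetical name, not part of xESMF, and it assumes the 0.7.0 behavior described in this thread (reuse_weights=False with filename saves the weights, reuse_weights=True with weights reloads them).

```python
import os


def weight_kwargs(weight_file: str) -> dict:
    """Hypothetical helper: choose Regridder keyword arguments for a weights file."""
    if os.path.isfile(weight_file):
        # The file is already on disk: reload the saved weights.
        return {"reuse_weights": True, "weights": weight_file}
    # No file yet: compute the weights and write them to weight_file.
    return {"reuse_weights": False, "filename": weight_file}
```

It would be used as grd = xesmf.Regridder(ds_in, ds_out, method, **weight_kwargs("something.nc")), with ds_in, ds_out, and method as in the CI test quoted earlier in the thread.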