gwmod / nlmod

Python package to build, run and visualize MODFLOW 6 groundwater models in the Netherlands.
https://nlmod.readthedocs.io
MIT License

Not all cache is reused when running a notebook the second time #37

Closed bdestombe closed 2 years ago

bdestombe commented 2 years ago

When I run the Bergen model twice, not all cached data is reused, as can be seen below.


```
Intersecting oppwater with grid: 100%|██████████| 9/9 [00:02<00:00,  4.17it/s]
Loading gridded surface water data from cache.
INFO:nlmod.mdims.mlayers:get active cells (idomain) from bottom DataArray
INFO:nlmod.mdims.mgrid:get first active modellayer for each cell in idomain
INFO:nlmod.mdims.mlayers:using top and bottom from model layers dataset for modflow model
INFO:nlmod.mdims.mlayers:replace nan values for inactive layers with dummy value
INFO:nlmod.mdims.mlayers:add kh and kv from model layer dataset to modflow model
INFO:nlmod.mdims.mlayers:nan values at the northsea are filled using the bathymetry from jarkus
INFO:nlmod.cache:cache was created using different numpy array values, do not use cached data
INFO:nlmod.cache:cache was created using different dictionaries, do not use cached data
INFO:nlmod.mfpackages.mfpackages:creating modflow SIM, TDIS, GWF and IMS
INFO:nlmod.cache:caching data -> sea_model_ds.nc
INFO:nlmod.cache:cache was created using different numpy array values, do not use cached data
INFO:nlmod.cache:cache was created using different dictionaries, do not use cached data
INFO:nlmod.cache:caching data -> bathymetry_model_ds.nc
INFO:nlmod.mdims.mgrid:get first active modellayer for each cell in idomain
INFO:nlmod.cache:cache was created using different numpy array values, do not use cached data
INFO:nlmod.cache:cache was created using different dictionaries, do not use cached data
INFO:nlmod.mfpackages.mfpackages:creating modflow SIM, TDIS, GWF and IMS
INFO:nlmod.cache:caching data -> surface_water.nc
```

Could this be because the arrays contain NaN values, so that a different way of comparing two arrays is needed?
Is an optimization used to estimate certain values, so that there is always a small random error? Could we use np.isclose here?

Can't we easily write a test for this? Run a notebook twice and make sure nothing is downloaded from the internet the second time.

Best regards,
Bas

OnnoEbbens commented 2 years ago

In this case the cause was a very small difference in the botm numpy array (dtype np.float64) from the gridprops dictionary. On my computer the cache was created using the value -13.60151824536105, while the new value was -13.601518245360648. I think this comes from the preprocessing of the botm array, where NaN values are filled using interpolation.

Using np.allclose to compare the numpy arrays solves this issue.
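
As a minimal sketch of the difference (this is not the actual nlmod.cache code, and the variable names are made up for illustration), the two botm values quoted above can be compared with an exact check and with a tolerance-based check. It also shows that np.allclose treats NaN as unequal unless equal_nan=True is passed, which matters if the compared arrays may still contain NaN values:

```python
import numpy as np

# The two botm values reported in this issue (cached run vs. new run).
cached_botm = np.array([-13.60151824536105])
new_botm = np.array([-13.601518245360648])

# Exact comparison: the values differ in the last digits, so this is False.
exact_match = np.array_equal(cached_botm, new_botm)

# Tolerance-based comparison with NumPy's default rtol/atol: this is True.
close_match = np.allclose(cached_botm, new_botm)

# Hypothetical arrays that still contain NaN at the same position.
cached_with_nan = np.array([-13.60151824536105, np.nan])
new_with_nan = np.array([-13.601518245360648, np.nan])

# NaN != NaN by default, so this is False even though the finite values are close.
close_default = np.allclose(cached_with_nan, new_with_nan)

# With equal_nan=True, NaNs at matching positions count as equal.
close_equal_nan = np.allclose(cached_with_nan, new_with_nan, equal_nan=True)

print(exact_match, close_match, close_default, close_equal_nan)
```

The default tolerances of np.allclose (rtol=1e-05, atol=1e-08) are much wider than the difference seen here, so whether they are appropriate for deciding on cache reuse is a judgment call.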

Creating tests for this is a good idea although maybe not so easy. I will leave this issue open so we can decide later if we want to do this.
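
A test along these lines could look roughly like the sketch below. It does not use the nlmod cache functions: cache_by_allclose and expensive_step are hypothetical stand-ins, and the cache lives in memory instead of in a NetCDF file. The idea is only to show the pattern of running a step twice with nearly identical input and asserting that the expensive part ran once:

```python
import numpy as np


def cache_by_allclose(func):
    """Hypothetical in-memory cache keyed on one array argument.

    The stored result is reused when the new argument has the same shape
    and is numerically close to the argument used to create the cache.
    """
    store = {}

    def wrapper(arr):
        arr = np.asarray(arr, dtype=float)
        if (
            "arg" in store
            and store["arg"].shape == arr.shape
            and np.allclose(store["arg"], arr, equal_nan=True)
        ):
            return store["result"]
        store["arg"] = arr.copy()
        store["result"] = func(arr)
        return store["result"]

    return wrapper


calls = []  # records every time the expensive step really runs


@cache_by_allclose
def expensive_step(botm):
    # Stand-in for a download or heavy preprocessing step.
    calls.append(1)
    return botm * 2.0


def test_second_run_uses_cache():
    calls.clear()
    expensive_step(np.array([-13.60151824536105, np.nan]))
    expensive_step(np.array([-13.601518245360648, np.nan]))
    # The second call differs only by floating-point noise: no recomputation.
    assert len(calls) == 1
    expensive_step(np.array([-10.0, np.nan]))
    # A really different input must invalidate the cache.
    assert len(calls) == 2
```

For the real package, the counter would have to be replaced by something observable from outside, for example checking the log messages or mocking the download function.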

bdestombe commented 2 years ago

@OnnoEbbens I think you fixed this, right?

And writing the tests for the cache functions still remains?

OnnoEbbens commented 2 years ago

> And writing the tests for the cache functions still remains?

Correct!

OnnoEbbens commented 2 years ago

Some tests were added for caching.