rcaneill / xnemogcm

Interface to open NEMO global circulation model output dataset with xarray and create a xgcm grid.
https://xnemogcm.readthedocs.io/
MIT License

Control over datatype in domain_cfg, mesh_mask and nemo files #112

Open vopikamm opened 4 months ago

vopikamm commented 4 months ago

I noticed that the NEMO output files and the domain_cfg files have different dtypes. When computing, for example, thickness-weighted temperatures, the result is promoted from float32 to float64, doubling the array's size in memory:

from xnemogcm import open_domain_cfg, open_nemo

domain = open_domain_cfg(files=['/path/to/domain_cfg/files'])

data = open_nemo(domcfg=domain, files=['/path/to/nemo/files'])

# size of the array doubles after multiplication with e3t
(domain.e3t_0 * data.toce).nbytes / data.toce.nbytes  # --> 2

# dtype of the data
(domain.e3t_0 * data.toce).dtype  # --> float64
data.toce.dtype                   # --> float32

Have you already considered this?

rcaneill commented 4 months ago

> I noticed that NEMO output files and the domain_cfg files are of different dtype

I think that the precision of the model output (e.g. toce) can be chosen to be float32 or float64. For domain_cfg / mesh_mask, I think that it is the user who should decide whether they can safely reduce the precision to float32, depending on what is calculated.

rcaneill commented 4 months ago

I am closing this issue; please re-open it if you think that it is necessary. If you believe that xnemogcm should handle this, I am open to any suggestion :).

vopikamm commented 4 months ago

Thanks! No, I don't think xnemogcm should handle it; I agree that it's more of a user choice / NEMO issue. But I wonder what the motivation is, from a NEMO perspective, for writing the domain_cfg and mesh_mask at higher precision than the model output... Do you know how xgcm handles this? This can really become a bottleneck for very large datasets.

rcaneill commented 4 months ago

> But I wonder what is the motivation of writing the domain_cfg, mesh_mask at higher precision than the model output from a nemo perspective...

Good question, I'm sure that someone at LOCEAN should know this :)

I don't think that xgcm handles it in a particular manner. It simply relies on the way that xarray/numpy works.
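To illustrate the numpy behaviour referred to above, here is a minimal sketch using plain numpy arrays as stand-ins for the real fields. The array shapes and values are made up for illustration; only the dtypes mirror the ones reported in this issue.

```python
import numpy as np

# Stand-ins (hypothetical shapes and values) for a float32 model output
# field and a float64 scale factor from domain_cfg.
toce = np.ones((4, 3), dtype=np.float32)
e3t_0 = np.full((4, 3), 2.0, dtype=np.float64)

# numpy promotes mixed float32/float64 arithmetic to float64.
product = e3t_0 * toce
promoted_dtype = product.dtype
size_ratio = product.nbytes / toce.nbytes

# Casting the scale factor first keeps the result in float32.
product32 = e3t_0.astype(np.float32) * toce

print(promoted_dtype, size_ratio, product32.dtype)
```

xarray objects wrap numpy (or dask) arrays, so the same promotion is what shows up in the `nbytes` ratio reported at the top of this issue.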

You could compute your diagnostics with float32 for the domain_cfg and compare them with the same diagnostics computed with float64, to see whether you can simply convert the scale factors to float32 at the beginning of your analysis pipeline. (This case could be handled by xnemogcm, so I am reopening the issue.)
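A rough sketch of the comparison suggested above, assuming a thickness-weighted vertical mean as the diagnostic. The helper `max_relative_difference` is hypothetical (it is not part of xnemogcm or xgcm), and the data are synthetic random arrays rather than real NEMO output.

```python
import numpy as np


def max_relative_difference(e3t, field, axis=0):
    """Largest relative difference between a thickness-weighted mean
    computed with the original scale factors and with a float32 copy."""
    # Reference: scale factors kept at their original (float64) precision.
    reference = (e3t * field).sum(axis=axis) / e3t.sum(axis=axis)
    # Same diagnostic after casting the scale factors to float32.
    e3t32 = e3t.astype(np.float32)
    reduced = (e3t32 * field).sum(axis=axis) / e3t32.sum(axis=axis)
    return float(np.max(np.abs(reduced - reference) / np.abs(reference)))


# Synthetic stand-ins: 31 levels on a 20 x 30 horizontal grid.
rng = np.random.default_rng(0)
e3t_0 = rng.uniform(1.0, 200.0, size=(31, 20, 30))                   # float64
toce = rng.uniform(2.0, 25.0, size=(31, 20, 30)).astype(np.float32)  # float32

rel_diff = max_relative_difference(e3t_0, toce)
print(f"max relative difference: {rel_diff:.2e}")
```

Whether the resulting difference is acceptable depends on the diagnostic; budgets or small residuals of large terms may be more sensitive than a simple weighted mean. With real data, `astype` is also available on xarray objects, so the cast could be applied once, right after opening the domain_cfg.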