observingClouds / xbitinfo

Python wrapper of BitInformation.jl to easily compress xarray datasets based on their information content
https://xbitinfo.readthedocs.io
MIT License

ICON examples #36

Closed aaronspring closed 2 years ago

aaronspring commented 2 years ago

Maybe use lower compression levels: complevel=4 already seems sufficient, and even complevel=1 achieves most of the size reduction.

3.2G    ICONO_R2B8_bitrounded_compressed_l1.nc
3.1G    ICONO_R2B8_bitrounded_compressed_l2.nc
2.8G    ICONO_R2B8_bitrounded_compressed_l4.nc
2.7G    ICONO_R2B8_bitrounded_compressed_l7.nc
13G     ICONO_R2B8_compressed.nc
13G     ICONO_R2B8_original.nc
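
To put numbers on this, here is a small sketch that turns the sizes listed above into compression factors and into the share of the complevel=7 saving that each lower level already reaches. The sizes are the rounded values from the listing, so the factors are approximate; the variable and function names are only illustrative.

```python
# Sizes in GB, copied from the listing above (rounded values).
original_gb = 13.0
bitrounded_gb = {1: 3.2, 2: 3.1, 4: 2.8, 7: 2.7}  # complevel -> file size


def compression_factor(original, compressed):
    """Ratio of the original size to the compressed size."""
    return original / compressed


factors = {
    lvl: compression_factor(original_gb, size)
    for lvl, size in bitrounded_gb.items()
}

# Saving of each level relative to the saving reached at complevel=7.
saving_l7 = original_gb - bitrounded_gb[7]
share_of_l7_saving = {
    lvl: (original_gb - size) / saving_l7 for lvl, size in bitrounded_gb.items()
}

for lvl in sorted(bitrounded_gb):
    print(
        f"complevel={lvl}: factor {factors[lvl]:.1f}, "
        f"{share_of_l7_saving[lvl]:.0%} of the complevel=7 saving"
    )
```

The listing also shows that compression without bitrounding leaves the file at 13G, so the gain comes from combining bitrounding with compression.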
aaronspring commented 2 years ago

I have fewer problems when I get an interactive or compute session on the command line and run a script than when I use Jupyter.

import bitinformation_pipeline as bp
import xarray as xr
import json
import os

complevel = 7  # compression level, also used in the output file names
label = "ICONO_R2B8"
path = "/work/mh0727/m300524/test_output/exp.ocean_era51h_r2b8_hel20218-ERA_20000401T000000Z.nc"

# open lazily, chunked along the two depth dimensions
ds = xr.open_dataset(path, chunks={"depth": 8, "depth_2": 8})

# make the bounds coordinates so they are not treated as data variables
ds = ds.set_coords(["clon_bnds", "clat_bnds", "elon_bnds", "elat_bnds"])
# move the horizontal dimensions to the front so that axis=0 refers to them
dsa = ds.transpose("ncells", ...).transpose("ncells_2", ...)

info_per_bit = bp.get_bitinformation(dsa, axis=0, masked_value="convert(Float32,0)", label=label)

# number of mantissa bits needed to keep 99% of the information
keepbits = bp.get_keepbits(info_per_bit, 0.99)
# correction for xr_bitround: clip negative keepbits to 0
keepbits = {k: max(0, v) for k, v in keepbits.items()}
print("keepbits", keepbits)

ds_bitrounded = bp.xr_bitround(ds, keepbits)

print("bitrounding done")

# turn the bounds back into data variables before writing
ds_bitrounded = ds_bitrounded.reset_coords(["clon_bnds", "clat_bnds", "elon_bnds", "elat_bnds"])

print("start to_compressed_netcdf")

ds_bitrounded.to_compressed_netcdf(f"{label}_bitrounded_compressed_l{complevel}.nc", complevel=complevel)
ds_bitrounded.to_compressed_netcdf(f"{label}_bitrounded_compressed_l{complevel}_for_cdo.nc", for_cdo=True, complevel=complevel)
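
For readers unfamiliar with what the xr_bitround step in the script does to the data, here is a minimal NumPy sketch of rounding Float32 values to a given number of mantissa bits (round to nearest, ties to even). The function name `bitround_float32` is hypothetical. It only illustrates the idea and is not the implementation used by xr_bitround or BitInformation.jl.

```python
import numpy as np


def bitround_float32(values, keepbits):
    """Round float32 values to `keepbits` mantissa bits (hypothetical helper)."""
    if not 0 <= keepbits <= 23:
        raise ValueError("keepbits must be between 0 and 23 for float32")
    a = np.array(values, dtype=np.float32, ndmin=1)  # always a fresh copy
    if keepbits == 23:
        return a  # all mantissa bits kept, nothing to round
    drop = 23 - keepbits  # number of trailing mantissa bits to zero out
    bits = a.view(np.uint32)  # reinterpret the same memory as integers
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)
    half_minus_one = np.uint32((1 << (drop - 1)) - 1)
    # Adding the last kept bit plus (half - 1) gives round-to-nearest, ties-to-even.
    bits += ((bits >> np.uint32(drop)) & np.uint32(1)) + half_minus_one
    bits &= mask  # zero the dropped bits so they compress well
    return a


rounded = bitround_float32([np.pi], 3)
print(rounded)  # pi rounded to 3 mantissa bits is 3.25
```

The trailing zero bits are what lets the subsequent lossless compression shrink the file, and the relative rounding error stays below 2**-(keepbits + 1).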
aaronspring commented 2 years ago

'velocity_windMixing': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 'tracer_windMixing': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 'K_tracer_h_to': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],

Variables like these, with zero information in every bit, should be visualized differently in plot_bitinformation(): the plot shouldn't show 23 keepbits but rather -8, as returned by get_keepbits().
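
A sketch of how -8 can arise for an all-zero variable. It assumes keepbits is the position of the first bit at which the cumulative information exceeds the requested level, minus the 9 sign and exponent bits of a Float32. That rule and the helper name `keepbits_from_info` are assumptions for illustration, not necessarily what get_keepbits() does internally.

```python
import numpy as np

NON_MANTISSA_BITS = 9  # 1 sign bit + 8 exponent bits of a float32


def keepbits_from_info(info_per_bit, level=0.99):
    """Mantissa bits needed to reach `level` of the information (assumed rule)."""
    info = np.asarray(info_per_bit, dtype=float)
    total = info.sum()
    # With zero total information the cumulative fraction never exceeds `level`.
    cdf = np.cumsum(info) / total if total > 0 else np.zeros_like(info)
    first_bit = int(np.argmax(cdf > level))  # argmax returns 0 if never exceeded
    return first_bit + 1 - NON_MANTISSA_BITS


all_zero = [0.0] * 32  # e.g. 'velocity_windMixing' above
raw = keepbits_from_info(all_zero)
clipped = max(0, raw)  # same clipping as in the script before xr_bitround
print(raw, clipped)

# Made-up information content in the first five mantissa bits, for comparison.
example = np.zeros(32)
example[9:14] = [0.5, 0.4, 0.3, 0.2, 0.1]
print(keepbits_from_info(example))
```

Under this assumption, a negative value signals that the variable carries no information at all, which is why plotting it as 23 keepbits is misleading.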

aaronspring commented 2 years ago

The full analysis, https://gist.github.com/aaronspring/88b6027264d1f2e5137bfcd113c34f75, yields a compression factor of 5.