lkilcher / dolfyn

A library for oceanographic doppler instruments such as Acoustic Doppler Profilers (ADPs, ADCPs) and Acoustic Doppler Velocimeters (ADVs).
BSD 3-Clause "New" or "Revised" License
41 stars 25 forks source link

compression=True in dolfyn.save() overwrites encoding values #115

Closed dnowacki-usgs closed 7 months ago

dnowacki-usgs commented 1 year ago

When using dolfyn.save(ds, filename, compression=True) without explicitly passing kwargs specifying the encoding, any pre-existing encoding values on variables will be lost.

https://github.com/lkilcher/dolfyn/blob/7bcf2250d59d0ce979f223e2ec3fab197d49137b/dolfyn/io/api.py#L186-L194

Because encoding can be set on DataArrays prior to serializing to netCDF, any variables that have been encoded this way earlier in a script or pipeline will lose their encoding values, even those unrelated to compression.

I think a solution could be to instead set the compression encoding values on each variable, and then just pass the kwargs on as-is. For example

for ky in ds.variables:
    ds[ky].encoding.update(dict(zlib=True, complevel=1))

Happy to dig deeper and submit a PR.

jmcvey3 commented 1 year ago

Hi @dnowacki-usgs, this was a hack workaround to an xarray or netcdf4 encoding bug, where pre-encoded datasets were causing issues - I forget what exactly. If you have the time to dig deeper and submit a PR, that would be much appreciated.

jmcvey3 commented 11 months ago

Found the original issue: https://github.com/pydata/xarray/discussions/5709