Open BijalBPatel opened 11 months ago
Already exists in some form: look at PyHyperScattering.util.FileIO and its methods saveNexus, savePickle, loadNexus, and loadPickle.
A function in FileIO that sanitizes the attributes to allow netCDF serialization would be very useful, as would documentation improvements around the existing save/load functionality.
I have a messy stub for netCDF; I can take this on during the hackathon. Pardon the formatting below:
import copy
import datetime
import json

import xarray as xr


def saveScan(int_scans: xr.DataArray, outPath: str):
    """Saves an xr.DataArray containing scattering data to a netCDF file.

    Converts datetime attributes to strings (one-way conversion) and uses json.dumps()
    to convert nested dicts to str (reversed on load by loadScan).

    Parameters
    ----------
    int_scans : xr.DataArray
        xarray DataArray containing scattering data
    outPath : str
        target output path (containing filename and extension)
    """
    # Work on a copy so the caller's attributes are not modified
    int_scans_out = copy.deepcopy(int_scans)
    # Convert problematic (non-serializable) attributes
    keys = list(int_scans_out.attrs.keys())
    for attr in keys:
        # Convert datetime to str
        if isinstance(int_scans_out.attrs[attr], datetime.datetime):
            int_scans_out.attrs[attr] = str(int_scans_out.attrs[attr])
        # Serialize dicts
        if isinstance(int_scans_out.attrs[attr], dict):
            # Mark the attribute as JSON-encoded by prefixing its key
            newKey = "json_" + str(attr)
            # TODO: handle errors on unserializable keys/values; for now, values fall back to str
            int_scans_out.attrs[newKey] = json.dumps(int_scans_out.attrs[attr], default=str)
            del int_scans_out.attrs[attr]
    # Save integrated data
    int_scans_out.to_netcdf(outPath)


def loadScan(inPath: str):
    """Loads an xr.DataArray from a netCDF file generated by saveScan().

    Attempts to revert JSON-encoded nested dict attributes. Note: probably doesn't preserve data types.

    Parameters
    ----------
    inPath : str
        input file path (containing filename and extension)

    Returns
    -------
    xr.DataArray
        DataArray containing scattering data
    """
    # Load from file
    scans_in = xr.load_dataarray(inPath)
    # Revert JSON-encoded attributes
    keys = list(scans_in.attrs.keys())
    for attr in keys:
        # Identify JSON-encoded keys by their prefix (not by substring match)
        if str(attr).startswith("json_"):
            scans_in.attrs[attr[5:]] = json.loads(scans_in.attrs[attr])
            del scans_in.attrs[attr]
    # Return loaded data
    return scans_in
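The attribute handling in the stub can also be separated from the file I/O, which makes it testable without writing a netCDF file. A minimal sketch of that idea on a plain dict follows; the helper names sanitize_attrs and restore_attrs and the sample attribute values are made up for illustration and are not part of PyHyperScattering:

```python
import datetime
import json

JSON_PREFIX = "json_"  # same marker the stub uses for JSON-encoded dicts


def sanitize_attrs(attrs: dict) -> dict:
    """Return a copy of attrs with datetimes and nested dicts converted to str."""
    out = {}
    for key, value in attrs.items():
        if isinstance(value, datetime.datetime):
            # One-way conversion, as in the stub: the datetime is stored as text
            out[key] = str(value)
        elif isinstance(value, dict):
            # Nested dicts become JSON text under a prefixed key
            out[JSON_PREFIX + str(key)] = json.dumps(value, default=str)
        else:
            out[key] = value
    return out


def restore_attrs(attrs: dict) -> dict:
    """Reverse the JSON encoding applied by sanitize_attrs (datetimes stay str)."""
    out = {}
    for key, value in attrs.items():
        if str(key).startswith(JSON_PREFIX):
            out[str(key)[len(JSON_PREFIX):]] = json.loads(value)
        else:
            out[key] = value
    return out


# Illustrative attributes only; real scan metadata will differ
attrs = {
    "start_time": datetime.datetime(2023, 1, 2, 3, 4, 5),
    "sample": {"name": "P3HT", "thickness_nm": 80},
    "energy": 284.2,
}
clean = sanitize_attrs(attrs)
restored = restore_attrs(clean)
```

Here the nested dict round-trips, while start_time comes back as the string "2023-01-02 03:04:05" rather than a datetime, which is the one-way conversion the docstring warns about.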
Looks good!
Worth considering orjson (https://github.com/ijl/orjson), which can serialize numpy types natively when its OPT_SERIALIZE_NUMPY option is set.
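For context on why a numpy-aware serializer matters here: with the standard-library json module, default=str turns any value json cannot encode into its string form, so it comes back as a str on load. A small sketch using only the standard library, with decimal.Decimal standing in for a numpy scalar such as np.int64 (a stand-in chosen so the example has no dependencies; the metadata keys are made up):

```python
import json
from decimal import Decimal

# Decimal is not JSON-encodable by default, much like np.int64 or np.float32
meta = {"exposure": Decimal("0.5"), "frames": 3}

# default=str is called only for values json cannot encode on its own
encoded = json.dumps(meta, default=str)
decoded = json.loads(encoded)

print(encoded)                             # {"exposure": "0.5", "frames": 3}
print(type(decoded["exposure"]).__name__)  # str
```

The native int survives the round trip, but the fallback value is now text, so any numeric metadata that hits the default=str path would need converting back by hand after loading.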
Occasionally I find it useful to export the scattering dataset after integration, but xarray's built-in to_netcdf() method raises some clunky errors on datetime.datetime attributes and on attributes containing nested dicts.
Would it be useful to build in export/load functions? Where should it go?