lewisacidic opened this issue 7 years ago.
I would be OK with this if it required explicitly setting a keyword argument, e.g., ds.to_netcdf(..., allow_pickle=True) and xarray.open_dataset(..., allow_pickle=True). This could be hooked into xarray's existing coding/decoding layer in a relatively straightforward fashion: see ensure_dtype_not_object for where this is caught in the current code. (We would also need something at a lower level in the netCDF4-specific reader/writer to handle the uint8 VLType.)
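As a rough illustration of the opt-in idea, here is a minimal sketch of an encode/decode pair that pickles object-dtype data only when allow_pickle=True is passed on both the write side and the read side. The functions encode_objects and decode_objects and the marker attribute _pickled_objects are hypothetical names; this is not xarray's API and does not reproduce ensure_dtype_not_object. It only shows where such a check could sit in a coding layer.

```python
import pickle
from typing import Any, Dict, Tuple

import numpy as np

# Hypothetical attribute name used to mark a variable as holding pickled objects.
PICKLE_MARKER = "_pickled_objects"


def encode_objects(data: np.ndarray, attrs: Dict[str, Any],
                   allow_pickle: bool = False) -> Tuple[np.ndarray, Dict[str, Any]]:
    """Pickle each element of an object array into a uint8 buffer (opt-in only)."""
    if data.dtype != object:
        return data, attrs  # ordinary dtypes pass through untouched
    if not allow_pickle:
        raise TypeError("object dtype can only be written with allow_pickle=True")
    encoded = np.empty(data.shape, dtype=object)
    for idx, obj in np.ndenumerate(data):
        # One variable-length uint8 array per element, as a uint8 VLType would store.
        encoded[idx] = np.frombuffer(pickle.dumps(obj), dtype=np.uint8)
    return encoded, {**attrs, PICKLE_MARKER: 1}


def decode_objects(data: np.ndarray, attrs: Dict[str, Any],
                   allow_pickle: bool = False) -> Tuple[np.ndarray, Dict[str, Any]]:
    """Reverse encode_objects; unpickling also requires an explicit opt-in."""
    if PICKLE_MARKER not in attrs:
        return data, attrs  # not a pickled variable, nothing to decode
    if not allow_pickle:
        raise TypeError("variable contains pickled objects; pass allow_pickle=True")
    decoded = np.empty(data.shape, dtype=object)
    for idx, buf in np.ndenumerate(data):
        decoded[idx] = pickle.loads(np.asarray(buf, dtype=np.uint8).tobytes())
    # Drop the marker so the decoded attributes match what the user originally had.
    clean_attrs = {k: v for k, v in attrs.items() if k != PICKLE_MARKER}
    return decoded, clean_attrs
```

Requiring the flag on read as well as on write matters because unpickling untrusted data can execute arbitrary code.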
I would certainly be interested in giving this a try, although I'm not exactly sure what would go where yet. It seems like this might be more appropriate for the netCDF4-python library; should I open an issue over there?
Sure, there's no harm in asking. My guess is that this isn't a good fit, but I'm not entirely sure.
Yeah, looking at it, it's probably not a thing for them. I thought something like:
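# netCDF4-python already supports variable-length strings:
# strs = nc.createVariable('strs', str, ('strs_dim',))
# proposed (not currently supported): the same idea for arbitrary Python objects
objs = nc.createVariable('objs', object, ('objs_dim',))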
But I see that the str datatype maps to a type defined in the netCDF spec (NC_STRING), which has no equivalent for arbitrary Python objects.
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.
I am looking to transition from pandas to xarray, and the only feature I am really missing is the ability to seamlessly save arrays of Python objects to HDF5 (or netCDF). This might be an issue for the backend netCDF4 libraries instead, but I thought I would post it here first to see what people think about this functionality.
For context, pandas allows this by using PyTables' ObjectAtom to serialize each object with pickle and then saving it as a variable-length bytes data type. It is already possible to do this with netCDF4 by applying np.frombuffer(pickle.dumps(obj), dtype=np.uint8) to each object in the array and saving the results using a uint8 VLType. Retrieving is then simply pickle.loads(arr.tobytes()) for each element.
I know that pickle can be a security problem, that it can cause problems if you try to save a numerical array that accidentally has dtype=object (pandas gives a warning), and that this is probably quite slow (I think pandas pickles a list containing all the objects for speed), but it would be incredibly convenient.
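The post guesses that pandas pickles one list containing all the objects rather than each object separately. A minimal sketch of that whole-array variant follows; the helper names pickle_whole_array and unpickle_whole_array are hypothetical and the code is not taken from pandas' source. It produces a single uint8 blob, so the array's shape has to be stored separately (for example as an attribute).

```python
import pickle

import numpy as np


def pickle_whole_array(data: np.ndarray) -> np.ndarray:
    """Serialize all elements with a single pickle call; returns one uint8 blob."""
    # Flatten to a plain list so one pickle.dumps call covers every element.
    return np.frombuffer(pickle.dumps(list(data.ravel())), dtype=np.uint8)


def unpickle_whole_array(blob: np.ndarray, shape: tuple) -> np.ndarray:
    """Inverse of pickle_whole_array; the original shape must be supplied."""
    items = pickle.loads(np.asarray(blob, dtype=np.uint8).tobytes())
    out = np.empty(len(items), dtype=object)
    for i, obj in enumerate(items):
        out[i] = obj  # assign one by one so sequences are stored as single objects
    return out.reshape(shape)
```

The trade-off is one pickle call instead of one per element, in exchange for having to deserialize the entire blob to read any single value.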