Open rabernat opened 7 years ago
@rabernat do you maybe have example files that can be opened using open_mfdataset, open_rasterio, open_zarr, and open_dataarray? Preferably small files that I'm allowed to add to recipy as test data.
Thanks for looking into this!
A good way to proceed would be to use the xarray tutorial datasets, which live in their own repository: https://github.com/pydata/xarray-data
These can be opened via the xarray.tutorial module, as shown in the xarray docs: http://xarray.pydata.org/en/latest/examples/multidimensional-coords.html
Or you can just open them directly.
There are unfortunately no zarr or rasterio (i.e. geotiff) datasets there yet. I recommend you get started with netCDF files, which represent 95% of xarray use cases. In the meantime, I will work on providing examples of the other formats.
@rabernat Thanks! In the meantime, I found sufficiently small netCDF data for testing the patch of open_mfdataset. For open_rasterio, I just used a standard TIFF file, since we are only interested in determining whether the input is logged, not whether it really is a GeoTIFF.
So, now I'm just looking for zarr data.
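To illustrate what "determining whether the input is logged" could mean in a test, here is a minimal sketch in plain Python. It is not recipy's actual patching mechanism: `logged_inputs`, `log_input`, and `fake_open` are hypothetical names made up for this example, and the list stands in for whatever provenance store is really used.

```python
import functools

# Hypothetical stand-in for a provenance database.
logged_inputs = []


def log_input(func):
    """Wrap a loader so that its first positional argument (the path) is recorded."""
    @functools.wraps(func)
    def wrapper(path, *args, **kwargs):
        logged_inputs.append(str(path))
        return func(path, *args, **kwargs)
    return wrapper


@log_input
def fake_open(path):
    # Placeholder for a real loader such as open_rasterio; it never touches the file,
    # which is why the file's actual format does not matter for this kind of test.
    return {"path": path}


result = fake_open("example.tif")
```

After the call, a test only needs to check that `"example.tif"` ended up in `logged_inputs`; whether the file is a real GeoTIFF is irrelevant to that check.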
I can prepare a zarr file for you.
When you say "small files", can you be more specific?
Note that zarr can read from a wide range of stores (see xarray docs and zarr docs). How important is it to cover all of these different cases?
Also, since zarr datasets can be opened directly by the zarr library, you might want to consider a dedicated patch for zarr (without xarray). I anticipate that zarr will grow in popularity over the coming years.
It would be great if you could prepare a zarr file! By small I mean kilobytes (the netCDF files I found are 67 kB). The data does not have to make sense, but it must be valid.
I'll look into the different zarr storage types later. I'd like to cover as many file-based storage types as possible.
And correct me if I'm wrong, but all data loading/saving methods in xarray seem to come from other libraries (I started with a patch for netcdf4). So maybe we don't need a patch for xarray at all 😄 (I'll finish it anyway)
Is this waiting for me?
No, don't worry about it! The work of patching xarray is more or less done (if you could prepare a small zarr file for testing that would still be helpful). I plan to merge this feature soon, but there are some things that need to happen first.
This looks like a fantastic project with great potential to enhance scientific reproducibility. Thanks to the developers for all of your efforts.
I wanted to open an issue to suggest adding patch support for xarray: https://github.com/pydata/xarray
Xarray is complementary to pandas and provides an interface for loading, analyzing, visualizing, and outputting labeled multi-dimensional array data. Its adoption is increasing rapidly in physical sciences, finance, and other fields.