xCDAT / xcdat

An extension of xarray for climate data analysis on structured grids.
https://xcdat.readthedocs.io/en/latest/
Apache License 2.0

[Doc]: are there some xcdat test files (that can be predownloaded) ? #277

Open jypeter opened 2 years ago

jypeter commented 2 years ago

Describe your documentation update

I wonder if there are xCDAT (or xarray) test files that can be (pre)downloaded and used for:

I'm thinking of (something like) the cdms2/vcs test data

I think these files are the ones listed in CDMS Sample Dataset and they are still online!

pochedls commented 2 years ago

I like this idea, but I'm wondering how this could be implemented in a way that is easy to maintain. Perhaps we could add some functionality to directly download example netCDF files (e.g., from ESGF), something like xcdat.get_test_data()?
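To make the download idea concrete, here is a minimal sketch of what such a helper could look like. It is only an illustration: get_test_data does not exist in xcdat, and its signature, the cache_dir argument, and the ".part" temporary name are all assumptions. It uses only the standard library and downloads a file once, then reuses the cached copy.

```python
import urllib.parse
import urllib.request
from pathlib import Path


def get_test_data(url, cache_dir):
    """Hypothetical helper: download `url` once and return the cached path."""
    cache_dir = Path(cache_dir).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Name the cached file after the last component of the URL path.
    filename = Path(urllib.parse.urlparse(url).path).name
    target = cache_dir / filename

    if not target.exists():
        # Download under a temporary name first, so that an interrupted
        # transfer never leaves a truncated file that looks like a valid
        # cache entry.
        partial = target.with_name(target.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(target)

    return target
```

A caller would pass the URL of a small netCDF file and a cache directory of their choice, then hand the returned path to whatever opens the file. A maintained version would probably also need checksums and a versioned file list, which is the maintenance cost raised above.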

I was curious about what xarray does – it seems like they generate toy data rather than providing data.

Should this be a discussion item?

jypeter commented 2 years ago

This is the up-to-date link for toy data you mentioned, but I'd rather have data coming from actual netCDF files than toy data generated in memory!

Some not-too-big test data files could come from ESGF, the way I've done it in #284, but we also need a way to get other static/known test data files:

I have just checked that cartopy mostly generates toy data on the fly for its examples, but iris uses a directory with actual data files (the way vcs and cdms2 did).

>>> import iris
>>> help(iris.sample_data_path)
sample_data_path(*path_to_join)
    Given the sample data resource, returns the full path to the file.

    .. note::

        This function is only for locating files in the iris sample data
        collection (installed separately from iris). It is not needed or
        appropriate for general file access.

>>> iris.sample_data_path("E1_north_america.nc")
'/home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/iris_sample_data/sample_data/E1_north_america.nc'

ls -lh /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/iris_sample_data/sample_data/
total 24M
-rw-rw-r-- 2 jypeter lsce 110K Jun 25  2020 A1B.2098.pp
-rw-rw-r-- 2 jypeter lsce 1.8M Jun 25  2020 A1B_north_america.nc
-rw-rw-r-- 2 jypeter lsce  28K Jun 25  2020 air_temp.pp
-rw-rw-r-- 2 jypeter lsce  34K Jun 25  2020 atlantic_profiles.nc
-rw-rw-r-- 2 jypeter lsce 3.5M Jun 25  2020 colpex.pp
-rw-rw-r-- 2 jypeter lsce 110K Jun 25  2020 E1.2098.pp
-rw-rw-r-- 2 jypeter lsce 1.8M Jun 25  2020 E1_north_america.nc
drwxr-xr-x 2 jypeter lsce 4.0K Sep 10  2021 GloSea4/
-rw-rw-r-- 2 jypeter lsce 662K Jun 25  2020 hybrid_height.nc
-rw-rw-r-- 2 jypeter lsce 7.5M Jun 25  2020 NAME_output.txt
drwxr-xr-x 2 jypeter lsce 4.0K Sep 10  2021 NEMO/
-rw-rw-r-- 2 jypeter lsce 2.0M Jun 25  2020 orca2_votemper.nc
-rw-rw-r-- 2 jypeter lsce 1.7M Jun 25  2020 ostia_monthly.nc
-rw-rw-r-- 2 jypeter lsce  26K Jun 25  2020 polar_stereo.grib2
-rw-rw-r-- 2 jypeter lsce 110K Jun 25  2020 pre-industrial.pp
-rw-rw-r-- 2 jypeter lsce  19K Jun 25  2020 rotated_pole.nc
-rw-rw-r-- 2 jypeter lsce 163K Jun 25  2020 SOI_Darwin.nc
-rw-rw-r-- 2 jypeter lsce 243K Jun 25  2020 space_weather.nc
-rw-rw-r-- 2 jypeter lsce 514K Jun 25  2020 toa_brightness_stereographic.nc
-rw-rw-r-- 2 jypeter lsce 3.3M Jun 25  2020 uk_hires.pp
drwxr-xr-x 2 jypeter lsce  12K Sep 10  2021 UM/
-rw-rw-r-- 2 jypeter lsce 2.4K Jun 25  2020 wind_speed_lake_victoria.pp

tomvothecoder commented 2 years ago

Thanks for this @jypeter. This has been discussed and was on our minds, although no GH issue was opened for it.

I explored a possible implementation similar to xarray. xarray uses a GH repo (https://github.com/pydata/xarray-data) to host test datasets, and provides xarray.tutorial methods to open up the test datasets using a package called pooch.

We didn't pursue this idea because xarray can already read remote data directly over OPeNDAP. However, I think the idea is worthwhile because it standardizes and streamlines testing by giving everyone easy access to the same real-world datasets.
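One part of this approach that helps everyone test against exactly the same files is a registry of file names and checksums. The sketch below illustrates that idea with the standard library only. It is not pooch's API, and the function names (file_sha256, build_registry, find_problems) are made up for this example.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_registry(data_dir):
    """Map each regular file name in `data_dir` to its SHA-256 digest."""
    files = sorted(p for p in Path(data_dir).iterdir() if p.is_file())
    return {p.name: file_sha256(p) for p in files}


def find_problems(data_dir, registry):
    """Return the names of registered files that are missing or modified."""
    problems = []
    for name, expected in registry.items():
        path = Path(data_dir) / name
        if not path.is_file() or file_sha256(path) != expected:
            problems.append(name)
    return problems
```

The registry would be generated once by whoever publishes the datasets and shipped with the package. Users and CI jobs can then check their local copies against it before running tests or tutorials.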

jypeter commented 2 years ago

Hmmm, I had a quick look at the pooch GH page. It looks really nice and fancy but:

Having a dedicated Python package with just the data could also be an easy solution, e.g. basemap-data-hires.
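For illustration, a data-only package would need little more than a path helper, loosely modelled on the iris.sample_data_path call shown earlier in the thread. The sketch below is an assumption, not an existing xcdat or iris API: the names sample_data_path and list_samples are hypothetical, and root stands in for the folder that such a package would install.

```python
from pathlib import Path


def sample_data_path(root, *path_to_join):
    """Return the full path to a file inside the sample data directory.

    `root` stands in for the folder that a data-only package would install.
    """
    target = Path(root).joinpath(*path_to_join)
    if not target.exists():
        raise FileNotFoundError(f"no sample file {target.name!r} in {root}")
    return str(target)


def list_samples(root):
    """Return the sorted names of the regular files available in `root`."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_file())
```

Because the files are installed with the package, this works offline and in CI without any download step. The trade-off is that the data adds to the install size.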

jypeter commented 2 years ago

Another data sample example from xoa

>>> import xoa

>>> xoa.show_data_samples()
gdp-6203641.csv hycom.gdp.u.nc hycom.gdp.v.nc hycom.gdp.h.nc croco.south-africa.surf.nc hycom.cfg croco.cfg gdp.cfg mercator.cfg argo.cfg croco.south-africa.zonal.nc croco.south-africa.meridional.nc ibi-argo-7900573.nc argo-7900573.nc

>>> xoa.get_data_sample('hycom.gdp.u.nc')
'/home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/xoa/_samples/hycom.gdp.u.nc'

> du -sh /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/xoa/_samples
1.1M    /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/xoa/_samples

> ls -lh /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm_py3/lib/python3.8/site-packages/xoa/_samples
total 1.1M
-rw-rw-r-- 2 jypeter lsce  92K Feb 25 09:56 argo-7900573.nc
-rw-rw-r-- 2 jypeter lsce  305 Feb 25 09:56 argo.cfg
-rw-rw-r-- 2 jypeter lsce  714 Feb 25 09:56 croco.cfg
-rw-rw-r-- 2 jypeter lsce  61K Feb 25 09:56 croco.south-africa.meridional.nc
-rw-rw-r-- 2 jypeter lsce 190K Feb 25 09:56 croco.south-africa.surf.nc
-rw-rw-r-- 2 jypeter lsce  61K Feb 25 09:56 croco.south-africa.zonal.nc
-rw-rw-r-- 2 jypeter lsce  43K Feb 25 09:56 gdp-6203641.csv
-rw-rw-r-- 2 jypeter lsce   73 Feb 25 09:56 gdp.cfg
-rw-rw-r-- 2 jypeter lsce  487 Feb 25 09:56 hycom.cfg
-rw-rw-r-- 2 jypeter lsce 174K Feb 25 09:56 hycom.gdp.h.nc
-rw-rw-r-- 2 jypeter lsce 173K Feb 25 09:56 hycom.gdp.u.nc
-rw-rw-r-- 2 jypeter lsce 173K Feb 25 09:56 hycom.gdp.v.nc
-rw-rw-r-- 2 jypeter lsce  71K Feb 25 09:56 ibi-argo-7900573.nc
-rw-rw-r-- 2 jypeter lsce  195 Feb 25 09:56 mercator.cfg

durack1 commented 2 years ago

@tomvothecoder was there a plan to have a test suite with just the kind of (few timesteps) data that @jypeter was describing? It seems that CDAT was using the sample_data subdir which enabled testing in the CI envs, similar to what iris appears to do (https://github.com/xCDAT/xcdat/issues/277#issuecomment-1199068571 above)

jypeter commented 2 years ago

Note: see example usage of vcs.sample_data + '/tas_mo.nc' in https://github.com/xCDAT/xcdat/issues/310#issuecomment-1212866276

jypeter commented 9 months ago

I have added an "Easy to use datasets" section to my Python page, with test/tutorial datasets from several packages.

@tomvothecoder It seems that xarray uses xarray.tutorial.load_dataset. Maybe xcdat could have a similar xcdat.tutorial.load_dataset pointing to some useful sample CMIP6 data (and possibly the equivalent CMIP5 data, in case somebody wants to make a CMIP5/CMIP6 comparison example).