pangeo-data / storage-benchmarks

testing performance of different storage layers
Apache License 2.0
12 stars · 1 fork

What to use for test dataset? #1

Closed: jreadey closed this issue 6 years ago

jreadey commented 6 years ago

Thoughts:

jhamman commented 6 years ago

@rabernat has suggested the GHRSST dataset.

I'll list a few other options:

rabernat commented 6 years ago

An example I really like is from Copernicus:

GLOBAL OCEAN GRIDDED L4 SEA SURFACE HEIGHTS AND DERIVED VARIABLES REPROCESSED (1993-ONGOING)

http://marine.copernicus.eu/services-portfolio/access-to-products/?option=com_csw&view=details&product_id=SEALEVEL_GLO_PHY_L4_REP_OBSERVATIONS_008_047

Here is a notebook showing how to push it to zarr on GCS. However, it is not already on S3, which is a downside.

https://nbviewer.jupyter.org/gist/rabernat/311faea2695370feb50ef51e6c0f0d22

rabernat commented 6 years ago

So, 👍 to the LOCA dataset.

kaipak commented 6 years ago

@jreadey has provided the LOCA dataset on S3, so I think we can close this issue. I'll reopen it if anyone objects.