pangeo-data / pangeo

Pangeo website + discussion of general issues related to the project.

List Earth observation data you would use if it were on the cloud #588

Closed jhamman closed 5 years ago

jhamman commented 5 years ago

I was recently asked to put together a list of Earth observation datasets that would be broadly useful to the Pangeo community and that are NOT currently available on a public cloud system like S3, GCS, or Azure Blob. I'd like to encourage anyone who has science applications that use Earth observation data to take a minute and register their thoughts here. I plan to share a collated version of this list with a number of data providers in the coming weeks.

A template for providing feedback here:

**Title: <title, e.g. MODIS MOD16 ET>**
Dataset owner: <entity, e.g. NASA>
Subset: <all, or specify desired subset>
Size: <estimates in TB are okay>
Data format: <existing data format, e.g. geotiff, netCDF, zarr, etc.>
URL: <url>
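
For anyone collating the replies later, here is a minimal sketch of how a filled-in template could be parsed into a record. The `DatasetEntry` class, the `parse_entry` helper, and the attribute names are hypothetical and not part of any Pangeo tooling, and the example values are placeholders built from the template's own examples.

```python
from dataclasses import dataclass

# Field labels from the template, mapped to hypothetical attribute names.
FIELDS = {
    "Title": "title",
    "Dataset owner": "owner",
    "Subset": "subset",
    "Size": "size",
    "Data format": "data_format",
    "URL": "url",
}


@dataclass
class DatasetEntry:
    title: str = ""
    owner: str = ""
    subset: str = ""
    size: str = ""
    data_format: str = ""
    url: str = ""


def parse_entry(text: str) -> DatasetEntry:
    """Parse one block of 'Label: value' lines into a DatasetEntry."""
    entry = DatasetEntry()
    for line in text.splitlines():
        # Split on the first colon only, so URLs in the value stay intact.
        label, sep, value = line.partition(":")
        label = label.strip().strip("*").strip()
        if sep and label in FIELDS:
            setattr(entry, FIELDS[label], value.strip().strip("*").strip())
    return entry


# Placeholder entry; the size and format values are not real metadata.
example = """\
**Title: MODIS MOD16 ET**
Dataset owner: NASA
Subset: all
Size: unknown
Data format: unknown
URL:
"""
entry = parse_entry(example)
print(entry)
```

Unknown labels are ignored and missing fields stay empty, so partially filled entries still parse.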

scottyhq commented 5 years ago

Title: Landsat Analysis Ready Data (ARD)
Dataset owner: USGS
Subset: All (currently covers just US)
Size: 10 TB?
Data format: Cloud-Optimized Geotiff
URL:

rabernat commented 5 years ago

Title: GLOBAL OCEAN GRIDDED L4 SEA SURFACE HEIGHTS AND DERIVED VARIABLES REPROCESSED
Dataset owner: Copernicus Marine Environment Monitoring Service
Size: ~500 GB
Data format: netCDF4 (we also have a zarr copy in pangeo)
URL:

Title: GHRSST Level 4 G1SST Global Foundation Sea Surface Temperature Analysis
Dataset owner: NASA
Subset: All
Size: ?
Data format: netCDF
URL:

Title: Optimum Interpolation Sea Surface Temperature (OISST)
Dataset owner: NOAA NCEI
Subset: All, both AVHRR-Only and AVHRR+AMSR
Size: 100 GB?
Data format: netCDF
URL:

(Note: there are an overwhelming number of SST products available. I am a professor of physical oceanography, and I have no idea which one is the "best"; there are tradeoffs involved. The most valuable ones to have in the cloud are the BIG datasets, like the ones listed here.)

Title: NOAA Climate Data Record (CDR) of Cloud Properties from AVHRR Pathfinder Atmospheres - Extended (PATMOS-x), Version 5.3
Owner: NOAA NCEI
Subset: All
Size: ?
Data format: netCDF
URL:

Title: NOAA Blended Sea Winds
Owner: NOAA NCEI
Subset: 6-hourly & daily
Size: ?
Data format: netCDF
URL:

More soon.

rabernat commented 5 years ago

Is the scope here just satellite data, or can it be any "earth observations"? ARGO data?

rsignell-usgs commented 5 years ago

@jhamman and @scottyhq, is the Data format: field supposed to be the existing format, or the desired format on the Cloud?

jhamman commented 5 years ago

@rabernat - EO generally so ARGO would be fine. @rsignell-usgs - native format. The recommendation of cloud optimized formats can happen in a separate thread.

scottyhq commented 5 years ago

I defer to @jhamman who is overseeing this database, but I think there would be value in a few additional fields:

Type: (satellite, model, other)
Current Format: (hdf5, tif, etc.)
Desired Format: (zarr, cloud-optimized geotiff, etc.)

cgentemann commented 5 years ago

Jon & I have been talking about this. Right now, the most helpful thing is to recommend data that resides at a NASA DAAC. I've asked them for user stats (both FTP & OPeNDAP), which give us some guidance to start with. But this forum is useful to help us rank them.

@rabernat Winds: this is a better product, but access is currently only through FTP and the documentation is poor. It uses the 4D-Var method, which produces much better winds than the Gaussian interpolation used by the NOAA product. SST: yes, I totally agree, focus on BIG data. MUR SST is a great one from NASA. Also VIIRS for even higher resolution. We have both of these on the list already.


LejoFlores commented 5 years ago

Title: SMAP Enhanced L3 Radiometer Global Daily 9 km EASE-Grid Soil Moisture, Version 2
Dataset owner: NASA
Subset: All
Size: < 1 TB
Data format: HDF5
URL:

rsignell-usgs commented 5 years ago

Title: MODIS-Aqua Ocean Color (Level 2)
Dataset owner: NASA
Subset: all
Size: 10 TB
Current Data format: NetCDF
Desired Cloud Data format: Zarr or GeoTIFF
URL:

robfatland commented 5 years ago

Note: GoLIVE is hosted by NSIDC but not officially part of the CMR/DAAC... I believe. A follow-on expansion called ITSLIVE should more properly be under the CMR umbrella; in progress.

Title: GoLIVE Land-ice velocity derived from LANDSAT-8
Dataset owner: NASA?
Subset: all
Size: unknown
Current Data format: NetCDF
Desired Cloud Data format: Zarr
URL:

darothen commented 5 years ago

Title: Shuttle Radar Topography Mission (SRTM) v4
Dataset owner: Consortium for Spatial Information (CGIAR-CSI) / NASA
Subset: all (90m, 250m resampled)
Size: 0.2 TB
Data format: GeoTiff, ESRI ASCII
URL:

dshean commented 5 years ago

Title: ArcticDEM and REMA 2-m DEM strips
Dataset owner: Polar Geospatial Center (UMN)
Subset: all (260714 ArcticDEM strips, 187585 REMA strips)
Size: ~250 TB
Data format: Float32 GeoTiff
URL:

Title: HiMAT 8-m along-track and cross-track DEM strips
Dataset owner: NASA
Subset: all ~5K strips (v2 forthcoming with additional ~1.7K strips)
Size: ~1-3 TB
Data format: Float32 GeoTiff
URL:

willirath commented 5 years ago

Title: GDP hourly drifter positions
Dataset owner: Global Drifter Program
Subset: all
Size: approx. 10 GB
Current Data format: NetCDF, ASCII, mat
Desired Cloud Data format: Zarr
URL:

willirath commented 5 years ago

Title: NSIDC Sea-Ice Concentration
Dataset owner: NSIDC
Subset: all
Size: approx. 100 GB
Current Data format: NetCDF
Desired Cloud Data format: Zarr
URL:

willirath commented 5 years ago

Title: GDP Drifter Climatology
Dataset owner: Global Drifter Program
Subset: all
Size: approx. 5 GB
Current Data format: NetCDF
Desired Cloud Data format: Zarr
URL:

willirath commented 5 years ago

Title: TMI SST
Dataset owner: Remote Sensing Systems / PO.DAAC
Subset: all
Size: approx. 50 GB
Current Data format: Custom Binary (?)
Desired Cloud Data format: Zarr
URL: and

willirath commented 5 years ago

Title: SRTM15+ and SRTM30+ global bathymetry
Dataset owner: UCSD?
Subset: all
Size: approx. 10 GB + 2 GB
Current Data format: NetCDF
Desired Cloud Data format: Zarr
URL: and

willirath commented 5 years ago

Title: TROPFLUX
Dataset owner: Indian National Centre for Ocean Information Services
Subset: all
Size: approx. 50 GB
Current Data format: NetCDF
Desired Cloud Data format: Zarr
URL:

willirath commented 5 years ago

Title: Argo Mixed Layers
Dataset owner: UCSD
Subset: all
Size: approx. 1 GB
Current Data format: NetCDF
Desired Cloud Data format: Zarr
URL:

willirath commented 5 years ago

Title: WORLD OCEAN ATLAS 2013 version 2
Dataset owner: NOAA?
Subset: all (1.00 deg and 5.00 deg)
Size: approx. 200 GB
Current Data format: NetCDF
Desired Cloud Data format: Zarr
URL:

willirath commented 5 years ago

Title: HadISST Sea-Surface Temperature and Ice Coverage
Dataset owner: UK Met Office
Subset: all
Size: approx. 4 GB
Current Data format: NetCDF
Desired Cloud Data format: Zarr
URL:

willirath commented 5 years ago

Title: ETOPO1 Global Relief Model
Dataset owner: NOAA?
Subset: all
Size: approx. 4 GB
Current Data format: NetCDF
Desired Cloud Data format: Zarr
URL:

apatlpo commented 5 years ago

About World Ocean Atlas, a 2018 version is now available:

rabernat commented 5 years ago

In my experience, low-resolution datasets like WOA, HadISST, etc. already work fine over OPeNDAP. It's only when you get into the > 10 GB range that OPeNDAP starts to struggle and cloud storage becomes advantageous.
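
As a rough way to apply that rule of thumb to the free-text `Size:` fields in this thread, here is a small sketch. The helper names and the 10 GB constant are assumptions for illustration (the threshold is a personal rule of thumb, not a hard limit), and the parser only reads the first "number + GB/TB" it finds.

```python
import re

# Decimal units: 1 TB = 1000 GB.
UNIT_TO_GB = {"GB": 1.0, "TB": 1000.0}
OPENDAP_LIMIT_GB = 10.0  # assumed threshold, taken from the rule of thumb above


def size_in_gb(size: str):
    """Return the first '<number> GB|TB' found in a free-text size, in GB, or None."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(GB|TB)", size, flags=re.IGNORECASE)
    if match is None:
        return None
    return float(match.group(1)) * UNIT_TO_GB[match.group(2).upper()]


def cloud_candidate(size: str) -> bool:
    """True when the stated size exceeds the assumed ~10 GB threshold."""
    gb = size_in_gb(size)
    return gb is not None and gb > OPENDAP_LIMIT_GB


# Size strings copied from entries in this thread.
sizes = {
    "HadISST": "approx. 4 GB",
    "ETOPO1": "approx. 4 GB",
    "OISST": "100 GB?",
    "ArcticDEM and REMA": "~250 TB",
}
for name, size in sizes.items():
    print(name, size_in_gb(size), cloud_candidate(size))
```

Entries with an unknown size ("?") return `None` and are not flagged, so they would still need a manual look.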

rbavery commented 5 years ago

I second @scottyhq that Landsat ARD would be really valuable to have on the cloud. Just wanted to mention this citation, which indicates that the total size of the record is much larger than 10 TB:

"Each day of Sentinel-2 data collection will result in 1.6 TB of imagery, for each satellite, in comparison to 750 GB per day for Landsat-8, 260 GB for Landsat-7, and, for historical reference, 40 GB for Landsat-5 (Wulder et al., 2008)." - Wulder et al. 2015
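
To put those daily rates in perspective, here is a back-of-the-envelope conversion to yearly volumes. It assumes a constant daily rate, 365 days per year, and decimal units (1 TB = 1000 GB). These are acquisition volumes for the full missions, not the size of the US-only ARD product, so treat the result as an order-of-magnitude guide only.

```python
# Daily acquisition volumes in GB, as quoted above from Wulder et al. 2015.
DAILY_GB = {
    "Sentinel-2 (per satellite)": 1600,
    "Landsat-8": 750,
    "Landsat-7": 260,
    "Landsat-5": 40,
}

DAYS_PER_YEAR = 365  # assumption: constant rate, no leap days


def yearly_tb(daily_gb: float) -> float:
    """Convert a daily volume in GB to a yearly volume in TB (1 TB = 1000 GB)."""
    return daily_gb * DAYS_PER_YEAR / 1000


for name, daily in DAILY_GB.items():
    print(f"{name}: ~{yearly_tb(daily):.0f} TB/year")
```

Even Landsat-5, at 40 GB per day, comes out to roughly 15 TB per year under these assumptions, which is already above the 10 TB estimate.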

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.