pangeo-data / pangeo

Pangeo website + discussion of general issues related to the project.
http://pangeo.io

ECMWF / Copernicus Climate Data Store #40

Closed rabernat closed 6 years ago

rabernat commented 6 years ago

I just read an article about a new "climate data store" that is being developed by ECMWF

https://www.ecmwf.int/en/newsletter/151/meteorology/climate-service-develops-user-friendly-data-store

This looks quite ambitious and very complex:

[Image: schematic]

Despite the highly customized architecture, there is an explicit mention of open-source and even xarray:

It was also decided that the CDS should be based on open source software where possible, so that other instances could be deployed if necessary. This is particularly important for the development of the toolbox: there is a vibrant community developing scientific libraries in Python, such as Numpy, Scipy, Pandas, xarray, dask, matplotlib etc. These libraries provide many of the algorithms required, and users from the weather and climate communities are already familiar with them. Making use of those libraries will therefore make it easier for users to contribute new additions to the toolbox.

We should keep this on our radar. Do any of the euro folks (e.g. @lesommer) have any connections to this group? It would be great to develop connections with ECMWF, as they are one of the largest providers of weather and climate data in the world.

mrocklin commented 6 years ago

cc @pelson

rabernat commented 6 years ago

One very active python person from ECMWF is @kynan.

shoyer commented 6 years ago

cc @alexamici

shoyer commented 6 years ago

I just left the ECMWF python workshop. ECMWF seems to be building/adapting many tools to use Python with Xarray/Dask, including:

They do seem to be a little new to open source, and none of these tools are actually public yet. I encouraged them to get involved in Pangeo and the broader community.

darothen commented 6 years ago

a new GRIB reader based on eccodes (which they want to use as a new backend for xarray)

@shoyer do you have any specific links or details on this effort? A good alternative to PyNIO for reading GRIB/GRIB2 files into xarray would be a "killer feature" that opens up the tool to a broader community of researchers working in numerical weather prediction, where GRIB2 is the standard for dissemination of large-scale forecast model output from many NDCs.

shoyer commented 6 years ago

@alexamici is leading these efforts for ECMWF. I don't think they have much to share publicly yet, but they hope to open source it. I encouraged him to add the xarray-specific backend logic into xarray proper so we can more easily maintain it.

chiaral commented 6 years ago

I am very late to this issue, but very interested in learning about the evolution of the GRIB2 reader status in xarray.

I just started using xarray+PyNIO with open_mfdataset(), together with a pre-processing function and some looping to concatenate along multiple dimensions (as explained by @jhamman in this SO answer).
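The looping approach mentioned here can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the SO answer: small in-memory datasets stand in for GRIB files, and `make_piece`, `add_dims`, the variable name `t2m`, and the `member`/`time` dimensions are all hypothetical. With real files, a function passed as `preprocess` to `open_mfdataset()` receives only the dataset, so the member and time labels would have to be derived from the dataset itself (for example from the source file name in `ds.encoding["source"]`).

```python
import numpy as np
import xarray as xr


def make_piece(member, time):
    """Stand-in for one file: a 2-D field with no member or time dimension."""
    data = np.full((2, 3), float(10 * member + time))
    return xr.Dataset(
        {"t2m": (("lat", "lon"), data)},
        coords={"lat": [0.0, 1.0], "lon": [0.0, 1.0, 2.0]},
    )


def add_dims(ds, member, time):
    """Pre-processing step: give each piece length-1 member and time dimensions."""
    return ds.expand_dims(member=[member], time=[time])


members = [0, 1]
times = [0, 1, 2]

# Inner loop: concatenate along time for a fixed member.
# Outer loop: concatenate the per-member results along member.
rows = []
for m in members:
    pieces = [add_dims(make_piece(m, t), m, t) for t in times]
    rows.append(xr.concat(pieces, dim="time"))

combined = xr.concat(rows, dim="member")
print(combined.sizes)
```

Concatenating one dimension at a time keeps each step simple to debug, at the cost of holding the intermediate per-member datasets in memory (or as lazy dask arrays when the inputs are opened with chunks).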

In theory it works great, but I am running into some issues.

Does anyone have experience with this?

StephanSiemen commented 6 years ago

FYI, we plan to set up a call soon to present the CDS and our work on xarray_grib. Hopefully we can address your questions/comments then - see #302.

rabernat commented 6 years ago

FYI, the Copernicus portal has been released. It's open to the public: https://cds.climate.copernicus.eu/

It's pretty cool! You can run Python code in their environment, kind of like a notebook. There are just a lot of new / unfamiliar APIs.

shoyer commented 6 years ago

It looks like the cdstoolbox library has a bunch of routines for climate data analysis on xarray.DataArray objects, including routines that keep track of units: https://devpi.copernicus-climate.eu//root/master/cdstoolbox/latest/+doc/index.html

This looks pretty cool and potentially broadly useful! Is it available outside of Copernicus as a stand-alone library and/or open source project? I think a lot of folks would be excited about this.

alexamici commented 6 years ago

@shoyer I can give you some background as @bopen is leading the development of the CDS Toolbox.

The Toolbox is a distributed architecture, so things are not straightforward. The cdstoolbox module that you import in the applications only defines the work to be done as an abstract workflow; the actual processing is then done by a bunch of different tools and libraries on separate compute hosts.
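To make the "abstract workflow" idea concrete, here is a minimal, generic sketch of deferred execution in plain Python. It is not how cdstoolbox is implemented (that code is not public), and every name in it (`Step`, `defer`, `execute`, `load`, `anomaly`) is hypothetical: calling a wrapped function only records what should be done, and a separate executor, which in a distributed setup could live on another host, performs the work later.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Step:
    """One node of an abstract workflow: a function plus its inputs, not yet run."""
    func: Callable[..., Any]
    args: Tuple[Any, ...] = ()


def defer(func):
    """Wrap func so that calling it records a Step instead of computing."""
    def wrapper(*args):
        return Step(func, args)
    return wrapper


def execute(node):
    """Stand-in for the executor: resolve inputs recursively, then run the step."""
    if isinstance(node, Step):
        return node.func(*(execute(arg) for arg in node.args))
    return node  # plain values are passed through unchanged


@defer
def load(values):
    return list(values)


@defer
def anomaly(series):
    mean = sum(series) / len(series)
    return [value - mean for value in series]


# Building the workflow does no computation; it only nests Step objects.
workflow = anomaly(load((1.0, 2.0, 3.0)))

# The computation happens only when the executor walks the graph.
result = execute(workflow)
print(result)
```

Separating the description of the work from its execution is what lets a service decide where and how each step runs, which is also the basic idea behind task-graph libraries such as dask.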

On the other hand, it is true that quite a bit of the tooling is written in Python with xarray.DataArray as the main processing data structure, and @ecmwf (our contractor) has intended to open-source the code since the beginning. Unfortunately, in spite of the best intentions of @ecmwf, the legal team hasn't cleared us yet :/

rabernat commented 6 years ago

The cdstoolbox module that you import in the applications only defines the work to be done as an abstract workflow; the actual processing is then done by a bunch of different tools and libraries on separate compute hosts.

Hmmm...sounds a bit like an obscure library for parallel computing in Python that I've been playing around with. 😄

In all seriousness, I totally agree that many of these routines would have broad interest from the community.

@shoyer - you should consider joining the Pangeo telecon with ECMWF (discussed in #302).

fmaussion commented 6 years ago

I played around with the CDS at last weekend's hackathon together with other programmers. I would say that the CDS is a great tool but still has some rough edges.

Pros:

Cons:

I fully understand the challenges behind the CDS @alexamici and this is not meant as criticism - I'm looking forward to using the CDS more and more - though for some use cases I'm going to have to keep using MARS.

fmaussion commented 6 years ago

Regarding open-sourcing the climate toolbox part: I think it would be very nice to open-source the science part of the toolbox (e.g. climate indices) in order to build confidence in the results that the CDS is producing.

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 6 years ago

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.

cpaulik commented 5 years ago

Sorry for re-opening this, but I would also really like to see this part of the CDS as open source, especially because I would not really produce anything meaningful on the CDS, given that their data licence states the following:

6.2. All Intellectual Property Rights of new items created as a result of modifying or adapting the Copernicus Products through the applications and workflows accessible on the ECMWF Copernicus portals will belong to the European Union

jhamman commented 5 years ago

@cpaulik - I agree. This sort of language in a licence is not particularly welcoming. I wonder if @StephanSiemen could help provide some clarity on the intention here, or on the most productive venue for providing feedback on the CDS licence.