pangeo-data / pangeo

Pangeo website + discussion of general issues related to the project.
http://pangeo.io

Hello from Oxford! #276

Closed duncanwp closed 6 years ago

duncanwp commented 6 years ago

Hi all,

I met some of you at the recent ECMWF meeting, but I have only just added myself to your website and wanted to briefly introduce myself!

I'm a post-doc in atmospheric physics working on aerosol-cloud interactions. I use a lot of observational data (satellite, aircraft, ship or anything I can get my hands on) to improve global model representations of these processes. So basically I do a lot of running big simulations and comparing them with disparate datasets.

In my spare time (ha!) I'm the lead developer of CIS, which builds on Iris to allow easy collocation and comparison of these different datasets. I'm very happy to discuss how / if it might fit into the pangeo landscape.

Otherwise I'm happy to help out however I can!

jhamman commented 6 years ago

Hello @duncanwp - thanks for the introduction. CIS has come across my radar a few times, so it's nice to see that there may be some ways to have it interact with Pangeo. A few questions:

  1. Have you been using Iris's dask integration for parallelization? More generally, what is your approach to parallelization for large/out-of-core computation?
  2. Is there interest in or work underway for using CIS in cloud environments?

duncanwp commented 6 years ago

Have you been using Iris's dask integration for parallelization? More generally, what is your approach to parallelization for large/out-of-core computation?

To be honest, it's not something we've tackled up to now; most jobs I can trivially parallelise by file. I'm very excited to start making the most of the new dask backend in Iris though, and I'm currently in the process of adjusting our data model to support exactly this.
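
For readers unfamiliar with the "parallelise by file" pattern mentioned above, here is a minimal sketch using only the Python standard library. It is not CIS or dask code: `summarise_file` and `process_all` are hypothetical names, and the per-file job is a placeholder that only inspects the path rather than reading any data.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def summarise_file(path):
    """Hypothetical stand-in for real per-file work (e.g. read one file, reduce it).

    Here it only inspects the path, so no file has to exist.
    """
    p = Path(path)
    return p.name, p.suffix


def process_all(paths, worker=summarise_file, max_workers=4):
    """Apply worker to every path concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, paths))


if __name__ == "__main__":
    # Hypothetical file names, used purely for illustration.
    for name, suffix in process_all(["/data/day1.nc", "/data/day2.hdf"]):
        print(name, suffix)
```

Threads keep the sketch simple. For CPU-bound work, `ProcessPoolExecutor` offers the same `map` interface, and a dask-based approach would replace this pattern with lazy, chunked arrays.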

Is there interest in or work underway for using CIS in cloud environments?

Yes, definitely. I have a prototype web interface for CIS (here) deployed on JASMIN, but it currently just creates CIS command-line jobs and queues them on a processing cluster. This is primarily because the processing cluster is behind a firewall and can only be accessed through NorduGrid. In a public cloud, though, you don't have these limitations, and once the dask support is in there should be much nicer ways of doing this.

There are three main aspects to CIS (which I may end up splitting into separate packages):

  1. Reading a variety of non-CF datasets into Iris Cubes (including MODIS L1/2/3, CALIOP L1/2/3, AeroNet, CloudSat, SEVIRI, NCAR-RAF, etc)
  2. Collocating them, including matching coordinates and units (which is where the value of using Iris over xarray currently lies), using either interpolation, regridding, binning or point-matching as appropriate
  3. A command line interface for those not familiar with Python, but also useful for batch processing.
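
As a rough illustration of the point-matching option in item 2, the sketch below samples a gridded field at the grid cell nearest to each observation point. It is not how CIS or Iris implement collocation: the function names are hypothetical, the grid is assumed to be rectilinear with ascending coordinates, and unit handling and longitude wrap-around are ignored.

```python
from bisect import bisect_left


def nearest_index(sorted_coords, value):
    """Index of the entry in an ascending coordinate list closest to value."""
    i = bisect_left(sorted_coords, value)
    if i == 0:
        return 0
    if i == len(sorted_coords):
        return len(sorted_coords) - 1
    # Pick whichever neighbour is closer; ties go to the lower index.
    before, after = sorted_coords[i - 1], sorted_coords[i]
    return i - 1 if value - before <= after - value else i


def collocate_nearest(grid_lats, grid_lons, grid_values, points):
    """Sample grid_values[lat_index][lon_index] at each (lat, lon) point."""
    return [
        grid_values[nearest_index(grid_lats, lat)][nearest_index(grid_lons, lon)]
        for lat, lon in points
    ]


if __name__ == "__main__":
    # Made-up grid and observation points, for illustration only.
    lats = [0.0, 10.0, 20.0]
    lons = [0.0, 10.0]
    field = [[1, 2], [3, 4], [5, 6]]
    print(collocate_nearest(lats, lons, field, [(9.0, 1.0), (19.0, 8.0)]))
```

Points outside the grid are clamped to the nearest edge cell in this sketch; a real tool would more likely mask them or apply a distance threshold.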

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 6 years ago

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.