pangeo-data / pangeo

Pangeo website + discussion of general issues related to the project.
http://pangeo.io

List of current Pangeo deployments #232

Closed: jhamman closed this issue 6 years ago

jhamman commented 6 years ago

We'd like to develop a current listing of Pangeo deployments. If you have deployed pangeo in one form or another, please speak up!

Information we'd like to know:

  1. Cloud or HPC (or other)? Which system (e.g. Google Cloud, NCAR's Cheyenne)?
  2. How are you deploying dask distributed (KubeCluster, dask-jobqueue, dask-mpi, etc.)?
  3. How are you deploying Jupyter (JupyterHub, single-user, etc.)?
  4. Primary use (Tutorial, individual research, etc.)?

(I'll provide a few examples below. Sharing a URL to your JupyterHub deployment is not required.)

xref #229


pangeo.pydata.org (Pangeo EarthCube)

  1. Platform: Google Cloud
  2. Dask: KubeCluster
  3. Jupyter: JupyterHub
  4. Use: Exploratory deployment for Pangeo EarthCube project. Used for demos, tutorials, research.

pangeo-aws.cloudmaven.org (University of Washington). Note: this deployment will be taken down soon.

  1. Platform: Amazon cloud
  2. Dask: KubeCluster
  3. Jupyter: JupyterHub
  4. Use: Demo deployment / proof of concept for proposal.

Cheyenne, Caldera, and Geyser (NCAR)

  1. Platform: HPC
  2. Dask: dask-jobqueue and dask-mpi
  3. Jupyter: Single-user notebook servers (JupyterHub coming soon: #26)
  4. Use: Individual research

jhamman commented 6 years ago

ping @jacobtomlinson @jreadey @TomAugspurger @tjcrone @jgerardsimcock

rabernat commented 6 years ago

NASA Pleiades Cluster

  1. Platform: HPC
  2. Dask: dask-mpi and custom Slurm launch scripts
  3. Jupyter: Single-user notebook servers
  4. Use: Ongoing ocean and climate data analysis

Columbia Habanero Cluster

  1. Platform: HPC
  2. Dask: dask-mpi and custom Slurm launch scripts
  3. Jupyter: Single-user notebook servers
  4. Use: Ongoing ocean and climate data analysis

tjcrone commented 6 years ago

Lamont Real-Time Earth Pangeo Cluster

  1. Platform: Azure cloud
  2. Dask: KubeCluster
  3. Jupyter: JupyterHub
  4. Use: Earth/ocean science research

guillaumeeb commented 6 years ago

HAL (CNES)

  1. Platform: HPC (PBSPro)
  2. Dask: dask-jobqueue and custom launch scripts
  3. Jupyter: Single-user notebook servers (working on a JupyterHub service)
  4. Use: Demo deployment / proof of concept

rsignell-usgs commented 6 years ago

Yeti Cluster (USGS)

  1. Platform: HPC (Slurm)
  2. Dask: dask-jobqueue and custom launch scripts
  3. Jupyter: Single-user notebook servers
  4. Use: Analysis of coupled ocean, atmosphere, wave and sediment transport model output

jgerardsimcock commented 6 years ago

https://compute.rhg.com/hub/login

  1. Platform: GKE
  2. Dask: KubeCluster
  3. Jupyter: Single-user notebooks spawned from JupyterHub
  4. Use: Climate research and analysis

mrocklin commented 6 years ago

cc @TomAugspurger


rabernat commented 6 years ago

@rsignell-usgs: didn't you and @jreadey win an Amazon award for AWS credits to deploy a pangeo cluster? Can you share these details? Is the cluster publicly accessible?

rsignell-usgs commented 6 years ago

@rabernat, yes we received the AWS award, but I'm a bit ashamed to say we don't have a Pangeo cluster running there yet. 😳

Hopefully this will change soon: @jacobtomlinson has agreed to help us try to deploy it live via screen share on Friday using EKS, and we will record the session in case it's useful to others.

If others want to join us live, that's fine too: Friday, May 4, 2018 (2:30pm BST, 9:30am EDT, 6:30am PDT) https://www.gotomeeting.com/join/533510693

rabernat commented 6 years ago

Considering the amount of data already in S3 (e.g. #234), it would be quite advantageous to have a general-use pangeo cluster on AWS. We are all happy to help.

rsignell-usgs commented 6 years ago

There is a lot of NOAA Big Data there, but a lot of it is simply NetCDF files parked on S3, which we then need to rewrite to Zarr or HSDS for efficient access: https://github.com/pangeo-data/pangeo/issues/234#issuecomment-386048381

rabernat commented 6 years ago

@rsignell-usgs - maybe it would be good if you could move that comment to #234, where we can discuss in more detail without taking this thread off topic.

jhamman commented 6 years ago

Those that have provided a summary of their pangeo deployment may be interested in #238. Feel free to comment on the description of your favorite cluster/cloud.

@jacobtomlinson and @TomAugspurger - okay if I add your deployments myself?

TomAugspurger commented 6 years ago

Feel free to add ours from https://github.com/dask/dask-tutorial-infrastructure, but it won't be long-lived. Not sure if you're going for a list of currently active deployments.


rabernat commented 6 years ago

Wasn't there someone who came along a few months ago and said they used our setup guides to deploy on a university cluster? I have a distinct memory of this, but I can't find any record of the exchange.

jhamman commented 6 years ago

I merged the first iteration of this. I'll reopen this issue so we can add more deployments as they become available. https://pangeo-data.github.io/pangeo/deployments.html points to this issue for adding additional deployments.

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 6 years ago

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.