openclimatefix / dagster-dags

Dags for running jobs on Leonardo

Ideas for running dagster in the cloud #49

Open peterdudfield opened 10 months ago

peterdudfield commented 10 months ago

When we don't have enough internet bandwidth, it might be worth thinking about whether we can run things in the cloud. There are quite a few different options.

  1. Stick with what we have

    • wait it out, but prioritise things
    • no cloud costs
  2. Move ICON to use the Planetary Computer. This is free to run, with 3.2 TB of RAM and a 400-worker Dask cluster available. It can start up VMs too.

    • good: it's free
    • bad: it's not in Dagster
    • might take time to get working
    • we can use this for any data we want to make public, as we can save straight to HF (Hugging Face).
  3. Run some jobs in Dagster Cloud. https://dagster.io/

    • this could be expensive
    • it might be annoying having two different Dagster deployments.
    • it might be a good option to just run some smaller jobs there; smaller datasets like ECMWF could be stored in GCS.
  4. Move all to Dagster Cloud https://dagster.io/

    • this is probably too expensive,
    • and it would be expensive to store the data somewhere
    • not very flexible
  5. Deploy our own instance of Dagster on GCP.

    • might be complicated to set up and could take a while to get going.
    • can be scaled up and down.
  6. From local Dagster, trigger jobs on GCP.

    • https://docs.dagster.io/_apidocs/libraries/dagster-gcp
    • not sure we can do it; it needs looking into. Ideally we could trigger Cloud Run jobs and save data to GCS (see the sketch after this list).
    • this could also be used for saving data to HF.
    • might be expensive
    • could use Dataproc to run jobs; it supports lots of different frameworks.
  7. Buy another computer in a different location

    • another computer to manage
    • need to communicate between machines, which might need a large amount of setup.
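
For option 6, here is a minimal sketch of what triggering a pre-built Cloud Run job from the local Dagster instance could look like, calling the Google Cloud Run client directly inside an asset. The project, region, and job names are hypothetical.

# A minimal sketch of option 6, assuming a Cloud Run job ("icon-consumer")
# has already been built and deployed; all names here are hypothetical.
import dagster as dg
from google.cloud import run_v2


@dg.asset
def icon_raw_archive() -> None:
    """Trigger a pre-built Cloud Run job from the local Dagster instance."""
    client = run_v2.JobsClient()
    # Fully qualified job name: projects/<project>/locations/<region>/jobs/<job>
    operation = client.run_job(
        name="projects/my-ocf-project/locations/europe-west1/jobs/icon-consumer"
    )
    # Block until the job finishes so the Dagster run records success or failure
    operation.result()

Blocking on operation.result() keeps the heavy download and the data on GCP, while the local Dagster run only records whether the job succeeded.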
jacobbieker commented 10 months ago

For 2, we can use kbatch to launch jobs; they run in a 32 GB RAM VM by default and can include anything (nwp-consumer, HF uploading, etc.). There is a 24-hour time limit on the machines, so if jobs take longer than that, it won't work.

devsjc commented 10 months ago

Also we could still track it from dagster using the ExternalAsset resource if we wanted to!
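
A minimal sketch of that idea with an observable source asset, assuming the external job writes to a hypothetical GCS path (the bucket and object names are made up); Dagster only observes the file here, it does not produce it:

# A minimal sketch, assuming the external job writes to a hypothetical GCS path.
from dagster import DataVersion, observable_source_asset
from google.cloud import storage


@observable_source_asset
def icon_zarr_on_gcs():
    """Report the current version of a file produced outside Dagster."""
    blob = storage.Client().bucket("my-ocf-bucket").blob("icon/latest.zarr.zip")
    blob.reload()  # fetches metadata and raises NotFound if the file is missing
    # Use the upload hash as the data version so Dagster notices new uploads
    return DataVersion(blob.md5_hash or str(blob.updated))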

peterdudfield commented 10 months ago

Also we could still track it from dagster using the ExternalAsset resource if we wanted to!

Does this mean we have the option to run it somewhere else, and then Dagster can just track whether the file is there at the end?

peterdudfield commented 10 months ago

I'm beginning to like 2 and 6 more. It means we have one mission control, but we can scale things out to different platforms as we need to.

jacobbieker commented 10 months ago

For reference on Planetary Computer, this is basically the script I have that processes EUMETSAT data and saves to huggingface: https://github.com/jacobbieker/planetary-datasets/blob/main/planetary_datasets/conversion/zarr/eumetsat.py
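
For anyone skimming, the upload side of a script like that can be quite small with huggingface_hub; this is only a hedged sketch, and the repo id and file names below are made up rather than taken from that script:

# A hypothetical sketch of pushing a converted archive to a Hugging Face
# dataset repo; the repo id and paths are invented for illustration.
from huggingface_hub import HfApi

api = HfApi()  # picks up the HF token from the environment or a cached login
api.upload_file(
    path_or_fileobj="eumetsat_2024_01.zarr.zip",
    path_in_repo="data/eumetsat_2024_01.zarr.zip",
    repo_id="openclimatefix/eumetsat-example",
    repo_type="dataset",
)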

jacobbieker commented 10 months ago

The only difference in the PC version is the steps that install Satip in the VM.

peterdudfield commented 10 months ago

The only difference in the PC version is the steps that install Satip in the VM.

What do you mean?

jacobbieker commented 10 months ago

I have a very slightly modified version of that script that just has this at the top of the file, to make it simpler to run there and to install the missing dependencies in the VM. The VMs come with the geospatial stack already installed.

"""This gather and uses the global mosaic of geostationary satellites from NOAA on AWS"""
import subprocess

def install(package):
    subprocess.check_call(["/srv/conda/envs/notebook/bin/python", "-m", "pip", "install", package])

install("datasets")
install("satip")

"""Convert EUMETSAT raw imagery files to Zarr"""
try:
    import satip
except ImportError:
    print("Please install Satip to continue")