peterdudfield opened this issue 10 months ago
For 2, we can use kbatch to launch jobs. They run in a 32 GB RAM VM by default and can include anything (nwp-consumer, HF uploading, etc.). There is a 24-hour time limit on the machines, so if a job takes longer than that, it won't work.
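Roughly, submitting one of these jobs could look something like the snippet below. This is just a sketch: it assumes the kbatch CLI is installed and configured against the Planetary Computer Hub, and the job name, image, and script name are placeholders.

import subprocess

# Sketch: submit a script as a kbatch job. Assumes `kbatch configure` has
# already been run; the name, image, and command are illustrative only.
subprocess.check_call([
    "kbatch", "job", "submit",
    "--name=eumetsat-to-zarr",
    "--image=mcr.microsoft.com/planetary-computer/python:latest",
    '--command=["python", "eumetsat.py"]',
    # plus kbatch's option for attaching local code files, so that
    # eumetsat.py ships with the job
])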
Also, we could still track it from Dagster using the ExternalAsset resource if we wanted to!
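For example, something like this (a sketch using Dagster's observable source asset API as one way to track an externally-produced file; the asset name and path are made up):

import os

from dagster import DataVersion, observable_source_asset


@observable_source_asset
def eumetsat_zarr():
    # Version the asset by the output file's mtime, so Dagster records a new
    # observation whenever the job running elsewhere rewrites it.
    return DataVersion(str(os.path.getmtime("/data/eumetsat.zarr/.zmetadata")))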
Does this mean we have the option to run it somewhere else, and then Dagster can just track whether the file is there at the end?
I'm beginning to like 2 and 6 more. It means we have one mission control, but can scale things to different platforms as we need to.
For reference, on Planetary Computer this is basically the script I have that processes EUMETSAT data and saves to Hugging Face: https://github.com/jacobbieker/planetary-datasets/blob/main/planetary_datasets/conversion/zarr/eumetsat.py
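The upload at the end comes down to something like this (a sketch using huggingface_hub's HfApi; the repo id and folder path are placeholders, not necessarily what the linked script uses):

from huggingface_hub import HfApi

# Sketch: push a local Zarr store to a Hugging Face dataset repo.
api = HfApi()
api.upload_folder(
    folder_path="eumetsat.zarr",        # local Zarr store from the conversion step
    repo_id="openclimatefix/eumetsat",  # hypothetical dataset repo
    repo_type="dataset",
)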
The only things added in the PC version are the steps that install Satip in the VM.
What do you mean?
I have a very slightly modified version of that script that just has this at the top of the file, to make it simpler to run the script there and to install the missing dependencies in the VM. The VMs come with the geospatial stuff already installed.
"""This gather and uses the global mosaic of geostationary satellites from NOAA on AWS"""
import subprocess
def install(package):
subprocess.check_call(["/srv/conda/envs/notebook/bin/python", "-m", "pip", "install", package])
install("datasets")
install("satip")
"""Convert EUMETSAT raw imagery files to Zarr"""
try:
import satip
except ImportError:
print("Please install Satip to continue")
When we don't have enough internet bandwidth, it might be worth thinking about whether we can run things in the cloud. There are quite a few different options:

1. Stick with what we have.
2. Move ICON to use Planetary Computer. This is free to run. Resources: 3.2 TB RAM + a 400-worker Dask cluster. Can start up VMs too.
3. Run some jobs in Dagster Cloud. https://dagster.io/
4. Move all to Dagster Cloud. https://dagster.io/
5. Deploy our own instance of Dagster on GCP.
6. From local Dagster, trigger jobs on GCP, e.g. Dataproc, to run jobs. Lots of different frameworks are supported (rough sketch below).
7. Buy another computer in a different location.
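For 6, triggering a GCP job from a local Dagster op could look roughly like this (a sketch using the google-cloud-dataproc client; the project, region, cluster, and file paths are all placeholders):

from dagster import op
from google.cloud import dataproc_v1


@op
def run_dataproc_job():
    # Sketch: submit a PySpark job to an existing Dataproc cluster and wait
    # for it to finish. All names here are illustrative.
    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": "europe-west1-dataproc.googleapis.com:443"}
    )
    operation = client.submit_job_as_operation(
        request={
            "project_id": "my-gcp-project",
            "region": "europe-west1",
            "job": {
                "placement": {"cluster_name": "nwp-cluster"},
                "pyspark_job": {"main_python_file_uri": "gs://my-bucket/consumer.py"},
            },
        }
    )
    operation.result()  # block until the remote job completes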