Open mdsumner opened 3 months ago
loose todo list
pull from ghcr.io without the docker hub intermediary?
singularity pull --dir $MYSOFTWARE/sif_lib docker://ghcr.io/osgeo/gdal:ubuntu-small-3.9.2
A bit of a summary that I emailed, for my own reference.
The script in R/oisst_daily.R uses the new bucket-based approach with bowerbird.
bowerbird now writes directly to Acacia; we have tested that, along with programmatically making buckets publicly available.
Secrets can be passed as quoted names now, so we can put host and bucket user/pass secrets in a consistent framing. bowerbird will first look for an env var of that name, and fall back to using the given string itself as the value.
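A minimal sketch of that lookup order, written in shell rather than R and not taken from bowerbird itself. `resolve_secret` and `ACACIA_KEY` are hypothetical names used only for illustration; the assumption is that "name first, then value" means the string is treated as a literal secret when no env var of that name is set.

```shell
# Hypothetical helper, not part of bowerbird: resolve a secret that may be
# given either as the name of an environment variable or as the value itself.
resolve_secret() {
  _rs_name=$1
  case $_rs_name in
    ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
      # Not a valid variable name, so it can only be a literal value.
      printf '%s\n' "$_rs_name"
      return 0
      ;;
  esac
  # Safe to eval: the name was validated above.
  eval "_rs_value=\${$_rs_name-}"
  if [ -n "$_rs_value" ]; then
    printf '%s\n' "$_rs_value"   # a variable of that name is set: use it
  else
    printf '%s\n' "$_rs_name"    # otherwise fall back to the string itself
  fi
}
```

With `ACACIA_KEY` exported, `resolve_secret ACACIA_KEY` prints the variable's contents; with it unset, the same call prints the string `ACACIA_KEY` unchanged.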
We’re particularly interested in the {mirai} framework for async evaluation. It works closely with the R {targets} system (think “make for R”) and already has wrappers for Slurm via the {crew.cluster} package:
I haven’t seen how to use it in anger for job scheduling yet on Pawsey, but there are examples around. If you have a chance to explore crew.cluster::crew_controller_slurm() and related functions that would be awesome, or if you can find others already using it on Pawsey that could help us a lot.
https://wlandau.github.io/crew.cluster/reference/crew_controller_slurm.html
All the deps needed are on this docker image:
module load singularity/4.1.0-slurm
singularity pull --dir $MYSOFTWARE/sif_lib docker://ghcr.io/mdsumner/gdal-builds:rocker-gdal-dev-python
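As a plain-Slurm baseline (before any crew.cluster automation), here is a rough sketch of a helper that writes a batch script to run R/oisst_daily.R inside the pulled image. The function name `write_job_script`, the job name, the time limit, and the `.sif` file name are all assumptions. `singularity pull` usually names the file `<name>_<tag>.sif`, but check what actually landed in `sif_lib`. A real Setonix job will likely also need account and partition directives.

```shell
# Sketch only: write a Slurm batch script to the path given as $1.
write_job_script() {
  out=$1
  # Quoted 'EOF' keeps $MYSOFTWARE and $SIF unexpanded until the job runs.
  cat > "$out" <<'EOF'
#!/bin/bash
#SBATCH --job-name=oisst-daily
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

module load singularity/4.1.0-slurm

# Assumed file name; check what `singularity pull` wrote to sif_lib.
SIF="$MYSOFTWARE/sif_lib/gdal-builds_rocker-gdal-dev-python.sif"

# Assumes Rscript is on the image's PATH.
singularity exec "$SIF" Rscript R/oisst_daily.R
EOF
}
```

Usage would be something like `write_job_script oisst_daily.sbatch` followed by `sbatch oisst_daily.sbatch`.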
For us, part of the puzzle in crossing the divide from R to Python has been getting across VirtualiZarr, the successor to Python kerchunk and closely related to the NASA/OPeNDAP DMR++ system. These tools store references to the byte ranges of chunks, and the encoding used, in files like NetCDF/GRIB/HDF that aren’t themselves cloud-friendly. That lets the files be loaded as a Zarr store in xarray without any reformatting or copying at all (indexed by a big JSON, or by a Parquet store, which scales better). Creating kerchunk index collection descriptions for our object store will allow us to easily express what we have in R in an xarray context.
create the bowerbird data library with Setonix, as persistent storage in Acacia
set up a persistent process that runs the bowerbird sync configs regularly
accompanying file list and process to copy a subset to scratch for intensive use, share public https links for external use, and tighter mounting for Setonix itself (open Pawsey issue GS-28902 - can we mount buckets?)
devops for updating the docker image used: triggered by a config change, self-hosted actions, etc.
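For the file list item above, a small sketch of turning a list of object keys into public https links. The endpoint and bucket name are placeholders (assumptions, not confirmed values for our Acacia setup), the path-style `endpoint/bucket/key` layout is assumed, and keys are not URL-encoded here.

```shell
# Placeholder values: adjust to the real endpoint and bucket.
ENDPOINT="https://projects.pawsey.org.au"
BUCKET="example-bucket"

# Read object keys on stdin (one per line) and print one public URL per key.
public_urls() {
  # The `|| [ -n "$key" ]` keeps a final line that lacks a trailing newline.
  while IFS= read -r key || [ -n "$key" ]; do
    [ -n "$key" ] || continue   # skip blank lines
    printf '%s/%s/%s\n' "$ENDPOINT" "$BUCKET" "$key"
  done
}
```

For example, `public_urls < filelist.txt > public_links.txt`, where `filelist.txt` is a hypothetical file list with one key per line.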