Closed dongreenberg closed 1 year ago
Interesting approach using Ray and OpenTelemetry: https://composable-logs.github.io/composable-logs/home/
Some more context based on the above:
3 main ways of invoking/creating a run:
Logs
The .rh directory on a cluster will contain logs for each run, with each run having its own unique key (the user should be able to override that key name with their own).
cluster_folder.to("here", path=local_path)
rh folder of the project's main working directory
Context Manager
Saving inputs & outputs for a Run
Basic API ideas (WIP):
Create a Run object (captures logs, inputs, outputs, other artifacts read or written within the call, who ran it, and where):
res = fn(**kwargs, name="my_run")
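To make the idea concrete, here is a rough sketch of what capturing a call as a run could look like. Everything in it is hypothetical and not the Runhouse API: `run_call`, the `run.json` file name, and the `.rh/runs` default root are assumptions for illustration. It records only inputs, output, user, and host, and leaves log capture aside.

```python
import json
import os
import platform
import tempfile
import time
from pathlib import Path


def run_call(fn, name, root=".rh/runs", **kwargs):
    """Hypothetical helper: call fn(**kwargs) and record the call in <root>/<name>."""
    run_dir = Path(root) / name
    run_dir.mkdir(parents=True, exist_ok=True)
    started = time.time()
    result = fn(**kwargs)
    record = {
        "name": name,
        "inputs": kwargs,
        "output": result,
        "user": os.environ.get("USER", "unknown"),  # who ran it
        "host": platform.node(),                    # where it ran
        "duration_s": time.time() - started,
    }
    # default=repr keeps the sketch working for values JSON cannot encode
    (run_dir / "run.json").write_text(json.dumps(record, default=repr))
    return result


def add(a, b):
    return a + b


# Usage: write the run into a temporary directory instead of the project folder
root = tempfile.mkdtemp()
res = run_call(add, name="my_run", root=root, a=1, b=2)
record = json.loads((Path(root) / "my_run" / "run.json").read_text())
```

Keeping the run as a plain folder with a JSON record is what would make it easy to send elsewhere later.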
A run is a folder (created inside the local rh directory by default), and can be sent elsewhere to persist logs, results, artifact info, etc.:
rh.run(name="my_run").to("s3", path="runhouse/nlp_team/bert_ft/results")
Ideally, we can have a "default log store" setting in the user config so the logs from their runs can be sent to the same place by default when they save, rather than having to send each run one by one.
This could be how users configure artifacts/logs to flow to an existing MLflow store, or to W&B, Grafana, Datadog, etc.
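A minimal sketch of the "default log store" idea, assuming the user config is a plain dict with a hypothetical `default_log_store` key and that the store is just another directory. `save_run` is an invented name; a real store (S3, MLflow, W&B, ...) would need its own client, which this sketch does not attempt.

```python
import shutil
import tempfile
from pathlib import Path


def save_run(run_dir, config, dest=None):
    """Hypothetical helper: copy a run folder to dest, or to the configured default store."""
    target_root = dest or config.get("default_log_store")
    if target_root is None:
        raise ValueError("no destination given and no default_log_store configured")
    target = Path(target_root) / Path(run_dir).name
    # dirs_exist_ok lets a later save of the same run refresh the stored copy
    shutil.copytree(run_dir, target, dirs_exist_ok=True)
    return target


# Usage: a fake run folder and a fake store, both in temporary directories
run_dir = Path(tempfile.mkdtemp()) / "my_run"
run_dir.mkdir()
(run_dir / "run.log").write_text("step 1 done\n")

config = {"default_log_store": tempfile.mkdtemp()}  # assumed config shape
stored = save_run(run_dir, config)
stored_log = (stored / "run.log").read_text()
```

An explicit `dest` wins over the configured default, so a single run can still be sent somewhere else.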
Save the run to local or RNS (not all runs need to be saved)
rh.run(name="my_run").save()
Creates a run object by tracing the activity within the block - no inputs and outputs, but captures logs (perhaps several logfiles for different calls) and artifacts used:
with rh.run(name="my_run") as r:
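One possible shape for the context-manager variant, sketched with the standard library only. The `run` function below is a hypothetical stand-in, not `rh.run`: it assumes that "tracing" means routing everything emitted through Python's `logging` module into a `run.log` file in the run folder, and it does not track artifacts.

```python
import contextlib
import logging
import tempfile
from pathlib import Path


@contextlib.contextmanager
def run(name, root=".rh/runs"):
    """Hypothetical stand-in for rh.run: send logging output to <root>/<name>/run.log."""
    run_dir = Path(root) / name
    run_dir.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(run_dir / "run.log")
    root_logger = logging.getLogger()
    previous_level = root_logger.level
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)
    try:
        yield run_dir
    finally:
        # Always detach the handler so later code is not logged into this run
        root_logger.removeHandler(handler)
        root_logger.setLevel(previous_level)
        handler.close()


# Usage: anything logged inside the block lands in the run's log file
tmp = tempfile.mkdtemp()
with run("my_run", root=tmp) as r:
    logging.getLogger("train").info("step 1 done")

log_text = (r / "run.log").read_text()
```

Attaching the handler to the root logger means calls made by other libraries inside the block are captured too, which matches the "several calls, one run" intent.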
Big feature, essentially the same as auto-caching in orchestrators - check if this run was already completed, and load results if so, otherwise run:
res = fn.get_or_run(name="yelp_review_preproc_test")
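The caching behavior could be sketched roughly as follows. This is a free function rather than a method on `fn`, and the `result.json` file and `.rh/runs` root are assumptions, not an existing Runhouse layout. Note the sketch keys the cache on the run name only, so calling it again with different inputs under the same name would return the old result.

```python
import json
import tempfile
from pathlib import Path


def get_or_run(fn, name, root=".rh/runs", **kwargs):
    """Hypothetical helper: load the saved result for `name` if present, else run and save."""
    run_dir = Path(root) / name
    result_file = run_dir / "result.json"
    if result_file.exists():
        # Run already completed: load the stored result instead of recomputing
        return json.loads(result_file.read_text())
    result = fn(**kwargs)
    run_dir.mkdir(parents=True, exist_ok=True)
    result_file.write_text(json.dumps(result))
    return result


calls = []  # records each real execution so the cache hit is observable


def preprocess(n):
    calls.append(n)
    return [i * i for i in range(n)]


tmp = tempfile.mkdtemp()
first = get_or_run(preprocess, "yelp_review_preproc_test", root=tmp, n=4)
second = get_or_run(preprocess, "yelp_review_preproc_test", root=tmp, n=4)
```

Whether the cache key should also include a hash of the inputs (as many orchestrators do) is an open design question for the real feature.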
Create/name a CLI run:
r = my_cluster.run(["python test_bert.py --gpus 4 --model distilbert"], name="test_distilbert_ddp")
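For the CLI case, a local approximation using `subprocess` might look like this. `run_cli` is a hypothetical name and it runs commands on the local machine, whereas `my_cluster.run` would execute them on the cluster; the point is only to show commands and their combined stdout/stderr ending up in a named run folder.

```python
import subprocess
import tempfile
from pathlib import Path


def run_cli(commands, name, root=".rh/runs"):
    """Hypothetical helper: run shell commands and capture their output under <root>/<name>."""
    run_dir = Path(root) / name
    run_dir.mkdir(parents=True, exist_ok=True)
    returncodes = []
    with open(run_dir / "run.log", "w") as log:
        for cmd in commands:
            log.write(f"$ {cmd}\n")
            log.flush()  # make sure the command line precedes its output in the file
            proc = subprocess.run(cmd, shell=True, stdout=log, stderr=subprocess.STDOUT)
            returncodes.append(proc.returncode)
    return run_dir, returncodes


# Usage: a harmless command stands in for the training script
tmp = tempfile.mkdtemp()
cli_dir, codes = run_cli(["echo hello"], name="test_distilbert_ddp", root=tmp)
cli_log = (cli_dir / "run.log").read_text()
```

Returning the exit codes alongside the folder gives a natural hook for failure or completion notifications.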
Inspiration: this MLFlow example
We can also support event (failure or completion) notifications through knockknock or PagerDuty!
Cc @caroline
From SyncLinear.com | KIT-67