kjohnsen opened this issue 6 months ago
Thanks for your interest! It's a good question, and the documentation is far from clear on this, so let me give a brief answer here and leave this issue open as a reminder to myself to improve the documentation.
machinable's focus is on providing a hackable user interface for scientific applications. I like to think of it less as a workflow engine and more as a framework for building meaningful, intuitive 'wrappers' around complex applications. It's a bit like click, but not just for the CLI: it covers Python/Jupyter land as well. As far as I can tell, it would make sense to use machinable to build an interface for the much more powerful redun/WDL/... pipelines. Why? Suppose we have a complicated pipeline with many options. What often ends up happening is a user interface like this:
```bash
python example.py \
  devices=8 \
  max_epochs=100 \
  data_train=mnist \
  data_val=['cifar'] \
  prepare@transform=cell \
  prepare@val_transform=cell \
  normalization_mean=[0,0,0,0,0,0,0,0] \
  normalization_std=[1,1,1,1,1,1,1,1] \
  ...  # and so on as complexity grows
```
Typing and editing all of this quickly becomes tedious, so you often see it refactored into something like this:
```bash
python example.py --config ./configs/baseline.json
```
Better, but now the configuration files themselves become tricky to manage (config/tuesday-baseline-run-02-second-try.json, ...), and the configuration file turns into a user interface in its own right.
What machinable lets you do instead is build an interface for the example application using a small 'project-specific language':
```bash
machinable get .example "~image_data(transform='cell')" "~norm(0,1)" max_epochs=100 devices="num_gpus()" --launch
```
The 'configuration file' then ends up being just a regular Python script:
```python
from machinable import get

get('example', [
    "~image_data(transform='cell')",
    "~norm(0,1)",
    {"max_epochs": 100, "devices": "num_gpus()"},
]).launch()
```
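To make the idea of a 'project-specific language' more concrete, here is a minimal, self-contained sketch of how a version list like the one passed to `get` above could be resolved into a flat configuration. This is not how machinable works internally: the `ExampleConfig` class, the `resolve` function, the default values, and the convention that `~name(args)` calls a method `name` returning a dict of overrides are all hypothetical, chosen only to illustrate how a short `~norm(0,1)` can stand in for the long per-channel lists from the first CLI example.

```python
from __future__ import annotations

import ast
from typing import Any


class ExampleConfig:
    """Hypothetical defaults plus shorthand 'versions' for example.py."""

    defaults: dict[str, Any] = {
        "devices": 1,
        "max_epochs": 10,
        "data_train": "mnist",
        "data_val": [],
        "transform": None,
        "val_transform": None,
        "normalization_mean": [0.0] * 8,
        "normalization_std": [1.0] * 8,
    }

    def image_data(self, transform: str = "default") -> dict[str, Any]:
        # One shorthand sets several related options at once
        return {
            "data_train": "mnist",
            "data_val": ["cifar"],
            "transform": transform,
            "val_transform": transform,
        }

    def norm(self, mean: float, std: float, channels: int = 8) -> dict[str, Any]:
        # Expand two numbers into the per-channel lists
        return {
            "normalization_mean": [mean] * channels,
            "normalization_std": [std] * channels,
        }


def resolve(config: ExampleConfig, versions: list[str | dict[str, Any]]) -> dict[str, Any]:
    """Apply versions left to right; later entries override earlier ones."""
    result = dict(config.defaults)
    for version in versions:
        if isinstance(version, dict):
            # Plain dicts are applied as literal overrides
            result.update(version)
            continue
        if not version.startswith("~"):
            raise ValueError(f"unsupported version: {version!r}")
        # Parse "~name(arg, key=value)" as a Python call expression
        call = ast.parse(version[1:], mode="eval").body
        if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
            raise ValueError(f"expected '~name(...)', got {version!r}")
        # Only literal arguments are allowed, so no arbitrary code is executed
        args = [ast.literal_eval(arg) for arg in call.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        result.update(getattr(config, call.func.id)(*args, **kwargs))
    return result


if __name__ == "__main__":
    cfg = resolve(
        ExampleConfig(),
        ["~image_data(transform='cell')", "~norm(0,1)", {"max_epochs": 100}],
    )
    print(cfg["normalization_std"])  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Restricting arguments to literals via `ast.literal_eval` keeps the shorthand declarative; anything more dynamic (like `num_gpus()` above) would need a separate, deliberate evaluation step.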
Since it's Python, you can do things that would be hard to do via the CLI or a config file:
```python
import matplotlib.pyplot as plt
from machinable import get

x = []
y = []
for num_epochs in [50, 100]:
    if experiment := get('example', [
        "~image_data(transform='cell')",
        "~norm(0,1)",
        {'max_epochs': num_epochs},
    ]).future():
        x.append(num_epochs)
        y.append(experiment.accuracy())

# Plot accuracy against the number of training epochs
plt.plot(x, y)
```
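The loop above sweeps a single option. Because the configuration is plain Python, extending it to several options at once is also straightforward. As a minimal, library-independent sketch, `itertools.product` can enumerate every combination of option values; `sweep` is a hypothetical helper (not part of machinable), and each resulting dict could then be passed to `get` in place of `{'max_epochs': num_epochs}`.

```python
from __future__ import annotations

import itertools
from typing import Any


def sweep(**options: list[Any]) -> list[dict[str, Any]]:
    """Return one override dict per combination of the given option values."""
    keys = list(options)
    return [dict(zip(keys, values)) for values in itertools.product(*options.values())]


if __name__ == "__main__":
    for overrides in sweep(max_epochs=[50, 100], devices=[1, 8]):
        print(overrides)
    # {'max_epochs': 50, 'devices': 1}
    # {'max_epochs': 50, 'devices': 8}
    # {'max_epochs': 100, 'devices': 1}
    # {'max_epochs': 100, 'devices': 8}
```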
Overall, the rationale is to make interacting with the code easier, more self-documenting, and less error-prone.
I hope this gives you a rough idea of where machinable fits in this space. Hopefully I'll find some time to update the documentation; in the meantime, let me know if you have more questions.
Thanks, that's a lot clearer now! (sorry for the delay; I'm bad at checking GitHub notifications)
There are a whole bunch of frameworks for organizing research code/computational pipelines, like redun, WDL, CWL, Nextflow, Snakemake, and Reflow. I'm curious how machinable compares to them. https://insitro.github.io/redun/design.html#influences