Comparison with and discussion of alternative solutions

kjohnsen commented 6 months ago

There are a whole bunch of frameworks for organizing research code/computational pipelines, like redun, WDL, CWL, Nextflow, Snakemake, Reflow. I'm curious how this compares. https://insitro.github.io/redun/design.html#influences

frthjf commented 6 months ago

Thanks for your interest! It's a good question, and the documentation is far from being clear on this, so let me give a brief answer here and leave this issue open as a reminder to myself to improve the documentation.

machinable's focus is on providing a hackable user interface for scientific applications. I like to think of it less as a workflow engine and more as a framework for building meaningful, intuitive 'wrappers' around complex applications: a bit like click, but for Python/Jupyter land as well as the CLI. As far as I can tell, it would make sense to use machinable to build an interface for the much more powerful redun/WDL/... pipelines. Why? Suppose we have a complicated pipeline with many options. What often ends up happening is a user interface like this:

python example.py \
    devices=8 \
    max_epochs=100 \
    data_train=mnist \
    data_val=['cifar'] \
    prepare@transform=cell \
    prepare@val_transform=cell \
    normalization_mean=[0,0,0,0,0,0,0,0] \
    normalization_std=[1,1,1,1,1,1,1,1] \
    ... # and so on as complexity grows

Typing and editing this quickly becomes tedious, so you often see it refactored into something like this:

python example.py --config ./configs/baseline.json

Better, but now it becomes tricky to manage the configuration files (configs/tuesday-baseline-run-02-second-try.json ...), and the configuration file becomes a user interface in itself.
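For concreteness, the config-file pattern usually boils down to something like the minimal sketch below. It uses only argparse and json from the standard library. The names `DEFAULTS`, `load_config`, `parse_args` and `main`, as well as the default values, are hypothetical and not taken from any real example.py.

```python
import argparse
import json

# Hypothetical defaults, for illustration only.
DEFAULTS = {"devices": 1, "max_epochs": 10}


def load_config(path):
    """Read the JSON file at `path` and merge it over the defaults."""
    with open(path) as f:
        overrides = json.load(f)
    return {**DEFAULTS, **overrides}


def parse_args(argv=None):
    """Accept a single --config option pointing at a JSON file."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True, help="path to a JSON config file")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    return load_config(args.config)
```

Every variation of a run then needs its own file, which is how directories full of near-identical JSON files tend to appear.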

Now, what machinable allows you to do is to build an interface for the example application with a small 'project specific language':

machinable get .example "~image_data(transform='cell')" "~norm(0,1)" max_epochs=100 devices="num_gpus()" --launch

The 'configuration file' then ends up just being a regular Python script.

from machinable import get

get('example', [
    "~image_data(transform='cell')",
    "~norm(0,1)",
    {"max_epochs": 100, "devices": "num_gpus()"}
]).launch()

Since it's Python, you can do things that would be hard to do via the CLI or a config file:

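from machinable import get
# Assumption: plot is matplotlib's; any plotting function would do here
from matplotlib.pyplot import plot

x = []
y = []

# Collect accuracy for each setting whose results are available
for num_epochs in [50, 100]:
    if experiment := get('example', [
        "~image_data(transform='cell')",
        "~norm(0,1)",
        {'max_epochs': num_epochs}
    ]).future():
        x.append(num_epochs)
        y.append(experiment.accuracy())

plot(x, y)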
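Judging from how it is used in the loop above, `future()` appears to return something falsy while results are not yet available, so unfinished runs are simply skipped. The toy sketch below illustrates only that control flow. `ToyExperiment` and `collect` are hypothetical stand-ins, the accuracy values are made up, and none of this is machinable's actual implementation.

```python
class ToyExperiment:
    """Hypothetical stand-in for an experiment handle (not machinable's class)."""

    def __init__(self, max_epochs, finished):
        self.max_epochs = max_epochs
        self.finished = finished

    def future(self):
        # Return the experiment itself if results exist, otherwise None.
        return self if self.finished else None

    def accuracy(self):
        # Made-up number, purely for illustration.
        return 0.5 + self.max_epochs / 1000


def collect(experiments):
    """Gather (max_epochs, accuracy) pairs from finished experiments only."""
    x, y = [], []
    for candidate in experiments:
        if experiment := candidate.future():
            x.append(experiment.max_epochs)
            y.append(experiment.accuracy())
    return x, y
```

With one finished and one unfinished toy experiment, only the finished one contributes a point to the plot data.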

Overall, the rationale is to make interacting with the code easier, more self-documenting, and less error-prone.

I hope this gives you a vague idea of how machinable fits in the space; hopefully, I'll find some time to update the documentation but in the meantime let me know if you have more questions.

kjohnsen commented 1 month ago

Thanks, that's a lot clearer now! (sorry for the delay; I'm bad at checking GitHub notifications)

machinable-org / machinable

Comparison with and discussion of alternative solutions #535