kalekundert / stepwise

Modular, command-line scientific protocols
GNU General Public License v3.0

Represent protocol steps as a task graph #71

Open kalekundert opened 2 years ago

kalekundert commented 2 years ago

Currently stepwise represents the steps of a protocol as an ordered list. This is pretty intuitive, and good enough for most use cases, but it's not really right. Each step is really (i) an action that needs to be taken, which (ii) may depend on previous actions having already been taken. A directed acyclic graph (DAG) is the natural data structure for representing relationships like this.
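As a rough illustration of the difference, here is a minimal sketch of a protocol stored as a dependency graph rather than a list. It uses the standard-library `graphlib` module (Python 3.9+); the step names and the `deps` mapping are hypothetical and are not part of stepwise.

```python
from graphlib import TopologicalSorter

# Hypothetical protocol: each step maps to the set of steps that must
# already be done before it can start.
deps = {
    "set up PCR": set(),
    "run PCR": {"set up PCR"},
    "ligate": {"run PCR"},
    "pre-warm plates": set(),
    "transform": {"ligate", "pre-warm plates"},
}

# Any topological order is a valid way to flatten the graph back into the
# ordered list that is printed today.
order = list(TopologicalSorter(deps).static_order())

for i, step in enumerate(order, start=1):
    print(f"{i}. {step}")
```

The ordered list becomes one of several valid linearizations of the graph, which is what leaves room for choosing a better one.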

Here's a specific example of something useful that a DAG representation might enable. The transformation protocol calls for pre-warming plates, which takes ≈1h. If you have a protocol with a transformation and you're just reading the steps in order, by the time you get to the transformation step, you should've been pre-warming the plates already. A DAG representation would allow the transform protocol to specify two steps—"pre-warm plates" and "do the transformation"—and the relationship between them—"do the first an hour before the second". If the protocol were also to include PCR and ligation steps, and those steps were annotated with expected times (I would also need to distinguish between active and passive time), stepwise could figure out the best time to slot the pre-warming step into the protocol.
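The scheduling idea can be sketched as a backward pass over the graph: find when the whole protocol ends, then work out the latest time each step could start without delaying its children. The durations below are made-up numbers in minutes, the function names are hypothetical, and the sketch does not yet distinguish active from passive time.

```python
from graphlib import TopologicalSorter

# Hypothetical durations (minutes) and dependencies.
duration = {"PCR": 120, "ligation": 30, "pre-warm plates": 60, "transformation": 20}
deps = {
    "PCR": set(),
    "ligation": {"PCR"},
    "pre-warm plates": set(),
    "transformation": {"ligation", "pre-warm plates"},
}

def earliest_starts(deps, duration):
    """Earliest time each step can start, given that its parents must finish first."""
    start = {}
    for step in TopologicalSorter(deps).static_order():
        start[step] = max((start[d] + duration[d] for d in deps[step]), default=0)
    return start

def latest_starts(deps, duration):
    """Latest time each step can start without delaying the end of the protocol."""
    early = earliest_starts(deps, duration)
    end = max(early[s] + duration[s] for s in deps)

    # Invert the dependency mapping: parent -> steps that depend on it.
    children = {s: [] for s in deps}
    for step, parents in deps.items():
        for parent in parents:
            children[parent].append(step)

    late = {}
    # Reversed topological order visits every child before its parents.
    for step in reversed(list(TopologicalSorter(deps).static_order())):
        finish_by = min((late[c] for c in children[step]), default=end)
        late[step] = finish_by - duration[step]
    return late

late = latest_starts(deps, duration)
print(late["pre-warm plates"])  # minutes after the start of the protocol
```

With these numbers the pre-warming step can start as late as 90 minutes in, which falls inside the 120-minute PCR step, so a printed protocol could place it there instead of at the very beginning.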

I'm not totally sure that this idea is worth the extra complexity, but I really like the idea of using a data structure that accurately reflects how protocol steps work.

kalekundert commented 2 years ago

It could be really nice if there were a way to get smartphone notifications when it was time to do the next step in a protocol. In particular, I'm currently doing an electroelution protocol where you need to run a gel for 1h, and start soaking the electroelution device 15 min before the gel ends. This is easy to forget, because it's necessarily preceded by 45 min of downtime.

In principle, if stepwise had a good understanding of the time relationships between steps, and if there were a form of sw go that sent the protocol to a server that was in contact with the user's phone, it would be possible to provide these notifications. Obviously this would be a lot of work, but I think it's something worth keeping in mind when making architectural decisions.
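The timing part of this is simple once the relationships are known. A minimal sketch of the electroelution example, assuming the user reports when the gel starts; the function name and the start time are hypothetical, and nothing here touches a server or a phone.

```python
from datetime import datetime, timedelta

def reminder_time(step_start, step_duration, lead):
    """When to alert the user: `lead` before the running step finishes."""
    return step_start + step_duration - lead

# Hypothetical start time reported by the user.
gel_start = datetime(2021, 1, 1, 14, 0)

# Run the gel for 1 h; start soaking the device 15 min before it ends.
alert = reminder_time(gel_start, timedelta(hours=1), timedelta(minutes=15))
print(alert.strftime("%H:%M"))
```

The hard parts are the ones described above: delivering the alert, and knowing the real start time of each step.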

In order to keep the protocol in sync with real life, you'd need some way of telling the phone/server when you start/end a step. That might be prohibitively annoying...

kalekundert commented 2 years ago

In order to connect steps together with pipes, each protocol would need defined start/end steps. This is a fairly major (and artificial) constraint: it would prevent parallel steps from being specified on the command line/via shell scripts.

If I give up on using shell scripts as my scripting interface, I could maybe write some sort of DSL that makes it easier to specify these kinds of relationships. But that doesn't address interactive shell use. It would also be another thing to learn, and it would take a lot of effort to write something comparable to the shell in terms of features and performance.

I could maybe write some special commands (e.g. sw fork, sw wait) that connect protocols in non-standard ways. Here's how this could work:

Other names for sw fork:

Other names for sw wait

kalekundert commented 2 years ago

Another difficulty with this idea is that every step would need to specify a duration. This seems like it'd be an easy thing to omit, since it usually wouldn't matter. Some thoughts:

kalekundert commented 2 years ago

Another difficulty with this idea is that sometimes steps only make sense in the context of the previous step. For example, imagine two completely unrelated protocols. Currently I'd just print them out separately and do them at the same time. If I merged them together and printed out a single document, though, it might intermix all the steps in a very confusing manner.

One thing that might help with this is to tweak the topological sort to prefer direct children when possible.
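One way to get that behavior is a variant of Kahn's algorithm that keeps the ready steps on a stack instead of a queue, so that a step's newly unblocked children are emitted right after it. This is only a sketch under that assumption; the function name and the example graph are hypothetical.

```python
def sort_preferring_children(deps):
    """Topological sort that follows a step with its direct children when possible.

    `deps` maps each step to a list of its prerequisites (no duplicates),
    in the order the steps were declared.
    """
    children = {step: [] for step in deps}
    remaining = {step: len(parents) for step, parents in deps.items()}
    for step, parents in deps.items():
        for parent in parents:
            children[parent].append(step)

    # Stack of ready steps, reversed so the first-declared root is popped first.
    ready = [step for step in reversed(list(deps)) if remaining[step] == 0]
    order = []
    while ready:
        step = ready.pop()
        order.append(step)
        # Children that just became ready go on top of the stack, so they
        # are emitted before any unrelated step that was already waiting.
        for child in reversed(children[step]):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order

# Two unrelated three-step protocols, declared interleaved.
deps = {
    "A1": [], "B1": [],
    "A2": ["A1"], "B2": ["B1"],
    "A3": ["A2"], "B3": ["B2"],
}
print(sort_preferring_children(deps))
```

For the interleaved example this yields all of protocol A followed by all of protocol B. It does not account for step durations, so it would have to be reconciled with any time-based placement of steps such as pre-warming.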