Represent protocol steps as a task graph

kalekundert commented 2 years ago

Currently, stepwise represents the steps of a protocol as an ordered list. This is intuitive and good enough for most use cases, but it doesn't capture how steps actually relate to each other. Each step is really (i) an action that needs to be taken, which (ii) may depend on previous actions having already been taken. A directed acyclic graph (DAG) is the natural data structure for representing relationships like this.
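As a minimal sketch of that idea, here each step maps to the set of steps it depends on, and the ordered list stepwise prints today becomes just one topological order of the graph. This uses Python's standard-library graphlib, which is an assumption for illustration (not necessarily what stepwise would use), and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical protocol: each step maps to the set of steps that must come first.
protocol = {
    "PCR": set(),
    "ligation": {"PCR"},
    "pre-warm plates": set(),
    "transformation": {"ligation", "pre-warm plates"},
}

# Any topological order is a valid way to do the steps one at a time.
order = list(TopologicalSorter(protocol).static_order())
print(order)
```

Note that "pre-warm plates" has no prerequisites, so the graph alone does not say where in the list it should appear; that is exactly the information the ordered-list representation forces you to decide up front.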

Here's a specific example of something useful that a DAG representation might enable. The transformation protocol calls for pre-warming plates, which takes ≈1h. If your protocol includes a transformation and you just read the steps in order, then by the time you reach the transformation step it's already too late: you should have started pre-warming the plates an hour earlier. A DAG representation would allow the transformation protocol to specify two steps ("pre-warm plates" and "do the transformation") and the relationship between them ("do the first an hour before the second"). If the protocol also included PCR and ligation steps, and those steps were annotated with expected times (I would also need to distinguish between active and passive time), stepwise could figure out the best time to slot the pre-warming step into the protocol.
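A rough sketch of that slotting calculation, assuming the other steps are done back-to-back and ignoring the active/passive distinction for now. The `Step` class, the helper names, and the PCR/ligation durations are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    minutes: int  # hypothetical: expected duration, treated as wall-clock time

def start_times(steps):
    """Start minute of each step, assuming the steps are done back-to-back."""
    starts, t = {}, 0
    for step in steps:
        starts[step.name] = t
        t += step.minutes
    return starts

def slot_lead_step(steps, target, lead_minutes):
    """Find when a passive step (e.g. pre-warming plates) must begin so that it
    is done by the time `target` starts (clamped to 0 if the protocol is too
    short). Returns the start minute and the step in progress at that moment."""
    starts = start_times(steps)
    begin = max(0, starts[target] - lead_minutes)
    running = steps[0].name
    for step in steps:
        if starts[step.name] <= begin:
            running = step.name
    return begin, running

steps = [Step("PCR", 90), Step("ligation", 60), Step("transformation", 30)]
print(slot_lead_step(steps, "transformation", 60))  # → (90, 'ligation')
```

With these made-up durations, stepwise could print "pre-warm plates" right alongside the ligation step.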

I'm not totally sure that this idea is worth the extra complexity, but I really like the idea of using a data structure that accurately reflects how protocol steps work.

kalekundert commented 2 years ago

It could be really nice if there were a way to get smartphone notifications when it was time to do the next step in a protocol. In particular, I'm currently doing an electroelution protocol where you need to run a gel for 1h, and start soaking the electroelution device 15 min before the gel ends. This is easy to forget, because it's necessarily preceded by 45 min of downtime.

In principle, if stepwise had a good understanding of the time relationships between steps, and if there were a form of sw go that sent the protocol to a server that was in contact with the user's phone, it would be possible to provide these notifications. Obviously this would be a lot of work, but I think it's something worth keeping in mind when making architectural decisions.
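A small sketch of the "time relationships" part, assuming reminders are written relative to the end of a step (negative offsets meaning "before"). The reminder format, the function name, and the example date are hypothetical; nothing here talks to a phone or server.

```python
from datetime import datetime, timedelta

def notification_times(step_ends, reminders):
    """Turn reminders of the form (message, step, minutes relative to that
    step's end) into absolute (time, message) pairs, earliest first."""
    return sorted(
        (step_ends[step] + timedelta(minutes=offset), message)
        for message, step, offset in reminders
    )

# Hypothetical: the user reports that a 1h gel started at 13:00.
gel_start = datetime(2024, 1, 1, 13, 0)
step_ends = {"run gel": gel_start + timedelta(minutes=60)}
reminders = [
    ("Start soaking the electroelution device", "run gel", -15),
    ("Stop the gel", "run gel", 0),
]
for when, message in notification_times(step_ends, reminders):
    print(when.strftime("%H:%M"), message)
```

This would print the soaking reminder at 13:45, i.e. right at the end of the 45 min of downtime that makes the step easy to forget.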

In order to keep the protocol in sync with real life, you'd need some way of telling the phone/server when you start/end a step. That might be prohibitively annoying...
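One way a server could stay in sync, sketched under the assumption that each step has a known duration and the user reports when a step actually starts: recompute each step's earliest start as the latest end of its prerequisites (the forward pass of the critical path method), letting reported start times override the plan. The function name, step names, and durations are hypothetical.

```python
from graphlib import TopologicalSorter

def earliest_starts(deps, minutes, actual_starts=None):
    """Earliest start (minutes from the beginning of the protocol) of each step.
    `deps` maps steps to prerequisites, `minutes` gives durations, and
    `actual_starts` holds start times the user has reported, which win."""
    actual_starts = actual_starts or {}
    starts = {}
    for step in TopologicalSorter(deps).static_order():
        planned = max((starts[d] + minutes[d] for d in deps.get(step, ())), default=0)
        starts[step] = actual_starts.get(step, planned)
    return starts

deps = {"run gel": set(), "cut band": {"run gel"}, "electroelute": {"cut band"}}
minutes = {"run gel": 60, "cut band": 10, "electroelute": 30}

print(earliest_starts(deps, minutes))                   # the original plan
print(earliest_starts(deps, minutes, {"run gel": 20}))  # gel started 20 min late
```

Only start times need to be reported here; end times are inferred from durations, which might keep the annoyance down to one tap per step.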

kalekundert commented 2 years ago

In order to connect steps together with pipes, each protocol would need defined start and end steps. This is a fairly major (and artificial) constraint: it would prevent parallel steps from being specified on the command line or via shell scripts.

If I give up on using shell scripts as my scripting interface, I could maybe write some sort of DSL that makes it easier to specify these kinds of relationships. But that doesn't address interactive shell use. It would also be another thing to learn, and it would take a lot of effort to write something comparable to the shell in terms of features and performance.

I could maybe write some special commands (e.g. sw fork, sw wait) that connect protocols in non-standard ways. Here's how this could work:

- Every protocol would have designated "start" and "end" nodes. These could be determined automatically in most cases.
- The usual pipe behavior would be to connect the start node of the downstream protocol to the end node of the upstream protocol.
- When sw fork runs, it creates a protocol with a special end node.
  - Whenever a protocol is attached to this node, the end node stays the same (instead of moving to the attached protocol's end node).
  - This protocol can continue to go linearly through pipes, but it will build up a branched DAG.
  - It might be useful to save references to all of the attached would-be end nodes.
  - The user could use subshells to attach multiple linearly-piped protocols while a fork is active, e.g. sw fork | (sw A | sw B) | (sw C | sw D).
- When the sw wait protocol runs, it deactivates the special end node created by sw fork. It might also add a step saying something like "Wait for all previous steps to complete."
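The pipe/fork/wait rules can be modeled as plain graph operations. This is a minimal sketch under the assumption that each protocol is a DAG with single start and end nodes; `ProtocolGraph`, `step`, `fork`, `wait`, and `pipe` are hypothetical names standing in for `sw <protocol>`, `sw fork`, `sw wait`, and the shell `|`. Node names are assumed unique (so a second fork in the same protocol would need a different name).

```python
from dataclasses import dataclass, field

@dataclass
class ProtocolGraph:
    """Hypothetical model: `edges[node]` holds the nodes that must finish first."""
    edges: dict
    start: str
    end: str
    forked: bool = False   # True while an `sw fork` node is the end node
    join: bool = False     # True for the protocol produced by `sw wait`
    branch_ends: list = field(default_factory=list)  # would-be end nodes

def step(name):
    return ProtocolGraph({name: set()}, name, name)

def fork():
    return ProtocolGraph({"fork": set()}, "fork", "fork", forked=True)

def wait():
    return ProtocolGraph({"wait": set()}, "wait", "wait", join=True)

def pipe(up, down):
    """Model `up | down`: attach down's start node to up's end node."""
    edges = {k: set(v) for k, v in up.edges.items()}
    edges.update({k: set(v) for k, v in down.edges.items()})
    if down.join and up.forked:
        # `sw wait` depends on every branch attached since the fork.
        edges[down.start] |= set(up.branch_ends) or {up.end}
        return ProtocolGraph(edges, up.start, down.end)
    edges[down.start].add(up.end)
    if up.forked:
        # The fork node stays the end node; remember the would-be end node.
        return ProtocolGraph(edges, up.start, up.end, forked=True,
                             branch_ends=up.branch_ends + [down.end])
    return ProtocolGraph(edges, up.start, down.end, forked=down.forked,
                         branch_ends=list(down.branch_ends))

# sw fork | (sw A | sw B) | (sw C | sw D) | sw wait
p = pipe(fork(), pipe(step("A"), step("B")))
p = pipe(p, pipe(step("C"), step("D")))
p = pipe(p, wait())
print(p.edges)
```

The result has A and C both depending on the fork node, and the wait node depending on B and D, which is the branched DAG described in the list.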

Other names for sw fork:

- sw branch
- sw parallel

Other names for sw wait:

- sw join

kalekundert commented 2 years ago

Another difficulty with this idea is that every step would need to specify a duration. This seems like it'd be an easy thing to omit, since it usually wouldn't matter. Some thoughts:

- I could just require the user to specify a time for each step, i.e. raise an exception if a step doesn't have one. This seems a bit surly, though, since again this usually isn't that important.
- I could assume that steps are instantaneous by default. This would avoid annoying users, but would surely make durations less trustworthy in general.
- I need to distinguish between active time and passive time.
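Both policies, plus the active/passive split, fit in a few lines. This is a sketch only: the `Step` fields, the `strict` flag, and the helper names are hypothetical, and the wall-clock total assumes steps are done one after another.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Step:
    text: str
    active_min: Optional[float] = None   # hands-on time (hypothetical field)
    passive_min: Optional[float] = None  # waiting time, e.g. incubations

def durations(step, strict=False):
    """Return (active, passive) minutes for one step."""
    if step.active_min is None and step.passive_min is None:
        if strict:
            # Policy 1: refuse steps without a duration.
            raise ValueError(f"no duration given for step: {step.text!r}")
        # Policy 2: assume the step is instantaneous.
        return 0.0, 0.0
    return step.active_min or 0.0, step.passive_min or 0.0

def totals(steps, strict=False):
    """Hands-on time and wall-clock time, assuming strictly sequential steps."""
    active = passive = 0.0
    for s in steps:
        a, p = durations(s, strict)
        active += a
        passive += p
    return active, active + passive

steps = [Step("Run gel", active_min=5, passive_min=60), Step("Cut out band")]
print(totals(steps))  # → (5.0, 65.0)
```

Keeping the policy behind a flag would let stepwise default to the lenient behavior while still allowing a strict check when scheduling actually matters.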

kalekundert commented 2 years ago

Another difficulty with this idea is that some steps only make sense in the context of the previous step. For example, imagine two completely unrelated protocols. Currently I'd just print them out separately and do them at the same time. If I merged them together and printed out a single document, though, the steps of the two protocols could end up intermixed in a very confusing way.

One thing that might help with this is to tweak the topological sort to prefer direct children when possible.
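A sketch of that tweak as a variant of Kahn's algorithm: among the steps that are ready, prefer a direct child of the step just emitted, and otherwise fall back to the step that has been ready longest. The function name and step names are hypothetical, and this is one possible heuristic rather than how stepwise sorts steps.

```python
def sort_preferring_children(deps):
    """Topological sort of {step: prerequisites}. Every step must be a key.
    Whenever possible, the next step is a direct child of the previous one."""
    children = {n: [] for n in deps}
    indegree = {n: len(pres) for n, pres in deps.items()}
    for n, pres in deps.items():
        for p in pres:
            children[p].append(n)
    ready = [n for n in deps if indegree[n] == 0]  # keeps insertion order
    order, last = [], None
    while ready:
        # Prefer a ready child of the step just emitted; else the oldest ready step.
        preferred = [n for n in ready if last is not None and n in children[last]]
        nxt = preferred[0] if preferred else ready[0]
        ready.remove(nxt)
        order.append(nxt)
        for c in children[nxt]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
        last = nxt
    if len(order) != len(deps):
        raise ValueError("the steps contain a dependency cycle")
    return order

# Two unrelated protocols, A and B.
deps = {"A1": set(), "A2": {"A1"}, "A3": {"A2"}, "B1": set(), "B2": {"B1"}}
print(sort_preferring_children(deps))  # → ['A1', 'A2', 'A3', 'B1', 'B2']
```

A plain first-in-first-out Kahn's algorithm on the same graph would produce A1, B1, A2, B2, A3, which is exactly the kind of interleaving described above.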

kalekundert / stepwise

Represent protocol steps as a task graph #71