khanlab / snakebids

Snakemake + BIDS
https://snakebids.readthedocs.io
MIT License

Rerunning Apps #182

Open pvandyken opened 2 years ago

pvandyken commented 2 years ago

The problem

I don't know what the cleanest way of implementing this in a BidsApp is, but it seems it would be useful to have a "rerun" capability a la datalad rerun. Given that pretty much everything is already saved in the config file created in the output dir, one should be able to supply run.py with just the output dir and run everything again the same way. Any additional command-line --args would override the previous ones.

In the same vein, a utility command that extracts and prints previous run settings might be useful.

Use cases might include running an app with different settings after a long period of time while ensuring that settings you don't explicitly change (including the input bids dir, which is not always obvious) remain the same, or generating a report, etc.
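
As a rough illustration of the "extract and print previous run settings" utility mentioned above, here is a minimal sketch. It assumes the saved config has already been loaded into a plain mapping (e.g. with a YAML parser); `summarize_run` is a hypothetical name, not an existing snakebids function. The keys bids_dir, output_dir and analysis_level are the required BIDS app arguments; any other key shown is illustrative.

```python
from typing import Any, Mapping, Sequence

# Required BIDS app arguments, listed first in the summary.
REQUIRED_KEYS = ("bids_dir", "output_dir", "analysis_level")


def summarize_run(
    config: Mapping[str, Any],
    keys: Sequence[str] = REQUIRED_KEYS,
) -> str:
    """Format the settings of a previous run, one 'key: value' per line.

    Hypothetical helper: `config` is assumed to be the already-parsed
    config file found in the output directory.
    """
    lines = []
    # Always report the required arguments, even if they were not recorded.
    for key in keys:
        lines.append(f"{key}: {config.get(key, '<not recorded>')}")
    # Then list every other recorded setting in a stable order.
    for key in sorted(k for k in config if k not in keys):
        lines.append(f"{key}: {config[key]}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only.
    print(summarize_run({"bids_dir": "/data/bids", "analysis_level": "participant"}))
```

Listing the required arguments first, even when missing, makes it obvious when the input bids dir was never recorded.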

kaitj commented 2 years ago

+1 - would have to think about this a little more in terms of implementation, but like the idea of not having to resupply arguments if possible - can we just pass the config in this case? Another consideration would be if you're supplying new args to override previous ones (I can see how this can be useful), can it still be considered a rerun, or should it be a new run?

One note, unless this has changed recently, for it to be considered a BIDS app, the required arguments are bids_dir, output_dir, and analysis_level (https://bids-apps.neuroimaging.io/dev_faq/). One option might be supplying those inputs along with a --rerun flag or something similar. Is it possible to force a run via Snakemake args using the existing config?

pvandyken commented 2 years ago

Re required arguments, I was hoping there would be some latitude here... To me, if you have to supply the input dir, that goes against the point of a rerun. What if I forget the input dir?

But standards are standards. We've talked before about how to incorporate other "utility" features into a snakebids app. Perhaps it's time to revisit this?

tkkuehn commented 2 years ago

I'm of two minds on this proposal. It seems a little beyond the scope of what Snakebids is supposed to do (i.e. I think this kind of use of provenance information is better suited to something like DataLad), and potentially brittle (if, after a long time, the input dataset is no longer in its original location, Snakebids has no way of knowing where it might be). I also worry about encouraging users to run the same workflow with different settings into a pre-populated output directory. I think Snakemake will mostly replace the correct (i.e. outdated) files, but if the workflow or the settings have changed significantly, you could easily end up with an unclear mix of irrelevant old output files and relevant new output files (it is possible that there are Snakemake features that handle this kind of thing and I'm not aware of them, though).

That said, the requisite provenance information is there, and it would be nice to make it easily usable. I think something like the suggested "utility command that extracts and prints previous run settings," maybe as part of the snakebids interface, might by itself be enough to be useful without running into the API or other issues.

I think if running your app with the BIDS app API always does the expected thing, an interface like Peter proposed isn't necessarily prohibited. However, an (awkward) alternative might be to allow something like run.py - {output_dir} -.

pvandyken commented 1 year ago

Just an idea for another way to approach this: we could do something like

snakebids rerun /path/to/output [arg overrides]

So in this world, we'll assume snakebids is available in the current Python environment, or that it's installed via pipx. It will use the local snakemake to rerun the app using the same specs found in the config file in the output dir. If we save pipeline and snakemake versions (as proposed in #205), we could throw errors if any versions have changed (with flags to override the check).

By default, this would be primarily for devs (who have snakebids installed). But we can make an API so that it can be incorporated into apps using whatever CLI is desired. So it wouldn't necessarily be baked into every bids app (and wouldn't violate any specs), but would be possible for devs and app makers who want to expose the functionality.
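
To make the shape of that command concrete, here is a minimal argparse sketch of `snakebids rerun /path/to/output [arg overrides]`. It is not the existing snakebids CLI: the parser layout, the `--ignore-versions` flag name, and the function names are all assumptions. Anything the parser does not recognise is collected untouched so it can be treated as an override.

```python
import argparse
from pathlib import Path
from typing import List, Sequence, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for a `snakebids rerun` subcommand."""
    parser = argparse.ArgumentParser(prog="snakebids")
    subcommands = parser.add_subparsers(dest="command", required=True)
    rerun = subcommands.add_parser(
        "rerun", help="rerun an app from a previous output directory"
    )
    rerun.add_argument(
        "output_dir", type=Path, help="output directory of the earlier run"
    )
    # Assumed flag name: skip the pipeline/snakemake version check.
    rerun.add_argument("--ignore-versions", action="store_true")
    return parser


def parse_rerun_args(argv: Sequence[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Split argv into known rerun options and pass-through overrides."""
    # parse_known_args returns unrecognised arguments instead of erroring,
    # which is what lets arbitrary overrides follow the output directory.
    args, overrides = build_parser().parse_known_args(argv)
    return args, overrides


if __name__ == "__main__":
    args, overrides = parse_rerun_args(["rerun", "/path/to/output", "--cores", "4"])
    print(args.output_dir, overrides)
```

Because the subcommand lives in its own parser, an app maker could attach the same `rerun` parser to a different CLI without touching the standard BIDS app arguments.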

akhanf commented 1 year ago

I like that approach, would fit nicely with the way I am running apps right now.

To be clear, it could be used to rerun an app that has since had subjects added to the dataset, right? (I.e., it will generate inputs again?)

pvandyken commented 1 year ago

Yeah, so it would do the following:

  1. Check the pipeline version and snakemake version previously used and compare with the present. Error out if they differ, unless the user specifies to ignore versions (via some flag)
  2. Re-create the CLI call the user made
  3. Apply as patches any new args the user provides. Some rational method will be needed to both add and remove previous args. The simplest would just be a --snakemake-args ... argument similar to --pip-args in pipx, and it would do a clean override.
  4. Call the reconstructed command.
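
The four steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the layout of the saved `record` (keys `versions`, `app_args`, `snakemake_args`) and all function names are hypothetical, and the override is the "clean override" variant, where newly supplied snakemake args replace the recorded ones wholesale.

```python
import subprocess
from typing import List, Mapping, Optional, Sequence


class VersionMismatchError(RuntimeError):
    """Raised when recorded and current tool versions differ."""


def check_versions(
    recorded: Mapping[str, str], current: Mapping[str, str], ignore: bool = False
) -> None:
    """Step 1: error out if any recorded version differs, unless ignored."""
    changed = {
        name: (old, current.get(name))
        for name, old in recorded.items()
        if current.get(name) != old
    }
    if changed and not ignore:
        details = ", ".join(
            f"{name}: {old} -> {new}" for name, (old, new) in sorted(changed.items())
        )
        raise VersionMismatchError(f"versions changed since the last run ({details})")


def rebuild_command(
    record: Mapping[str, Sequence[str]],
    snakemake_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Steps 2 and 3: re-create the original call, then apply the override."""
    command = list(record["app_args"])
    # Clean override: new snakemake args replace the recorded ones entirely.
    if snakemake_args is None:
        extra = record.get("snakemake_args", [])
    else:
        extra = snakemake_args
    return command + list(extra)


def rerun(record, current_versions, snakemake_args=None,
          ignore_versions=False, dry_run=False) -> List[str]:
    """Run all four steps; with dry_run=True the command is only returned."""
    check_versions(record["versions"], current_versions, ignore=ignore_versions)
    command = rebuild_command(record, snakemake_args)
    if not dry_run:
        subprocess.run(command, check=True)  # Step 4: call the command.
    return command
```

A `dry_run` option is not part of the proposal, but it gives a cheap way to inspect the reconstructed command before anything touches the output directory.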