introduce snakemake checkpoint for samples with no data

sanjaynagi commented 1 year ago

Sometimes samples will have zero data, which makes the pipeline fail.

To handle this, the pipeline should start with a checkpoint that evaluates which samples actually have data and runs the rest of the pipeline on those samples only.

sanjaynagi commented 1 year ago

An example of checkpoints can be found in this workflow: https://github.com/anopheles-genomic-surveillance/selection-atlas/tree/hackathon-21-02-23

sanjaynagi commented 1 year ago

This is tricky. Just been exploring it.

Ideally, we would have a rule or checkpoint which initially checks whether all the samples in the metadata have fastq files, either after bcl-conversion or at the start of the workflow if the user already has fastqs.
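For illustration only, the check described above could look roughly like the sketch below. It is plain Python rather than an actual checkpoint, and the file naming template (`{sample}_1.fastq.gz`) and both function names are assumptions, not what the workflow currently uses. A gzipped file with no reads still has a non-zero size on disk, so the sketch reads the first line instead of checking the file size.

```python
import gzip
from pathlib import Path


def has_reads(fastq_path):
    """Return True if the gzipped FASTQ exists and contains at least one line."""
    path = Path(fastq_path)
    if not path.is_file():
        return False
    with gzip.open(path, "rt") as handle:
        # An empty gzip stream yields an empty string on the first read.
        return bool(handle.readline())


def split_samples_by_data(sample_ids, fastq_dir, template="{sample}_1.fastq.gz"):
    """Split sample IDs into (with_data, without_data), preserving input order.

    `template` is a hypothetical naming scheme; adjust it to the real layout.
    """
    with_data, without_data = [], []
    for sample in sample_ids:
        fastq = Path(fastq_dir) / template.format(sample=sample)
        if has_reads(fastq):
            with_data.append(sample)
        else:
            without_data.append(sample)
    return with_data, without_data
```

In a checkpoint, the first list would be what gets written out and later used to fill in the sample wildcard; the second list is what we would report back to the user.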

It would need to be a snakemake checkpoint, because we need to return values for wildcards in other bits of the analysis. However, checkpoints can make life a bit of a pain, and we also have that rule which splits the metadata samples into two if we have more than 1000 samples. I think this will be possible, but more pain than it's worth at this stage of development.

For now, I'm leaving it, and we stipulate that there must be data or metadata for each sample in the metadata.tsv. If snakemake complains about a few samples, users can remove those samples from the metadata and restart the workflow.
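As a rough sketch of that manual workaround, something like the following could drop the offending samples from the metadata before restarting. The `sampleID` column name, the output path, and the `drop_samples` helper are assumptions for illustration, not part of the workflow.

```python
import csv


def drop_samples(metadata_in, metadata_out, samples_to_drop, id_column="sampleID"):
    """Write a copy of a tab-separated metadata file without the given samples.

    `id_column` is an assumed column name. Returns the number of rows kept.
    """
    drop = set(samples_to_drop)
    with open(metadata_in, newline="") as src:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = [row for row in reader if row[id_column] not in drop]
    with open(metadata_out, "w", newline="") as dst:
        writer = csv.DictWriter(
            dst, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `drop_samples("metadata.tsv", "metadata.filtered.tsv", ["sampleA"])` would write a filtered copy (`sampleA` being a placeholder ID), which the config could then point to.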

ChabbyTMD commented 1 year ago

This is a fair compromise. We can add this caveat to the documentation.

sanjaynagi commented 1 year ago

Closing for now, as in reality even negative samples should have some reads.

sanjaynagi / AmpSeeker

introduce snakemake checkpoint for samples with no data #12