rerun after deleting input files & server update

RvV1979 commented 2 months ago

Hi Lucas, In order to free space on our server I have deleted fastq input files of samples that had already run successfully. I now want to rerun the pipeline with some additional samples. I therefore added those to the samples.txt and tried a dry run. However, I get an error due to missing input files.

I suspect that this behavior may be due to the fact that our servers were updated and paths have changed (specifically, all paths started with /scratch and have changed to /lustre/scratch). I guess this may cause snakemake to consider everything outdated and to want to re-run the entire pipeline. Could this be true, and do you know if there is any way to solve this?

Thanks

lczech commented 2 months ago

Hi @RvV1979,

indeed, grenepipe checks that all fastq files are present before starting the run. That is on top of snakemake's own file checks, as I found that those do not always give clear error messages for that case. So, it could also be that your error is due to the path change that you describe, but it seems more likely that it's the grenepipe-internal check.
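To illustrate the kind of up-front check described here, below is a minimal sketch, not grenepipe's actual code. It assumes a tab-separated samples table with a `sample` column and fastq paths in columns named `fq1` and `fq2`; the function name `find_missing_fastq` and those column names are assumptions for illustration.

```python
import csv
import os


def find_missing_fastq(samples_table, fastq_columns=("fq1", "fq2")):
    """Return (sample, path) pairs for fastq files that are listed but not on disk.

    The column names are assumptions; adjust them to the actual table header.
    """
    missing = []
    with open(samples_table, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            for column in fastq_columns:
                path = (row.get(column) or "").strip()
                # An empty cell is skipped, e.g. no second file for single-end reads.
                if path and not os.path.exists(path):
                    missing.append((row.get("sample", "?"), path))
    return missing
```

Running something like this on the samples table before a dry run shows which entries would trip a presence check, independently of what snakemake itself reports.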

Either way, even if we deactivated that check, what you seem to want to do is to trick snakemake into ignoring the input files as their downstream files are already present, and so do not need to be recomputed, right? That is tricky to get right even if the original input files are still present. That usually involves telling snakemake to ignore time stamps etc. And with changed paths in the mix, it gets even trickier. So, if you really want to do this, you'd need to check what command line options snakemake offers to achieve that. Not sure what the best way would be here.
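One option that may be worth checking is snakemake's `--rerun-triggers`, which limits what counts as a reason to rerun a job. The sketch below only assembles such a call for a dry run. It assumes a Snakemake version that has this option (7.8 or later, to my knowledge), and the analysis directory is a hypothetical path. Whether this is enough with deleted inputs and changed paths is untested here.

```python
import shlex


def build_snakemake_command(directory, cores=1, dry_run=True, rerun_triggers=("mtime",)):
    """Assemble a snakemake call that restricts the reasons for rerunning jobs."""
    command = ["snakemake", "--directory", directory, "--cores", str(cores)]
    if rerun_triggers:
        # Only the listed triggers (here: modification times) cause a rerun;
        # changes in code, params, or software environments are then ignored.
        command += ["--rerun-triggers", *rerun_triggers]
    if dry_run:
        # Show what would be run without executing anything.
        command.append("--dry-run")
    return command


# Hypothetical analysis directory, for illustration only.
print(shlex.join(build_snakemake_command("/lustre/scratch/my-analysis")))
```

Inspecting the dry-run output first is the safer route: it lists which jobs snakemake would schedule and why, before anything is recomputed.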

Alternatively, if you have the mapped files already, and just want variant calling, see this new option here. You could also run the pipeline on a set of new fastq files first, up until the mapping, and then combine those newly mapped files with the ones that you have from your previous run. But all that of course depends on you still having the bam files.
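Combining bam files from a previous run with newly mapped ones could be sketched as below. This is a rough illustration, not part of grenepipe: the two-column layout (`sample`, `bam`) and both function names are assumptions, and the table format that the option actually expects should be taken from the grenepipe documentation.

```python
import csv
from pathlib import Path


def collect_bams(directories):
    """Map sample name (file name without '.bam') to its path across directories."""
    bams = {}
    for directory in directories:
        for path in sorted(Path(directory).glob("*.bam")):
            sample = path.name[: -len(".bam")]
            if sample in bams:
                # The same sample in the old and the new run is likely a mistake.
                raise ValueError(f"sample {sample!r} found twice: {bams[sample]} and {path}")
            bams[sample] = path
    return bams


def write_bam_table(bams, table_path):
    """Write a tab-separated table with one line per sample (assumed layout)."""
    with open(table_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample", "bam"])
        for sample, path in sorted(bams.items()):
            writer.writerow([sample, str(path)])
```

Raising an error on duplicate sample names is a deliberate choice here, so that a sample present in both the old and the new mapping directory is noticed instead of silently overwritten.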

Hope that helps for now, and let me know what you think
Lucas

RvV1979 commented 2 months ago

Hi Lucas, Thanks for the fast response and advice. I still have the bam files, so I will take your suggested approach, and I am now running the pipeline with the all-bams options on new input files. That new option is really helpful for re-running pipelines: a great feature! I do need to upgrade from 0.12.2 to 0.13.1 for it to be available, so I hope that will not cause any unforeseen issues. I will keep you posted.

lczech commented 2 months ago

Hi @RvV1979,

all right, let me know how that works :-) The feature is new, so I hope it does not have any bugs. As for issues due to the updated pipeline: Yes, grenepipe v0.13.0 updated a lot of the tools to bring everything up to date again. So it could be that one of them behaves a bit differently now, or has slightly different file formats. So far, no one has reported any issues due to that though, and all tests are working fine, so let's hope it works for you as well. If not, let me know!

Cheers
Lucas

RvV1979 commented 2 months ago

Hi Lucas, Just to update you: I tried upgrading and running from bam files but ran into slurm issues; see https://github.com/moiexpositoalonsolab/grenepipe/issues/54. In any case, I also saw that the pipeline would redo all calling steps because of code changes, which would mean repeating a large amount of computation. I therefore decided to try the original pipeline (v0.12.2) after commenting out the check for the presence of all fastq input files. Luckily, this just worked: calling is only being done for the new samples and I am now at 53%. This is all FYI - no need for further support.

lczech commented 2 months ago

Hi @RvV1979,

okay, happy to hear that you found a solution for this! I think we can close the issue then? If not, feel free to re-open or start a new one :-)

Cheers
Lucas

moiexpositoalonsolab / grenepipe

rerun after deleting input files & server update #53