moiexpositoalonsolab / grenepipe

A flexible, scalable, and reproducible pipeline to automate variant calling from raw sequence reads, with lots of bells and whistles.
http://grene-net.org
GNU General Public License v3.0

--rerun-incomplete flag repeat mapping step for all samples #50

Closed athenasyarifa closed 3 months ago

athenasyarifa commented 3 months ago

Hi!

I have been using grenepipe v0.12.2 to call variants in 93 samples. In one run, the merge-paired-bams step failed for a single sample somewhere in the middle, while everything else continued until 97% of all jobs (up to the QC after variant calling) were done. I then thought that rerunning with the --rerun-incomplete flag in the snakemake command would repeat only that sample, but grenepipe repeated everything from the mapping step for all samples. Do you have any suggestions in case this happens again? I imagine my cluster administrators would not be happy with me using large resources repeatedly.

I attached my current snakemake log, slurm-submission, and slurm-debug log files, as well as the log from the previous snakemake run (the one that reached 97%). Any help would be appreciated! Many thanks!

2024-07-16T115815.581017.snakemake.log 2024-07-19T103454.572601.snakemake.log slurm-debug.log slurm-submissions.log

Best, Rifa

lczech commented 3 months ago

Hi Rifa,

hm, from the log files you sent, I cannot tell what is going on there. You can try without the --rerun-incomplete flag - maybe Snakemake thinks that some of the files are incomplete when in fact they are. Also add the --reason flag; this prints the reason each job is being executed. If you combine that with -n (dry run), you can check what Snakemake wants to do without actually using up resources. Maybe post the resulting output here as well, so that I can have a look.
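Concretely, such a check could look like the following (the working-directory path is a placeholder for however you normally invoke grenepipe; add your usual profile/cluster arguments):

```shell
# Dry run: list the jobs Snakemake *would* run, and why, without
# actually executing anything or submitting to the cluster
snakemake --directory /path/to/your/analysis -n --reason > dry_run.txt
```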

As a general thing, Snakemake usually does these repeated job executions because some of the input files were updated later on for some reason. For instance, it could be that some of the reference genome index files got updated after the mapping, and because those are inputs to the mapping step, Snakemake then thinks "updated files in the input, better re-generate the output", even though the contents of those files might not have changed... That's usually what's behind these things.

If that is the case, you can try to manually reset the timestamps of those reference genome index files. It's just a few files, so that should not be too much effort. That way, Snakemake will consider them older than your mapped files and will not try to redo the mapping. See here for how to do that.

If that is the reason, the files in question are

/dss/dsslegfs01/pr53da/pr53da-dss-0026/assemblies/Poecile_montanus_v1.0_12.07.2024/12.07.24_Poecile.montanus__genome.mitogenome__N2005.N2008__PacBioHiFi.Hi-C.Illumina__hifiasm.yahs.ragtag.mitofinder__v1.0.fasta.*

in your case. You can then probably just do

touch -d "2024-07-01T00:00:00" /dss/dsslegfs01/pr53da/pr53da-dss-0026/assemblies/Poecile_montanus_v1.0_12.07.2024/12.07.24_Poecile.montanus__genome.mitogenome__N2005.N2008__PacBioHiFi.Hi-C.Illumina__hifiasm.yahs.ragtag.mitofinder__v1.0.fasta.*

to reset all of them to the first of July (or any other date older than your mapping run).
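If you want to see what the command does before touching the real index files, you can try it on a throwaway file first (the file name here is just a placeholder):

```shell
# Create a throwaway file and reset its modification time to 1 July 2024
touch demo.fasta.fai
touch -d "2024-07-01T00:00:00" demo.fasta.fai
# Verify the new mtime (GNU coreutils: date -r prints a file's mtime)
date -r demo.fasta.fai +%FT%T
# → 2024-07-01T00:00:00
```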

Also, since you are using grenepipe v0.12.2: I recently had users with issues because conda broke backwards compatibility. Did you still get it to work? When did you install it and use it for the first time (which is when the conda environments it needs are created)? Interesting that you still got this to work :-) There is a new version out now that updates all conda-related things, as well as Snakemake. Maybe you could try that as well - it could be that the newer Snakemake is smarter with respect to your file issue. However, that will definitely require at least one full re-run, as the complete output file structure has changed, so all files will have to be created again. But from then on it might work better.

Finally, have you figured out what was wrong with that one mapping job in the first place? That of course needs to be fixed as well, so maybe check the log file logs/samtools/merge/merge-N_1413.log for that.

Hope that helps, so long Lucas

athenasyarifa commented 3 months ago

Hi Lucas,

Thanks so much for the help and suggestions! A few days ago I encountered another error in my grenepipe run, and I did as you suggested: (1) rerunning without the --rerun-incomplete flag, and (2) manually resetting the timestamps of the reference genome files. The run is going smoothly so far; hopefully I won't have to rerun the whole thing again.

Not completely sure why grenepipe v0.12.2 is working for me, but my best guess is that I am using an older version of conda (I never bothered to update it, although I should). I installed grenepipe v0.12.2 on the 8th of April, if that helps you.

About the mapping: I figured out the problem. The BAM file was somehow truncated (maybe from when I stopped grenepipe in the middle of a run?):

$ samtools quickcheck -v N_1413-1.sorted.bam
N_1413-1.sorted.bam was missing EOF block when one should be present.

But of course, after the rerun the file is now fine. I will close the issue, thanks again!

Best, Rifa

lczech commented 3 months ago

Hey Rifa,

thanks for the feedback, and happy to hear that things are working now!

Thanks so much for the help and suggestions! A few days ago I encountered another error in my grenepipe run, and I did as you suggested: (1) rerunning without the --rerun-incomplete flag, and (2) manually resetting the timestamps of the reference genome files. The run is going smoothly so far; hopefully I won't have to rerun the whole thing again.

Ah nice, glad that worked!

Not completely sure why grenepipe v0.12.2 is working for me, but my best guess is that I am using an older version of conda (I never bothered to update it, although I should). I installed grenepipe v0.12.2 on the 8th of April, if that helps you.

I see, that makes sense. I think once you update conda, you will have to use the newer grenepipe versions though. The other way round, I don't know whether the newer versions would work with your older conda - I would not be surprised if conda breaks there somehow as well. But anyway, you have a running system now, so all good :-)

About the mapping: I figured out the problem. The BAM file was somehow truncated (maybe from when I stopped grenepipe in the middle of a run?).

Makes sense. It could have just been some job on the cluster that failed for mysterious reasons; that happens sometimes. If you are curious, you might be able to trace it via the log files, but honestly, if a simple re-run worked, all good.

Cheers Lucas

athenasyarifa commented 3 months ago

Hi @lczech sorry to bother you again with this.

My grenepipe run hit another error, this time with a few contig groups missing in a few samples; I attached the snakemake log file below. I checked the haplotype caller logs for these samples, and they were truncated (one example attached). I am not sure what caused these errors, but maybe it is because I disabled slurm-status.py a while back, after my slurm administrators said it was slowing down the slurm database.

2024-07-22T110323.749526.snakemake.log N_4843.contig-group-5.log

But now I want to restart the grenepipe run so that it reruns only these samples. As you suggested, I first ran:

touch -d "2024-07-01T00:00:00" /dss/dsslegfs01/pr53da/pr53da-dss-0026/assemblies/Poecile_montanus_v1.0_12.07.2024/12.07.24_Poecile.montanus__genome.mitogenome__N2005.N2008__PacBioHiFi.Hi-C.Illumina__hifiasm.yahs.ragtag.mitofinder__v1.0.fasta.*

touch -d "2024-07-01T00:00:00" /dss/dsslegfs01/pr53da/pr53da-dss-0026/assemblies/Poecile_montanus_v1.0_12.07.2024/12.07.24_Poecile.montanus__genome.mitogenome__N2005.N2008__PacBioHiFi.Hi-C.Illumina__hifiasm.yahs.ragtag.mitofinder__v1.0.fasta

and then tried with the -n dry-run flag. I attached the dry-run log here; it seems that grenepipe wants to repeat the variant calling (and the BAM index generation) for all of the samples. Do you have an idea why? Should I also manually change the dates of the BAM files?

dry_run_1.txt

How can I rerun grenepipe without redoing the variant calling for all samples? Thank you so much in advance for your help, Lucas.

Cheers, Rifa

athenasyarifa commented 3 months ago

Hi Lucas and everyone, I solved the problem! I realized that grenepipe reruns variant calling for all samples because it re-indexes all of the BAM files, and it does so because the last-modified date of each BAM file is the same as that of its index file. So I made a simple script that shifts the dates of the BAM files inside the dedup folder to one minute earlier than their indices:

# "samples" contains a list of all sample names, one per line
cat samples | while read sample
do
  # Take the time of day (HHMM) of the index file...
  fileminute=$(date -r "${sample}.bam.bai" +%H%M)
  # ...shift it back by one minute...
  newminute=$(date -d "$fileminute - 1 minute" +"%H:%M")
  # ...and recombine it with the index's date into a full timestamp
  newdate=$(date -r "${sample}.bam.bai" +%FT)$newminute:00
  # Set the BAM's mtime to that timestamp, making it older than its index
  touch -d "$newdate" "${sample}.bam"
done
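For what it's worth, the same shift can be done in one step with epoch seconds, which also sidesteps the edge case where an index's timestamp falls within the first minute of a day (recombining date and time separately would then produce a BAM timestamp *later* than the index). This is a sketch on a throwaway file pair with placeholder names, assuming GNU coreutils:

```shell
# Create a throwaway BAM/index pair just to demonstrate the shift
touch demo.bam demo.bam.bai
# Read the index's mtime as epoch seconds, subtract 60, and apply the
# result to the BAM, so the BAM is exactly one minute older than its index
touch -d "@$(( $(date -r demo.bam.bai +%s) - 60 ))" demo.bam
```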

I tried the dry run again, and now it will rerun only the missing output files, see below. dry_run_2.txt

Re-closing this issue again, sorry for the trouble!

Cheers, Rifa

lczech commented 3 months ago

Hi Rifa @athenasyarifa,

haha, resolved before I got to answer - you are quick! Thank you for posting your solution here - it might help others in the future!

So yes, as you figured out, this likely happens due to Snakemake's use of timestamps - checking those would have been my first suggestion as well. With the grenepipe/Snakemake version you are using, --reason is a useful flag to see why things are happening. In the latest grenepipe/Snakemake versions, this flag is active by default, so each rule always prints why it is being executed.

On top of that, Snakemake has some other options that can help in situations like this. I am not sure which option was introduced in which Snakemake version, so some may not be available in yours yet, but I thought I'd mention them anyway. See here, in particular

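For reference, two such options in recent Snakemake versions (check snakemake --help in your installed version, as availability differs across versions):

```shell
# Mark existing output files as up to date without re-running their jobs
snakemake --touch

# Only consider file modification times (not changed params, code, or
# input sets) as triggers for re-running jobs (Snakemake >= 7.8)
snakemake --rerun-triggers mtime
```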
Hope that helps should you have similar issues in the future. But it seems you are on top of this :-)

Cheers Lucas