Closed bensprung closed 2 years ago
Hi Ben,
interesting. There are several things that might be off here.
Job needs threads=5 but only threads=3 are available. This is likely because two jobs are connected via a pipe and have to run simultaneously. Consider providing more resources (e.g. via --cores).
You might want to try using more cores to resolve this warning. That is, however, probably not the cause here.
The problem here seems to be with samtools stats, which is a quality control tool that is independent of freebayes and GATK HaplotypeCaller, and hence can be executed by the pipeline whenever. It might hence be coincidence that it fails before freebayes is even started, and you might run into the same problem with HaplotypeCaller as well. Hence: Did the same configuration with just changing the calling tool work for you? Was that with the exact same grenepipe version?
One thing that might help: Could you please post the samtools stats log file here? It should be at logs/samtools-stats/111D03-1.log.
Cheers and so long Lucas
I'm actually running on a pretty old 4-core machine so I only allocate 3 cores, but yes, it works fine with HaplotypeCaller set instead of freebayes and everything else exactly the same. Unfortunately, the log file logs/samtools-stats/111D03-1.log is not actually created so I can't post it (you can see it complaining that it doesn't exist in the error message above). The logs/samtools-stats directory exists but there are no files in it.
Interestingly, for the runs where HaplotypeCaller is used (with success), that samtools-stats log file exists but it's totally empty.
I'm back from other projects trying to troubleshoot this. Is there a way, for a particular config.yaml, to get grenepipe to produce a list of the commands that it is going to do, without actually running them? I was hoping to run each step by hand, as it were, to try to get a more precise sense of what the issue is.
Also, if I allocate another core, I just get Job needs threads=6 but only threads=4 are available.
Hi @bensprung,
thanks for digging into this!
Is there a way, for a particular config.yaml, to get grenepipe to produce a list of the commands that it is going to do, without actually running them?
Kind of. Snakemake offers the option --dry-run to list all rules that are going to be executed, see here. This will give you the tools and their input and output files, but you will have to somehow cobble together the actual command lines to execute. I don't think there is another way, as the construction of command lines and their subsequent execution are part of scripts that are just executed as a whole by snakemake.
However, I am not entirely sure that this is necessary. According to your above log, you can see which tools fail, can you not? So you'd only need to execute the failing ones by hand, I think.
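For reference, a minimal sketch of how such a dry-run invocation could be assembled. The data directory and core count are placeholders, not values from this thread. --printshellcmds asks Snakemake to also print shell commands, but as far as I know it only does so for rules that define one, so rules implemented as scripts or wrappers would still need their command lines pieced together by hand, as noted above.

```python
import shlex


def dry_run_command(data_dir, cores):
    """Build a Snakemake dry-run command line (nothing is executed here)."""
    return [
        "snakemake",
        "--dry-run",             # list the jobs without running them
        "--printshellcmds",      # also print shell commands, where a rule has one
        "--cores", str(cores),
        "--directory", data_dir,  # placeholder: folder that holds config.yaml
    ]


if __name__ == "__main__":
    # Placeholder path and core count, for illustration only.
    print(shlex.join(dry_run_command("/path/to/analysis", 3)))
```

The printed line can be pasted into a shell from the grenepipe directory; any further flags used for the real run (for example conda-related ones) would have to be added the same way.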
Also, if I allocate another core, I just get Job needs threads=6 but only threads=4 are available.
As for that, yes, I see. I am developing on my 8 core (16 with hyperthreading) laptop, and the pipeline is mostly geared towards even larger systems such as clusters. Hence, I did not optimize it for smaller laptops. However, a change of the pipeline to work without warnings on 4 cores or fewer does not make much sense to me: Such a change would make it slower on larger systems and datasets. And for small datasets that can be run on your laptop, it does not matter much anyway - you should be able to just use --cores 6. This will of course oversubscribe your cores, and so your laptop will be slow while running, but it should work. For larger datasets where this is inconvenient, I would suggest using a larger machine or cluster anyway.
Let me know if that helped or if you need any further input for now! Worst case, send me your data, and I can help debugging.
Cheers and a happy holiday season! Lucas
Well I can't really tell tbh. It seems like samtools-stats is failing. I will look at --dry-run.
The strange thing is I can run with 1 core with the default tools with no errors. But it's only changing the caller to freebayes that creates this error--but it's very early in the pipeline. Seems weird?
That is indeed weird. The core issue should at most lead to snakemake complaining or failing. But these errors seem to come from the tools being run, and not from snakemake itself... I did have issues in the past where one tool complained, but another was at fault, by producing erroneous or empty output files. You could check that the files that samtools stats wants to use (e.g., dedup/111D03-1.bam) are correct.
If that does not help - would you mind sharing your data or part of it with me?
Hrm, I don't think it's a problem with the bam file, because it turns out (surprisingly) that I get the same error with --dry-run, but only if calling-tool is set to freebayes. If it is set to haplotypecaller or bcftools it completes without issue. I'll attach the output of --dry-run for all three.
Happy to send the data if you think it makes sense.
Oh interesting, that error is indeed simply caused by too few cores. I thought snakemake would handle that differently, sorry for that. As said above, just run it with --cores 6 - that should work, but will make your computer slow while the pipeline is running. As I would not recommend running the pipeline for any large dataset on a laptop anyway, that should not be a limitation, and hence suffice for testing ;-)
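The check behind the "Job needs threads=... but only threads=... are available" message can be sketched roughly as follows. This is an illustration of the idea stated in the message, not Snakemake's actual code: jobs connected via a pipe have to run at the same time, so their thread counts add up and the sum has to fit into --cores. The split of threads between the two jobs is hypothetical; only the totals (5 needed with 3 available) come from the messages in this thread.

```python
def pipe_group_fits(job_threads, cores):
    """Return (fits, needed) for a group of jobs connected via pipes.

    Piped jobs run simultaneously, so their thread counts are summed.
    """
    needed = sum(job_threads)
    return needed <= cores, needed


# Hypothetical split of the 5 threads reported above across two piped jobs.
fits, needed = pipe_group_fits([3, 2], cores=3)
print(f"needs threads={needed}, fits={fits}")
```

Note that the reported requirement rose from 5 to 6 when one more core was allocated, so the per-job thread counts seemingly depend on --cores as well; the sketch does not model that.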
Ok will try that. Any idea why it only happens with freebayes as the caller?
Yes, because the freebayes rules are implemented to use more cores by default, see the config file. You can change this setting as well (instead of changing --cores) to help with the issue.
Got it. So, changing threads: 8 for freebayes in the config file didn't yield a completed run (I tried ramping it down all the way to 1 but still continued to get various odd errors) but running snakemake with --cores 8 (which I didn't think I could do with only 4 physical cores) worked. Thank you!
Ah nice, glad to hear it worked out now! Closing the issue now, but feel free to re-open if needed.
Some things remain though:
I tried ramping it down all the way to 1 but still continued to get various odd errors
Hm, what exactly happened there?
which I didn't think I could do with only 4 physical cores
Ah yes, it's possible to over-subscribe your cores. As said, that will make your computer slow for a while, but absolutely okay to do, technically speaking.
Hi Lucas, got a weird one for you. If I change the caller from haplotypecaller to freebayes, I get the error below. It's doubly strange because it seems to occur well before freebayes would be used in the pipeline.