Open lmrodriguezr opened 3 months ago
Some (potentially) important additional information: I just noticed that the "missing coverage" is for a contig shorter than the limit I had set. I ran anvi-profile with --min-contig-length 5000, but this contig is 4,167 bp in length, so it shouldn't even be here. Any ideas on how it made it through?
I'm using this config file for anvi-run-workflow: https://gist.github.com/lmrodriguezr/65cf33cb0ec5706a348d0189da87b63b
Thanks! Miguel.
Apologies for the frustration here, @lmrodriguezr. My general response for #2309 sadly applies here as well. But there is certainly a bug here given you mentioned this:
> Some (potentially) important additional information: I just noticed that the "missing coverage" is for a contig shorter than the limit I had set. I ran anvi-profile with --min-contig-length 5000, but this contig is 4,167 bp in length, so it shouldn't even be here. Any ideas on how it made it through?
I think the problem here likely stems from anvi-cluster-contigs reading from the contigs-db rather than the profile-db to figure out which contigs to report. That will always lead to an issue, since the contigs-db may contain more contigs than the linked profile-db when flags like --min-contig-length exclude some contigs from the profiled results.
One quick question: I presume there are many more contigs in the contigs-db that are shorter than 4,167, right? Because if that is the case, perhaps the bug is not coming from where I think it is :)
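If it helps to answer that question, here is a minimal sketch for counting how many contigs fall below a given length. It assumes the contigs have been exported from the contigs-db to a plain FASTA file; the function names and the `contigs.fa` file name are hypothetical and not part of anvi'o.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (contig_name, sequence_length) for each FASTA record."""
    name = None
    length = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first whitespace-delimited token of the header.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            length = 0
        else:
            length += len(line)
    if name is not None:
        yield name, length


def contigs_shorter_than(lines: Iterable[str], min_length: int) -> List[str]:
    """Return names of contigs whose length is below min_length."""
    return [name for name, length in read_fasta_lengths(lines) if length < min_length]


# Hypothetical usage (file name is an assumption):
# with open("contigs.fa") as handle:
#     short = contigs_shorter_than(handle, 5000)
#     print(len(short), "contigs shorter than 5000 bp")
```

If many contigs shorter than 5,000 bp exist in the contigs-db but only one of them shows up in the binning input, the contigs-db versus profile-db explanation above would not fully account for the problem.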
Short description of the problem
Hello again. I found a rather bizarre case in my testing and I'm a bit stuck now. I'm trying to run anvi-cluster-contigs with MaxBin2, and it's failing due to a single missing coverage value. I confirmed that the /tmp files indeed are missing that value, and it's precisely the last one (last contig), which is very suspicious. Strangely enough, I only have this issue with one out of three samples.
Any help is greatly appreciated!
anvi'o version
System info
OS: Rocky Linux 8.6 (Green Obsidian). Installed using the instructions for developer version.
Detailed description of the issue
I'm running a metagenome workflow using three samples, and only one of them causes the issue. The specific step is:
In the temporary folder:
As you can see, the sequence_contigs.fa has 34,990 sequences, but the contig_coverages.txt file only has values for 34,989 contigs (plus the header line). The one missing is the very last one:
After a few minutes, the command above returns:
And the logs.txt file has:
I checked using other drivers (e.g., CONCOCT or MetaBat2), and the same issue is present (i.e., one missing coverage value) but it simply doesn't fail because those programs silently ignore missing values (my guess).
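For anyone wanting to check the same mismatch on their own temporary files, the comparison between the two files can be sketched roughly as follows. This assumes that contig_coverages.txt is tab-delimited, has a single header line, and carries the contig name in its first column; the function names are hypothetical and not part of anvi'o.

```python
from typing import Iterable, List, Set


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return contig names from FASTA header lines, in file order."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    ]


def coverage_ids(lines: Iterable[str], has_header: bool = True) -> Set[str]:
    """Return the set of contig names found in the first column."""
    ids = set()
    for index, line in enumerate(lines):
        if has_header and index == 0:
            continue  # skip the header line
        line = line.rstrip("\n")
        if not line.strip():
            continue
        ids.add(line.split("\t")[0])
    return ids


def missing_coverage(fasta_lines: Iterable[str], coverage_lines: Iterable[str]) -> List[str]:
    """Return contigs present in the FASTA but absent from the coverage table."""
    covered = coverage_ids(coverage_lines)
    return [name for name in fasta_ids(fasta_lines) if name not in covered]


# Hypothetical usage, run from inside the temporary folder:
# with open("sequence_contigs.fa") as fa, open("contig_coverages.txt") as cov:
#     print(missing_coverage(fa, cov))
```

Because the result preserves FASTA order, a single name at the end of the list would match the "last contig" pattern described above.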
Files / commands to reproduce the issue
Command
Files
The files are relatively big, so I'm making them available only temporarily here (apologies if you find this issue years from now and want to reproduce it!):
03_CONTIGS/d071_br05_S7-contigs.db (927M)
06_MERGED/d071_br05_S7/PROFILE.db (1.6G)
Thank you! Miguel.