As I understand from the error message, your table does not comply with the following rule: each item in the view data list must be a list or tuple with three items (item name, layer name, and data value); see the error at line 43. If an item has fewer or more than three elements, you need to fix it.
Hey @metehaansever, that is a bug on anvi'o's side; there is nothing @tarunaaggarwal can do to fix it :) The anvi'o profiling step generates a data product and passes it to another function that expects a certain format for every item, and the procedure fails because the data doesn't fit the expected structure.
@tarunaaggarwal, can you confirm that this error doesn't happen when you remove the flag --min-mean-coverage 5?
Thank you for the report,
Hi @meren - thank you for responding! I'm rerunning the script now. It takes about 1 to 1.5 days to run because I have lots of host contigs in my assembly, not just microbial ones (an unfortunate consequence of working with macroalgae). But I will write back as soon as it finishes. Thank you again for your help!
Thank you for your patience with anvi'o!
Dear @meren - I can confirm that removing --min-mean-coverage did not result in any errors. Hope this information helps.
Here is a quick reproducible example with this contig.db and bam file: test.tar.gz
There is an error when both --min-mean-coverage and -T are used simultaneously:
anvi-profile -c CONTIGS.db -i SAMPLE-03.bam -o PROFILE --profile-SCVs -T 3 -S toto --min-mean-coverage 5
No error occurs if you use one or the other exclusively.
Strange: it seems to work fine with -T 2.
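Concretely, given the observation above that using either flag on its own works, variants along these lines should run without error (the output directory names here are placeholders, since each run needs its own -o target):

anvi-profile -c CONTIGS.db -i SAMPLE-03.bam -o PROFILE-A --profile-SCVs -S toto --min-mean-coverage 5
anvi-profile -c CONTIGS.db -i SAMPLE-03.bam -o PROFILE-B --profile-SCVs -T 3 -S toto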
I can reproduce this and can see where the issue is. Will take care of it soon :)
OK. The more I think about this, the more I am convinced that we must remove the --min-mean-coverage flag altogether.
The reason for that is the following: in the vast majority of metagenomics projects, coverages of contigs will change from sample to sample. As a result, the --min-mean-coverage flag will end up removing different sets of contigs from each single profile database... For instance, a contig with 7X mean coverage in one sample but 3X in another would survive a 5X threshold in the first single profile and vanish from the second. And when the user wants to merge those profile databases, they will get the following error:
Config Error: Ouch. The number of nucleotides described are not identical for all profiles to
be merged, which is a deal breaker. All profiles that are going to be merged
must be run with identical flags and parameters :/ You really shouldn't but if
you want to try to force things because you believe this is due to a
misunderstanding, you can use the flag --force. While you are considering this
as an option, please also remember that this we advice against it..
(Note: --force works for merging, but then other downstream processes inevitably fail, because data for some contigs are present in only a subset of the resulting single profiles. This is a problem we could solve in theory, but in practice the necessary design would complicate things way too much in the codebase without any actual gain, since I have a better solution for this.)
So. What is the solution when one wants to remove contigs of bad coverage from the project altogether?
A better solution is to run anvi-profile-blitz on all BAM files, remove contigs based on their coverages reported in the resulting bam-stats-txt file, create a final contigs-of-interest.txt file from that, and pass it to anvi-profile with the flag --contigs-of-interest.
This alternative works perfectly as one would expect, and is a much more explicit way to focus on a subset of contigs for profiling.
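For concreteness, a minimal sketch of that workflow could look like the following. The exact column layout of the bam-stats-txt file and the keep-a-contig policy used here (mean coverage of at least 5X in at least one sample) are assumptions; check the header of your own output and adjust the awk column index accordingly:

# run the blitz profiler on every BAM file against the contigs-db
anvi-profile-blitz *.bam -c CONTIGS.db -o BAM-STATS.txt

# keep contigs with mean coverage >= 5 in at least one sample; the
# '$6' below assumes the mean coverage sits in the 6th tab-delimited
# column, which you should verify against the actual file header
awk -F'\t' 'NR > 1 && $6 >= 5 {print $1}' BAM-STATS.txt | sort -u > contigs-of-interest.txt

# profile each sample using only those contigs
anvi-profile -c CONTIGS.db -i SAMPLE-03.bam -o PROFILE --contigs-of-interest contigs-of-interest.txt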
I hope this makes sense. I will now make the necessary changes to say goodbye to our good old --min-mean-coverage flag.
Thank you very much for bringing this to our attention, @tarunaaggarwal. This flag worked well when we didn't have anvi-profile-blitz for single profiles, but it clearly became obsolete over time as anvi'o evolved.
As a result, the --min-mean-coverage flag will end up removing different sets of contigs from each single profile database...
I thought this option would set the coverage to 0, but it actually filters out/removes the contigs considered in the profile.db?
How about moving that argument to anvi-merge? You profile everything per sample, then for the final merged profile.db you can filter out contigs with min thresholds. That would be easier for the user than having to run the blitz profiler. Of course, if you want to filter contigs by min-mean-coverage to reduce computing time, you should use the blitz profiler as you described.
I thought this option would set the coverage to 0, but it actually filters out/removes the contigs considered in the profile.db?
Yes indeed.
How about moving that argument to anvi-merge? You profile everything per sample, then for the final merged profile.db you can filter out contigs with min thresholds.
This is a very good idea, but it will lead to many other issues, since individual single profiles will not know which contigs they exclude, and it will create inconsistencies across profile databases that are linked to the same contigs-db in a manner we can't track explicitly with the information stored in self tables. I don't think that will make things any better.
Also, adding more structural burden on anvi-merge is not a good solution in general. The user should explicitly define which contigs they wish to profile, apart from filters based on properties that are common across all samples (such as contig length), so everything is more reproducible.
But still, if there is too much protest, we can add something to anvi-merge :)
Hello - I'm trying to run anvi-profile with 6 BAM files and a contigs.db using the following command, and I got a similar-looking error multiple times that I don't understand. My for loop is below and the error file is attached. Can someone please help me resolve these errors? Thank you so much! slurm-517415-err-node59.txt