Closed greenmna closed 1 year ago
Hello!
I think this is because the metabat output defines some alignment thresholds and then all the reads are failing them? These are described in the output, and can be changed by specifying more parameters on the command line.
That it?
Thank you for such a fast reply!
You are absolutely correct. After I went back to look at how MetaBAT "binned" all my contigs, it named the file as
I am guessing that the errors in my simulated nanopore data may be such that the prevalence of INDELs is ruining everything since MetaBAT doesn't factor those regions into coverage calculation. I'm betting if I change the --percentIdentity
argument to something below 97 I'd get my results to look more like what one would expect based on the mapping results prior. I went back and ran CoverM using the default -m mean
and it perfectly matched the reads mapped at 99.98%.
If I may ask one more question, is using either mean, metabat, or any other mode more preferable? Or does it ultimately come down to user preference/datasets?
Thank you very much, once again, for such a rapid replay!
No problem.
Try looking at or just using aviary? https://github.com/rhysnewell/aviary
-------------- Ben Woodcroft Group leader, Centre for Microbiome Research, QUT
From: greenmna @.> Sent: Tuesday, May 16, 2023 7:46:26 AM To: wwood/CoverM @.> Cc: Ben J Woodcroft @.>; Comment @.> Subject: Re: [wwood/CoverM] CoverM reports near 0 reads mapped despite almost all reads seemingly being mapped (Issue #165)
Thank you for such a fast reply!
You are absolutely correct. After I went back to look at how MetaBAT "binned" all my contigs, it named the file as _tooLowDepth.fa.
I am guessing that the errors in my simulated nanopore data may be such that the prevalence of INDELs is ruining everything since MetaBAT doesn't factor those regions into coverage calculation. I'm betting if I change the --percentIdentity argument to something below 97 I'd get my results to look more like what one would expect based on the mapping results prior. I went back and ran CoverM using the default -m mean and it perfectly matched the reads mapped at 99.98%.
If I may ask one more question, is using either mean, metabat, or any other mode more preferable? Or does it ultimately come down to user preference/datasets?
Thank you very much, once again, for such a rapid replay!
― Reply to this email directly, view it on GitHubhttps://github.com/wwood/CoverM/issues/165#issuecomment-1548637550, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAADX5H4FJC7EQRUXQ7QGULXGKP3FANCNFSM6AAAAAAYCOP444. You are receiving this because you commented.Message ID: @.***>
I will look into it!
Thanks again!
Aviary will give you both long and short read coverage files, but if you are only looking to get the coverages and not run the whole pipeline (and are feeling more experimental), you could also use the ISS-36 branch of rosella to just get coverage results and ignore the binning component for now and take coverages.tsv
file it produces. You can give it bam files or raw reads as well. The output will be in the metabat format, so it will be compatible with other binners but shouldn't throw away long read alignments
Greetings!
I have been using CoverM to calculate coverage from assemblies of simulated nanopore sequence data. These data were generated using the tools NanoSim and have simulated errors in them and variable abundance of the organisms that make it up.
I have also simulated datasets the same way without errors and have had no issue with CoverM reporting significant read mapping and subsequently giving coverage estimations.
Before using CoverM, I aligned my reads back to the assembly with minimap2 version 2.24. I then convert the output to a BAM file and sort that using samtools using 1.16. I check the % reads mapped and it was at 99.986%.
My question, or I guess plea for help, is why would CoverM think that almost no reads have mapped when the results should suggest otherwise?
The code I use for running CoverM is as follows
coverm contig -m metabat -b <sorted-alignment>.bam -p minimap2-ont -t 16 --min-covered-fraction 0 -o <coverage-file>.cov
Any help with potentially identifying where the issue might be in this would be much appreciated. I am not sure if all the information given is enough to figure out the problem (if it is with CoverM), but can provide more if needed.
I will also be including the results of
samtools stat
on the sorted BAM file that was the input to CoverMsamtools_stat.txt
Thank you and best regards!
Noah