Closed ekiefl closed 3 years ago
Thanks for documenting this. I see no problem with this fix, and I don't think there is any risk of breaking things :)
But I'm also surprised that this didn't occur until now. I wonder if it is due to a bug in the mapping software.
@SCarrSuperstar, can you share with us what you used for read recruitment and whether there were any unusual parameters?
Best,
Thanks @ekiefl and @meren.
I think I know the problem. This may be a result of a collaborative effort. My collaborator actually made the co-assembly, and they didn't know that I had already run quality control on the files, so they did some extra QC screening. I'm now realizing that it is possible that my PE read files have sequences that won't map to the co-assembly which I used to make my contig.db file.
Thanks again so much for your time. I hope this helps someone else, and it gives me a reason to practice some pysam. I'm going to try to remove these unmapped reads from the bam files I have so I don't have to start over.
Hi @SCarrSuperstar,
No problem, we are happy to help.
> I'm going to try to remove these unmapped reads from the bam files I have so I don't have to start over.
If you install the version of anvi'o we plan to release in the next day or two, you won't have to remove these reads from the bam files or start over. `anvi-profile` will run without error on your current contigs db and bam file.
That sounds even better!
Description
This error was reported by @SCarrSuperstar in this issue. Moving it here because it is a separate issue that deserves its own space. The error is:
Problem
I have identified the problem. Basically, a small percentage of reads from `pysam` are appearing without the critical attribute `reference_end`. According to the pysam documentation, "[reference_end] returns None if not available (read is unmapped or no cigar alignment present)".
Well, things certainly look like they are mapped and have a cigar alignment! Here is one such read that causes the error:
It has a CIGAR tuple and clearly maps to a specific point in the reference sequence. Yet, when I check the attribute `is_unmapped`, indeed, it returns `True`.
I have no good explanation for this. There seems to be little rhyme or reason to which reads have the attribute `is_unmapped` set to `True`. Here is a spattering of problem reads--maybe it will make sense to someone else:
In total, 18,472 reads had `is_unmapped` set to `True` while simultaneously having CIGAR tuples and therefore seemingly aligning to the references. This represents a very small percentage (about 0.18%) of the 10,549,203 total reads.
Solution
The solution I see is to ignore any reads with `is_unmapped` set to `True`.
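For illustration, here is a minimal sketch of what "ignore any reads with `is_unmapped` set to `True`" could look like. This is not the actual anvi'o change: the helper names `is_flagged_unmapped` and `iter_mapped_reads` are hypothetical, and the reads are plain stand-in objects so the snippet runs without pysam or a BAM file. The only assumption carried over from pysam is that each read exposes an `is_unmapped` attribute, which reflects bit 0x4 of the SAM FLAG field.

```python
from types import SimpleNamespace

BAM_FUNMAP = 0x4  # SAM FLAG bit meaning "segment unmapped"


def is_flagged_unmapped(flag):
    """Return True if the SAM FLAG has the 0x4 (unmapped) bit set."""
    return bool(flag & BAM_FUNMAP)


def iter_mapped_reads(reads):
    """Yield only the reads whose `is_unmapped` attribute is False.

    `reads` is any iterable of objects exposing `is_unmapped`
    (hypothetical helper; with pysam these would be AlignedSegment
    objects coming out of an AlignmentFile).
    """
    for read in reads:
        if read.is_unmapped:
            # Skip the read entirely, even if it carries a CIGAR string.
            continue
        yield read


def make_fake_read(name, flag, cigar):
    """Build a stand-in read; is_unmapped is derived from the FLAG only."""
    return SimpleNamespace(
        query_name=name,
        flag=flag,
        cigarstring=cigar,
        is_unmapped=is_flagged_unmapped(flag),
    )


# Stand-in data (made up for this sketch, not taken from the issue).
fake_reads = [
    make_fake_read("read_1", 99, "100M"),   # paired, mapped
    make_fake_read("read_2", 69, "100M"),   # 0x4 set, yet has a CIGAR
    make_fake_read("read_3", 147, "100M"),  # paired, mapped
]

kept = [read.query_name for read in iter_mapped_reads(fake_reads)]
print(kept)
```

The second stand-in read mimics the situation described above: it has a CIGAR string, but because the unmapped bit is set in its FLAG it is dropped before anything tries to use `reference_end`.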