Closed RenzoTale88 closed 3 years ago
Hi Andrea, thanks for noting this issue. I'm not sure what's causing this, so I'll have to run more tests on this feature. I'm afraid it'll be a few weeks before I'll have time to dig into debugging this, but if you'd like to send along minimal inputs that reproduce this error, I can debug based on that.
@WSDeWitt sure no problem. I'll create some shared folder to send you as soon as I can. Thanks for the help!
Andrea
@WSDeWitt sorry for never coming back with this. I've started looking into this since come across the issue one more on a separate dataset. I'm using the software installed in an anaconda environment with python v3.9.
The problem arises when the position of a variant comes after the latest interval provided in the bed file.
If I add the following code at line 107 of ancestor.py
to check the array:
try:
self._revcomp_func(chrom, pos)
except:
closest_idx = bisect.bisect(self.strandedness[chrom], (pos, -1))
sys.stderr.write(f'Variant: {chrom} {pos}\n')
sys.stderr.write(f'Index in array from bisect.bisect: {closest_idx}\n')
sys.stderr.write(f"Length of the strandedness array: {len(self.strandedness[chrom])}\n")
for i in range(-4, 0): sys.stderr.write(f"Index position {i}: {self.strandedness[chrom][i]}\n")
I can see the variant creating problem like this:
Variant: 18 65895671
Index in array from bisect.bisect: 15074
Length of the strandedness array: 15074
Index position -4: (65892711, 65892782)
Index position -3: (65893218, 65893402)
Index position -2: (65893751, 65894904)
Index position -1: (65895556, 65895639)
The problem using the index coming out of bisect.bisect as it is. The index from this command seems to be 1-based as pointed (here)[https://docs.python.org/3.9/library/bisect.html]. The problem is that it calls the wrong interval consistently for every variant in the dataset. For example, if I use a variant falling into the interval in position -2 (18, 65894668) I get the following output:
Variant: 18 65894667
Index in array from bisect.bisect: 15073
Length of the strandedness array: 15074
Closest index position 15073: (65895556, 65895639)
Real index position 15072: (65893751, 65894904)
The problem can be easily solved by changing the code as follow:
def _reverse_strand(self, chrom: str, pos: int):
r"""return True if strand_file indicates reverse complementation at
this site"""
closest_idx = bisect.bisect(self.strandedness[chrom], (pos, -1))
if pos < self.strandedness[chrom][closest_idx - 1][1]:
return True
return False
or again
def _reverse_strand(self, chrom: str, pos: int):
"""return True if strand_file indicates reverse complementation at
this site"""
closest_idx = bisect.bisect(self.strandedness[chrom], (pos, -1)) - 1
if pos < self.strandedness[chrom][closest_idx][1]:
return True
return False
Hope it helps Andrea
Hi Andrea, thank you for digging into this! I haven't had the time to investigate, but if you have a suggested fix (like the code above), I'd be happy to review a PR.
Ideally we'd also have a unit test for this in mutyper/tests. That's something I should have added before, and might have caught this before it became a problem for you.
@WSDeWitt sure I'll do a pull request later today. It would be great if you can test it on your data first though (I'm unsure whether the change I'm suggesting is correct)
Andrea
Hello, I'm trying to run mutyper on a series of custom VCF, and I'm coming across issues in specifying a bed file with positions for the negative strand exons. I got a file as follow:
Where the interval specify the positions of the negative stranded exons. However, when I pass it to mutyper, I get the following error:
Is there problem related to the bed file? Or a way to understand the offending position?
Thanks in advance Andrea