lindenb / jvarkit

Java utilities for Bioinformatics
https://jvarkit.readthedocs.io/
Other
478 stars 132 forks source link

findallcoverageatposition output incorrect numbers #180

Open yliueagle opened 3 years ago

yliueagle commented 3 years ago

Subject of the issue

findallcoverageatposition output incorrect numbers

Your environment

Steps to reproduce

echo "hic_reads_trans_chr14_chr18_chr18_2021-03-0318:15:51.bam" | java -jar ./jvarkit/dist/findallcoverageatposition.jar -p chr18:7833720

Expected behaviour

The number of reads at chr18:7833720 should be 2, with one A and one G. But the output of the command is different (see the data and output in the drive):

https://drive.google.com/drive/folders/14HwF65VllIIYSHR3y--k92140J4v4UWL

lindenb commented 3 years ago

Thank you for the bug report. I think I found my error. Can you please test my latest commit please ?

yliueagle commented 3 years ago

Thanks for your quick update and it works now. Yet I found another issue: the counts at positions encountering hard clipping seems incorrect (seems reads that have hard clippings are not included in the computation), see the following positions as an example

CHROM    POS

1: chr18 10736 2: chr18 10749 3: chr18 10755 4: chr18 11333 5: chr18 11335

lindenb commented 3 years ago

hard clipping seems incorrect

@yliueagle well, I didn't want to count the hard+soft clipping in the count of bases. I've just added a new option --clip to count them. For now I cannot download your BAM from google to test your positions above. Tell me if it fulfills your needs.

P

yliueagle commented 3 years ago

@lindenb Thanks again. The issue here is about the reads that have hard clipping, instead of clipped bases. See the test data (remove ".gz") below. At the position chr18 10736, findallcoverageatposition outputs DEPTH as 0

(btw, these are Hi-C chimeric reads)

test.bam.gz test.bam.bai.gz

test

lindenb commented 3 years ago

@yliueagle i saw your mail. Sorry, I don't have the time to explore this for now.