igvteam / igv

Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations
https://igv.org
MIT License
646 stars 387 forks source link

Discrepancy Between IGV and pysam Reads Statistics for Specific Genomic Position #1554

Closed chaopower-zhang closed 2 months ago

chaopower-zhang commented 2 months ago

Hello IGV team,

I am experiencing a discrepancy between the read statistics for a specific genomic position when using IGV compared to when using pysam. I would appreciate your assistance in understanding this issue.

Issue: When I inspect a specific genomic position (e.g., chr1:1000000) in IGV, the number of reads and their details appear to differ from those obtained using pysam. I have verified that the BAM files and regions analyzed are the same in both cases.

my python code :

def count_soft_clipped_ratio(bamfile, chromosome, start, end, ref, alt):
    count_list = []
    name = []
    count = 0
    for read in bamfile.fetch(chromosome, start - 1, end):
        if read.is_unmapped or read.is_secondary or read.is_supplementary or read.is_qcfail or read.mate_is_unmapped:
            continue
        if is_soft_clipped(read, start):
            continue
        if start == 142188927:
            # count_list.append(read.query_name)
            name.append(read.mapping_quality)
            if read.reference_start <= start < read.reference_end:
                count += 1
    print(count)

Questions:

Are there any known issues or settings in IGV that could lead to differences in read counts compared to pysam? Could there be specific filters or options in IGV affecting the display of reads that I might have overlooked? Is there a way to export or analyze read data in IGV to match the output from pysam more accurately? Thank you for your assistance. Any insights or suggestions to help resolve this discrepancy would be greatly appreciated.