kaizhang / SnapATAC2

Single-cell epigenomics analysis tools
https://kzhang.org/SnapATAC2/

Calculation Method and Significance of "frac_confidently_mapped"? #361

Open haloudashu opened 1 week ago

haloudashu commented 1 week ago

Hi zhangkai, I am using SnapATAC2 v2.7.0. Because of experimental issues, I obtained a poor-quality dataset in which R1 consists entirely of fixed sequences, and only 6.74% of R1 reads map. R2 contains normal sequences, and 99.3% of R2 reads map. After performing paired-end alignment with bwa mem, I obtained a BAM file and ran the following command:

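```python
import snapatac2 as snap
# Barcodes are parsed from read names (the text before the first "_").
bam_qc = snap.pp.make_fragment_file(
    bam_file='bwa_mem_pe_Sort.bam',
    output_file='cs_fragments.tsv.gz',
    barcode_regex="^(.*?)_",
    compression='gzip',
    compression_level=6,
)
```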

This generated the following QC metrics:

{
    'frac_nonnuclear': 0.03449290449220156,
    'frac_unmapped': 0.9327618801730325,
    'frac_confidently_mapped': 0.4128082260099562,
    'sequenced_reads': 75512548.0,
    'frac_fragment_in_nucleosome_free_region': 0.1915164180071601,
    'sequenced_read_pairs': 37756142.0,
    'frac_q30_bases_read1': 0.9022374615036122,
    'frac_q30_bases_read2': 0.9467559964150998,
    'frac_duplicates': 0.1691827173347215,
    'frac_fragment_flanking_single_nucleosome': 0.5740424468373149,
    'frac_valid_barcode': 1.0
}

I noticed that the sum of "frac_confidently_mapped" and "frac_unmapped" exceeds 1, which is clearly unreasonable. Could you please explain how each of these two metrics is calculated?
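As a quick check of the arithmetic, the two values below are copied verbatim from the QC output above. Nothing else is assumed.

```python
# Two of the QC values returned by make_fragment_file, copied from the output above.
frac_unmapped = 0.9327618801730325
frac_confidently_mapped = 0.4128082260099562

# If both were fractions of the same set of reads and the categories did not
# overlap, this sum could not exceed 1.
total = frac_unmapped + frac_confidently_mapped
print(f"{total:.4f}")  # 1.3456
```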

Thank you!

kaizhang commented 5 days ago

The definitions of these two metrics are taken from the 10x Genomics website:

Importantly, #total barcoded pairs != #all reads.
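The key point is that the two fractions need not be complements of each other. The toy sketch below, in plain Python, shows how that can happen. It is not SnapATAC2's implementation. The `Read` class, `toy_metrics`, and the two definitions it uses are hypothetical: "unmapped" is counted per read pair (either mate unmapped), while "confidently mapped" is counted per individual read with MAPQ >= 30. These definitions were chosen only to illustrate that metrics with different counting units or denominators can sum to more than 1 when R1 rarely maps but R2 almost always does.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical, minimal alignment record (not a SnapATAC2 or pysam type)."""
    mapped: bool
    mapq: int


def toy_metrics(pairs, min_mapq=30):
    """Illustrative metrics that count different units over different denominators.

    ASSUMPTION for illustration only, not SnapATAC2's actual definitions:
      - frac_unmapped: read pairs with at least one unmapped mate / all pairs
      - frac_confidently_mapped: reads with MAPQ >= min_mapq / all reads
    """
    n_pairs = len(pairs)
    n_reads = 2 * n_pairs
    unmapped_pairs = sum(1 for r1, r2 in pairs if not (r1.mapped and r2.mapped))
    confident_reads = sum(
        1 for pair in pairs for read in pair if read.mapped and read.mapq >= min_mapq
    )
    return {
        "frac_unmapped": unmapped_pairs / n_pairs,
        "frac_confidently_mapped": confident_reads / n_reads,
    }


# Toy data shaped like the dataset in the question: R1 maps in only 1 of 10
# pairs, while R2 maps confidently in all 10.
pairs = [
    (Read(mapped=(i == 0), mapq=60 if i == 0 else 0), Read(mapped=True, mapq=60))
    for i in range(10)
]
m = toy_metrics(pairs)
print(m)  # {'frac_unmapped': 0.9, 'frac_confidently_mapped': 0.55}
```

Here 9 of 10 pairs count as unmapped and 11 of 20 reads count as confidently mapped, so under these assumed definitions the two fractions sum to about 1.45, even though no read is double-counted within either metric.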