parklab / MosaicForecast

A mosaic detecting software based on phasing and random forest
MIT License

Need help for quality filtering of indels #32

Open jekim2022 opened 1 year ago

jekim2022 commented 1 year ago

In the article, I found the variant-calling methods for SNVs and indels described separately. SNVs were called using SAMtools mpileup with mapping quality >20 and base quality >20, but I couldn't find any mapping-quality or base-quality filter conditions in the indel-calling method.

So I would like to ask whether indel calling really needs no mapping-quality or base-quality filters and, if so, why it differs from SNV calling in this respect.

Thank you in advance

douym commented 1 year ago

Hi @jekim2022 ,

Thanks for your question, and sorry for the delayed response. When calling indels (in "ReadLevel_Features_extraction.py"), I also used baseQ-related and mapQ-related features (for example, "mapq_p", "baseq_p", "mapq_difference", "ref_baseq1b_p", etc.).

In the pre-scan stage, for SNVs, baseQ<20 or mapQ<20 is typically used to filter out low-quality reads, but for indels it is not so straightforward to calculate the baseQs of alternative alleles (e.g., for deletions, you cannot read the "baseQ" of the mutant allele directly from the BAM file, because the deleted bases are not present in the read). You can certainly pre-filter reads with low mapQ if you want to.

Best wishes,

Y.
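For readers following along, here is a minimal sketch of the point made above. It is not taken from MosaicForecast; the `Read` type, the function names, and the `min_mapq=20` default are all hypothetical and chosen only for illustration. It walks a CIGAR string to show that reference positions inside a deletion have no base quality to read, while a mapQ pre-filter works the same way for any read.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class Read(NamedTuple):
    """Hypothetical, simplified stand-in for one aligned BAM record."""
    pos: int          # 0-based reference start of the alignment
    cigar: str        # CIGAR string, e.g. "3M2D2M"
    mapq: int         # mapping quality of the read
    quals: List[int]  # one Phred base quality per base stored in the read


CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def prefilter_by_mapq(reads: List[Read], min_mapq: int = 20) -> List[Read]:
    """Keep reads whose mapQ is at least min_mapq (threshold is an assumption)."""
    return [r for r in reads if r.mapq >= min_mapq]


def ref_base_quals(read: Read) -> Dict[int, Optional[int]]:
    """Map each reference position to the read's base quality there.

    Positions inside a deletion map to None: the read stores no base,
    and therefore no base quality, for them.
    """
    out: Dict[int, Optional[int]] = {}
    ref, qry = read.pos, 0
    for length, op in CIGAR_RE.findall(read.cigar):
        n = int(length)
        if op in "M=X":      # consumes both reference and read
            for i in range(n):
                out[ref + i] = read.quals[qry + i]
            ref += n
            qry += n
        elif op == "D":      # consumes reference only: no quality available
            for i in range(n):
                out[ref + i] = None
            ref += n
        elif op == "N":      # skipped reference region
            ref += n
        elif op in "IS":     # consumes read only
            qry += n
        # H and P consume neither reference nor read
    return out


reads = [
    Read(pos=100, cigar="3M2D2M", mapq=60, quals=[30, 31, 32, 33, 34]),
    Read(pos=100, cigar="5M", mapq=5, quals=[30, 30, 30, 30, 30]),
]
kept = prefilter_by_mapq(reads)
quals_by_pos = ref_base_quals(kept[0])
```

In this toy input, the second read is dropped by the mapQ filter, and positions 103 and 104 of the first read come back as `None`, which is why a per-allele baseQ cutoff cannot be applied to a deletion the way it is applied to an SNV.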

jekim2022 commented 1 year ago

Dear Y, I express my gratitude for your response, and your insights have indeed illuminated the path forward. Your assistance is deeply appreciated. I hope everything you do goes well.

Best, Jieun Kim

On Fri, Aug 25, 2023 at 4:37 PM, douym @.***> wrote:
