hsinnan75 / MapCaller

MapCaller – An efficient and versatile approach for short-read alignment and variant detection in high-throughput sequenced genomes
MIT License

Strand and read placement biases #25

Open carlosmag opened 4 years ago

carlosmag commented 4 years ago

Hi!

FreeBayes reports 4 INFO fields that are used to assess strand and read placement bias.

INFO=

INFO=

INFO=

INFO=

Graphical illustration:

[Image: ngs_biases_short] Adapted from freebayes in depth: model, filtering, and walkthrough, page 9, Erik Garrison 2015.

Can you include these metrics in MapCaller? Thank you!

hsinnan75 commented 4 years ago

Thanks for the suggestion. I agree that this bias information is also important for downstream analysis. I'll design data structures to keep track of it. As I understand it, strand bias means an allele occurs only on one strand, while cycle bias means an allele occurs only in the first sequencing cycles. However, I don't understand what placement bias means. Could you please explain it to me? Thank you very much. Moreover, Erik Garrison also mentioned allele imbalance. What does it mean? It seems like we can only observe an allele in a small portion of read alignments covering the locus. Is it correct?
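As a rough illustration of the bookkeeping this would involve, here is a minimal Python sketch of a per-allele tally of strand and read-cycle observations at one locus. The `AlleleTally` class, the `tally_observations` helper, the input tuple layout, and the `early_cycles` cutoff of 10 are all hypothetical; none of them is taken from MapCaller or FreeBayes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AlleleTally:
    """Observation counts for one allele at one locus (hypothetical layout)."""
    forward: int = 0      # observations on forward-strand reads
    reverse: int = 0      # observations on reverse-strand reads
    early_cycle: int = 0  # observations near the start of the read

    @property
    def total(self) -> int:
        return self.forward + self.reverse


def tally_observations(observations, early_cycles=10):
    """Group observations by allele.

    `observations` is an iterable of (allele, is_reverse, read_offset)
    tuples, where read_offset is the 0-based position of the base within
    its read. The cutoff `early_cycles` is an arbitrary illustrative value.
    """
    tallies = defaultdict(AlleleTally)
    for allele, is_reverse, read_offset in observations:
        tally = tallies[allele]
        if is_reverse:
            tally.reverse += 1
        else:
            tally.forward += 1
        if read_offset < early_cycles:
            tally.early_cycle += 1
    return dict(tallies)


# Made-up observations for a single locus.
obs = [
    ("A", False, 3),
    ("A", False, 5),
    ("A", False, 40),
    ("G", True, 60),
    ("G", False, 2),
]
tallies = tally_observations(obs)
```

In this made-up example, allele `A` is seen only on forward-strand reads, which is the pattern a strand bias metric is meant to flag.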

carlosmag commented 4 years ago

Thanks for considering bias info! The read placement bias fields are used to filter for sites where the reads are balanced on each side (left and right), as explained in freebayes in depth: model, filtering, and walkthrough, pages 16-17, Erik Garrison 2015. I am not familiar with allele imbalance, but it seems to be related to this:

From Galaxy Training!, Calling variants in non-diploid systems, Anton Nekrutenko 2018: it reflects significant deviation from the diploid (50/50) expectation (see https://galaxyproject.github.io/training-material/topics/variant-analysis/images/freebayes.pdf for more details).
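To make the left/right balance idea concrete, here is a minimal Python sketch that turns a pair of counts into a Phred-scaled score. It assumes the expected split is 50/50 and uses a two-sided Hoeffding bound, the inequality that the FreeBayes field descriptions cite. The function name `balance_phred` and its arguments are hypothetical, and this is an illustration of the idea, not FreeBayes's actual computation.

```python
import math


def balance_phred(left: int, right: int) -> float:
    """Phred-scaled upper bound on the chance of a split this uneven.

    Assumes each observation falls on either side with probability 0.5.
    Hoeffding's inequality gives
        P(|left/n - 0.5| >= t) <= 2 * exp(-2 * n * t**2),
    and the bound is reported as -10 * log10(bound), so higher scores
    mean a more surprising imbalance.
    """
    n = left + right
    if n == 0:
        return 0.0  # no observations, so no evidence of imbalance
    deviation = abs(left / n - 0.5)
    bound = 2.0 * math.exp(-2.0 * n * deviation ** 2)
    if bound >= 1.0:
        return 0.0  # the bound is uninformative for small deviations
    return -10.0 * math.log10(bound)


balanced = balance_phred(10, 10)   # reads evenly placed on both sides
one_sided = balance_phred(20, 0)   # all reads placed on one side
```

The same function could be applied to forward/reverse strand counts, since both metrics compare two counts against an expected even split.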

tseemann commented 4 years ago

> Moreover, Erik Garrison also mentioned allele imbalance. What does it mean? It seems like we can only observe an allele in a small portion of read alignments covering the locus. Is it correct?

@ekg - can you confirm the definition?

ekg commented 4 years ago

At a heterozygous genotype the fraction of observations for the pair of alleles is unlikely under an unbiased binomial sampling model.
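One way to read this definition is as an exact two-sided binomial test on the two allele counts at a heterozygous site. The Python sketch below is an illustration under that assumption. The function name `allele_balance_pvalue` and the default expected fraction of 0.5 are hypothetical choices, not something taken from FreeBayes or MapCaller.

```python
from math import comb


def allele_balance_pvalue(ref_count: int, alt_count: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for the observed allele split.

    Under unbiased sampling at a heterozygous site, each observation is
    the alternate allele with probability p. The p-value sums the
    probability of every outcome that is no more likely than the observed
    one. Intended for modest read depths, since the probability terms
    underflow at very large depths.
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # nothing observed, so nothing to reject
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[alt_count]
    # Small relative tolerance so ties are not lost to floating-point rounding.
    tail = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, tail)


even_split = allele_balance_pvalue(10, 10)   # consistent with 50/50
skewed_split = allele_balance_pvalue(18, 2)  # alternate allele in 2 of 20 reads
```

A small p-value, as for the 18/2 split, is the "unlikely under an unbiased binomial sampling model" case described above.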
