shishenyxx / DeepMosaic

DeepMosaic is a deep-learning-based mosaic single nucleotide classification tool that does not need matched control information.
https://www.nature.com/articles/s41587-022-01559-w

How DeepMosaic uses population information in training #12

Closed hiyoothere closed 1 year ago

hiyoothere commented 1 year ago

Hi Dr. Yang,

I have a question about how DeepMosaic incorporates population AF into the training model. How is the population AF information at each position incorporated into the model? Could the absolute positions of the germline variants used in training affect the final output?

Thank you for your support

shishenyxx commented 1 year ago

Hi hiyoothere,

Thank you for your interest in DeepMosaic!

  1. The gnomAD population AF was obtained by annotating the variant list with gnomAD AF values, and the annotated AF was incorporated into the final classifier in parallel with the neural network output.
  2. The genomic position is used only to look up the population AF, so using a different reference genome would affect the output; apart from that, only the AF at the variant position is used. In fact, during the training stage we manually assigned different AF values to the true-positive and true-negative training sets, regardless of their actual population AF at the specific genomic position. The detailed process is described in the Methods section of our manuscript.
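
As a rough illustration of the two points above (this is not DeepMosaic's actual code), the sketch below looks up a population AF by genomic position and appends it to the network output as an extra feature. The names `annotate_population_af` and `build_classifier_input`, the variant key layout, the default AF of 0.0 for variants missing from the table, and all numeric values are assumptions made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
VariantKey = Tuple[str, int, str, str]


def annotate_population_af(
    variants: List[VariantKey],
    af_table: Dict[VariantKey, float],
    default_af: float = 0.0,  # assumption: variants absent from the table get AF 0.0
) -> List[float]:
    """Look up a population AF for each variant by its genomic position and alleles."""
    return [af_table.get(variant, default_af) for variant in variants]


def build_classifier_input(network_output: List[float], population_af: float) -> List[float]:
    """Append the population AF to the network output as a parallel feature."""
    return list(network_output) + [population_af]


# Toy AF table with made-up values, standing in for a gnomAD annotation.
toy_af_table: Dict[VariantKey, float] = {("chr1", 12345, "A", "G"): 0.02}

variants: List[VariantKey] = [("chr1", 12345, "A", "G"), ("chr2", 67890, "C", "T")]
afs = annotate_population_af(variants, toy_af_table)

# Placeholder values standing in for the neural network output of the first variant.
network_output = [0.1, 0.7, 0.2]
classifier_input = build_classifier_input(network_output, afs[0])

# The same variant expressed in another reference build's coordinates no longer
# matches the table, which is why the choice of reference genome matters.
shifted_variant: List[VariantKey] = [("chr1", 12999, "A", "G")]
shifted_afs = annotate_population_af(shifted_variant, toy_af_table)
```

The position only serves as a lookup key here; the downstream classifier sees the AF value and the network output, not the coordinate itself.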

Best,

Xiaoxu