lh3 / psmc

Implementation of the Pairwise Sequentially Markovian Coalescent (PSMC) model

Genome masking #48

Open Larissa-Arantes opened 2 years ago

Larissa-Arantes commented 2 years ago

Dear all, I have a question regarding the masking of repetitive regions of the genome. Should we use a soft-masked or a hard-masked genome? Is PSMC able to recognize lowercase bases as repetitive regions? How does PSMC deal with these regions? Thank you very much. Best regards, LSA

Larissa-Arantes commented 2 years ago

To give more context to my question: I'm running PSMC with the same set of SNPs (called only in non-repetitive regions) but with different versions of the reference genome (soft-masked, hard-masked and unmasked). I'm generating my consensus sequence with the following command:

"cat {RefGenome} | bcftools consensus -I {vcf} > Consensus.fasta"

where {vcf} is always the same set of SNPs (obtained with the masked genome) and {RefGenome} is switched between the soft-masked, hard-masked and unmasked versions of the same genome.
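Before comparing the PSMC runs, it may help to check what actually differs between the three consensus files. The sketch below is a minimal, hypothetical helper (`mask_summary` is not part of psmc or bcftools) that counts uppercase bases, lowercase (soft-masked) bases and N (hard-masked) bases in a FASTA file. It makes no assumption about how bcftools consensus handles case; it simply reports what ended up in the output.

```python
from collections import Counter


def mask_summary(fasta_lines):
    """Count uppercase, lowercase (soft-masked) and N (hard-masked) bases.

    Hypothetical helper for illustration; header lines starting with '>'
    are skipped. 'N' and 'n' are both counted as hard-masked.
    """
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # skip sequence headers
        for ch in line.strip():
            if ch in "Nn":
                counts["hard_masked_N"] += 1
            elif ch.islower():
                counts["soft_masked_lower"] += 1
            else:
                counts["upper"] += 1  # includes IUPAC het codes such as R, Y
    return dict(counts)


if __name__ == "__main__":
    import sys

    # Usage: python mask_summary.py Consensus.fasta
    with open(sys.argv[1]) as fh:
        print(mask_summary(fh))
```

Running it on each of the three consensus files should show whether the soft-masked and unmasked versions contain repeat regions as ordinary (called) sequence, while the hard-masked version contains them as N.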

These are the PSMC plots obtained for the 3 different references:

[Figure: PSMC plots for the soft-masked, hard-masked and unmasked references (screenshot from 2022-09-28 13-25-23)]

So my question is: how does PSMC deal with the non-variable part of the genome when it is unmasked, soft-masked or hard-masked? Thank you very much, LSA
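To illustrate why the choice could matter, here is a simplified, hypothetical binning scheme loosely modeled on the K/T/N alphabet of `.psmcfa` files (K = window with a heterozygous site, T = homozygous window, N = missing). It is not psmc's `fq2psmcfa`; the 100 bp window, the 50% called-base threshold and the `lowercase_as_missing` switch are assumptions made for the example. Heterozygous sites are taken to be the two-base IUPAC codes that `bcftools consensus -I` writes.

```python
# Two-base IUPAC ambiguity codes, used here to mark heterozygous sites.
HET_CODES = set("RYSWKM")


def bin_sequence(seq, window=100, lowercase_as_missing=False, min_called_frac=0.5):
    """Hypothetical K/T/N binning of a consensus sequence (NOT fq2psmcfa).

    - 'N'/'n' bases are always treated as missing (hard-masked).
    - Lowercase bases are treated as missing only if lowercase_as_missing is True.
    - A window is 'N' if fewer than min_called_frac of its bases are called,
      'K' if it contains at least one heterozygous code, otherwise 'T'.
    """
    out = []
    for start in range(0, len(seq), window):
        win = seq[start:start + window]
        called = het = 0
        for ch in win:
            if ch in "Nn":
                continue  # hard-masked / missing base
            if lowercase_as_missing and ch.islower():
                continue  # treat soft-masked base as missing
            called += 1
            if ch.upper() in HET_CODES:
                het += 1
        if called < min_called_frac * len(win):
            out.append("N")
        elif het:
            out.append("K")
        else:
            out.append("T")
    return "".join(out)
```

With one uppercase window, one soft-masked window, one hard-masked window and one window containing a heterozygous `R`, the soft-masked window becomes `T` when lowercase is counted as called sequence and `N` when it is treated as missing. Under this toy scheme, repeats with no SNPs called in them would add homozygous `T` windows in the unmasked and soft-masked runs but drop out in the hard-masked run, which would lower the apparent heterozygosity. Whether psmc's own conversion treats lowercase this way is not confirmed here.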