Closed Crispy13 closed 10 months ago
Hi @Crispy13,
We use the "reference" and "reads" from the same sample to reduce this confusion. For example, when DeepConsensus was first trained, we used the HG002 assembly (or reference) as the truth for the HG002 reads. There are also haploid samples like CHM13 that reduce this issue further. So when we use the assembly of the same sample, we expect that sample's own variants to already be represented in the assembly, rather than showing up as differences between the reads and the labels.
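As a rough illustration of the point above, here is a toy sketch and not the actual DeepConsensus labeling code. The sequences and the helper names (`consensus`, `mismatches`) are made up for this example, and the reads are assumed to be already aligned and of equal length. It shows that a homozygous variant looks like a label mismatch against a generic reference, but not against an assembly of the same sample.

```python
from collections import Counter

def consensus(reads):
    """Majority base per column for equal-length, pre-aligned toy reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

def mismatches(label, sequence):
    """0-based positions where `sequence` differs from `label`."""
    return [i for i, (a, b) in enumerate(zip(label, sequence)) if a != b]

# Hypothetical sequences, invented for illustration only.
external_reference = "ACGTAACG"  # generic reference: A at index 4
sample_assembly = "ACGTCACG"     # same-sample assembly: homozygous C at index 4

# Reads from the sample all carry the variant C at index 4.
# The last read also has a sequencing error (G instead of C) at index 6.
reads = ["ACGTCACG", "ACGTCACG", "ACGTCAGG"]

cons = consensus(reads)

# Against the generic reference, the real variant is flagged as a mismatch.
vs_reference = mismatches(external_reference, cons)

# Against the same-sample assembly, the variant is already in the label.
vs_assembly = mismatches(sample_assembly, cons)

# Per-read view: only the true sequencing error remains against the assembly.
last_read_errors = mismatches(sample_assembly, reads[-1])

print(vs_reference, vs_assembly, last_read_errors)
```

With labels taken from the same-sample assembly, the only remaining difference in the last read is the sequencing error, which is what a read-correction model is meant to learn to fix.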
I hope this helps. You can read more about genome assembly here
Thank you for replying
Hi.
According to your video, the labels come from an assembly, i.e. a reference sequence FASTA, right?
I'm a noob here so I'm not sure. But is it okay to use the reference bases as they are? There may be real variant bases in reads.
If a position has a germline variant C (ref is A) with an allele frequency of 100%, shouldn't the correct label for that position be C instead of the reference base A?