labsquare / cutevariant

A free, standalone application to explore genetic variations from VCF files
https://cutevariant.labsquare.org/
GNU General Public License v3.0
102 stars 21 forks

How to manage phase genotype #183

Closed ysard closed 3 years ago

ysard commented 3 years ago

We expect only 4 values from the gt field: -1, 0, 1, 2.

According to the discussion/monologue in #182, if I read the VCF documentation correctly, this field can be composite for multiple alleles, with separators like | or / (by the way, I do not really understand the difference between phased and unphased genotypes defined by these characters :( ).

I thought that it had to do with the heterozygous / homozygous definitions, but a priori this is not the case, because we only take 1 number into account and only display a default icon if several alleles are indicated?

Am I right?

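Can you confirm the meaning of the current icons (for future tooltips):

  • -1: Unknown genotype
  • 0: Reference allele (in REF field)
  • 1: First allele listed in ALT
  • 2: Second allele listed in ALT

Secondarily, such icons are displayed both in formatters and in the variants_info plugin. And obviously the implementations are not the same...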

plugin:
icon = self.GENOTYPES.get(genotype, self.GENOTYPES["-1"])

formatter:
icon = self.GENOTYPE_ICONS.get(int(value), self.GENOTYPE_ICONS[0])

=> 2 different default icons...
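One possible way to remove the discrepancy is to route both call sites through a single lookup that normalises the key type and has exactly one fallback. This is only a minimal sketch: `genotype_icon`, `UNKNOWN_GENOTYPE` and the icon names are hypothetical placeholders, not the actual cutevariant code.

```python
from typing import Dict, Union

# Hypothetical icon names keyed by integer genotype code; the real icon
# objects used by the plugin and the formatter are not shown in this thread.
GENOTYPE_ICONS: Dict[int, str] = {
    -1: "icon_unknown",
    0: "icon_0",
    1: "icon_1",
    2: "icon_2",
}

# Assumption: "unknown" (-1) is the sensible single default.
UNKNOWN_GENOTYPE = -1


def genotype_icon(value: Union[int, str, None]) -> str:
    """Return the icon for a gt value, falling back to the unknown icon."""
    try:
        # Accept both the string keys used by the plugin ("-1") and the
        # integer keys used by the formatter (-1).
        code = int(value)
    except (TypeError, ValueError):
        code = UNKNOWN_GENOTYPE
    return GENOTYPE_ICONS.get(code, GENOTYPE_ICONS[UNKNOWN_GENOTYPE])
```

With a helper like this, `"1"` and `1` resolve to the same icon, and any unexpected value gets the same default in both places.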

How should we solve the issues in this important feature?

dridk commented 3 years ago

During VCF parsing, I split multi-allelic records into several lines. So:

becomes:

Humans are diploid: two chromosomes of each type. So the following variant:

chr2424 A T

can be:

  • A / A => genotype = 0 (wild-type homozygous)
  • A / T => genotype = 1 (heterozygous)
  • T / T => genotype = 2 (mutant homozygous)

In some cases, you don't have the genotype. Then it should be -1.
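The mapping above can be sketched as a small function that turns a VCF GT string into one of the four codes. This is an illustration under assumptions, not the VCFReader implementation: `genotype_code` and its `alt_index` parameter are hypothetical, and the sketch assumes each split line refers to a single ALT allele.

```python
import re


def genotype_code(gt: str, alt_index: int = 1) -> int:
    """Map a GT string such as "0/1" or "1|1" to -1, 0, 1 or 2.

    alt_index is the number of the ALT allele this (split) line describes.
    """
    # Both "/" (unphased) and "|" (phased) separate the alleles.
    alleles = re.split(r"[/|]", gt.strip())
    # A missing call ("." or an empty field) means the genotype is unknown.
    if any(allele in (".", "") for allele in alleles):
        return -1
    count = sum(1 for allele in alleles if allele == str(alt_index))
    if count == 0:
        return 0  # this ALT allele is absent
    if count == len(alleles):
        return 2  # every copy carries this ALT allele
    return 1  # some copies carry it: heterozygous
```

For example, `genotype_code("0/1")` gives 1 and `genotype_code("./.")` gives -1. Note that the phase separator does not change the code: `"0|1"` and `"0/1"` are both heterozygous.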

Phasing means you know whether your variants are on the same chromosome or not. For instance, you have 3 heterozygous variants:

chr2 A/T
chr2 C/G
chr2 G/C

If the variants are not phased, you don't know which chromosome carries each variant. It can be:

or

When variants are phased, you know :

chr2 T | A
chr2 C | G
chr2 G | C

T----C----G
A----G----C

Actually... I have an idea of how to display this info to the user. But it is for a future release.
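To make the idea concrete, here is a small sketch that rebuilds the two haplotypes drawn above (T-C-G and A-G-C) from phased genotypes, and counts how many arrangements remain possible when the same heterozygous variants are unphased. The function names are hypothetical and the code assumes diploid genotypes written with the alleles themselves rather than VCF indices.

```python
from typing import List, Tuple


def haplotypes(phased_genotypes: List[str]) -> Tuple[str, str]:
    """Build the two haplotypes from genotypes written as "X | Y"."""
    first, second = [], []
    for gt in phased_genotypes:
        if "|" not in gt:
            raise ValueError(f"genotype {gt!r} is not phased")
        # Left of "|" belongs to one chromosome, right to the other.
        left, right = (allele.strip() for allele in gt.split("|"))
        first.append(left)
        second.append(right)
    return "-".join(first), "-".join(second)


def unphased_arrangements(n_heterozygous: int) -> int:
    """Number of distinct haplotype pairs compatible with n unphased
    heterozygous variants (the two chromosomes are interchangeable)."""
    return 2 ** (n_heterozygous - 1) if n_heterozygous > 0 else 1
```

Calling `haplotypes(["T | A", "C | G", "G | C"])` returns the pair `("T-C-G", "A-G-C")`, whereas three unphased heterozygous variants leave `unphased_arrangements(3)` possible pairs, that is 4.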

Dr Sacha Schutz, medicine / molecular genetics, bioinformatics, dridk.me

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Friday, 30 October 2020 at 05:28, ysard notifications@github.com wrote:

Can you confirm the meaning of the current icons (for future tooltips):

  • -1: Unknown genotype
  • 0: Reference allele (in REF field)
  • 1: First allele listed in ALT
  • 2: Second allele listed in ALT


ysard commented 3 years ago

Ok, thank you for the clarification, all the points are clear now. I had completely missed this part of VCFReader.

It is now documented; I fixed the default genotype icon and added tooltips and a description for this field.

I am leaving this open for the phasing question, but feel free to close it.