monarch-initiative / oncoexporter

A package to convert cancer data to GA4GH phenopacket
https://monarch-initiative.github.io/oncoexporter
MIT License

Questions regarding variants #28

Open msierk opened 1 year ago

msierk commented 1 year ago

I have some questions about how we are going to handle variants.

  1. We need to decide which fields from the CDA mutation endpoint we keep and which we drop. Right now cda_mutation_factory.py uses 26 of the 143 fields. Is that all we intend to keep?

  2. I'm not sure how we want to capture things in CDA that aren't directly in the VariationDescriptor schema, for example the reference and tumor alleles and the matched normal sample alleles. Should these go under Extension?

  3. I don't fully understand the to_ga4gh method in op_mutation.py. The Phenopacket schema has both a VariationDescriptor and a VariantInterpretation. First, in the pyphetools hgvs_variant.py there is a note to change the method to an unambiguous name (to_ga4gh_variant_interpretation), which seems like it should be done here too. But more fundamentally, it doesn't make sense to me to transform a variant into an interpretation: the interpretation is a field that is attached to the variant (which is how the CDA has it). It's not clear to me from the Phenopackets documentation how the VariationDescriptor and VariantInterpretation are linked together. It seems to me that the interpretation ought to be part of the descriptor, but even if there are two different data structures, it is confusing to refer to transforming a Variant object into a VariantInterpretation.

  4. It is confusing that the VariationDescriptor has an "Expression" class that has nothing to do with gene expression and instead refers only to the syntax for "expressing" the variant. "Representation" would be less confusing. (I realize this is a Phenopackets issue, but I just wanted to point it out.)
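
To make questions 2 to 4 concrete, here is a minimal sketch of how I currently read the Phenopacket v2 schema: the VariantInterpretation carries the VariationDescriptor as a field (variation_descriptor), the Expression is just a syntax/value pair, and extra values could ride along as name/value Extensions. The dataclasses below are plain-Python stand-ins, not the real phenopackets protobuf classes, and the row keys (reference_allele, tumor_allele, matched_normal_allele, hgvs, id) and the function row_to_variant_interpretation are hypothetical placeholders rather than actual CDA field names or oncoexporter code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Expression:
    """Stand-in for the schema's Expression: a syntax name plus the variant string."""
    syntax: str  # e.g. "hgvs"; unrelated to gene expression
    value: str


@dataclass
class Extension:
    """Stand-in for a name/value Extension on the descriptor."""
    name: str
    value: str


@dataclass
class VariationDescriptor:
    """Stand-in holding only the fields discussed in this issue."""
    id: str
    expressions: List[Expression] = field(default_factory=list)
    extensions: List[Extension] = field(default_factory=list)


@dataclass
class VariantInterpretation:
    """Stand-in: the interpretation wraps the descriptor, not the other way round."""
    variation_descriptor: VariationDescriptor
    acmg_pathogenicity_classification: str = "NOT_PROVIDED"


# Hypothetical column names; the real CDA mutation fields may be named differently.
ALLELE_COLUMNS = ("reference_allele", "tumor_allele", "matched_normal_allele")


def row_to_variant_interpretation(row: Dict[str, str]) -> VariantInterpretation:
    """Build the nested structure from one mutation row (a plain dict here)."""
    descriptor = VariationDescriptor(
        id=row["id"],
        expressions=[Expression(syntax="hgvs", value=row["hgvs"])],
        # Allele columns with no dedicated slot in this sketch become Extensions.
        extensions=[
            Extension(name=col, value=row[col])
            for col in ALLELE_COLUMNS
            if col in row
        ],
    )
    return VariantInterpretation(variation_descriptor=descriptor)


# Illustrative row only; not real CDA output.
example_row = {
    "id": "var-1",
    "hgvs": "NM_004333.4:c.1799T>A",
    "reference_allele": "T",
    "tumor_allele": "A",
    "matched_normal_allele": "T",
}
interpretation = row_to_variant_interpretation(example_row)
```

If this reading is right, a name like to_ga4gh_variant_interpretation would at least signal that the method returns the wrapper object with the descriptor inside it, rather than converting the variant itself into an interpretation.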