monarch-initiative / genophenocorr

Genotype Phenotype Correlation
https://monarch-initiative.github.io/genophenocorr/stable
MIT License

Tx viz - DO NOT MERGE YET #105

Closed: pnrobinson closed this 6 months ago

pnrobinson commented 7 months ago

Adding code to ingest VariantValidator JSON into a Python data structure and extract the most relevant transcript automagically. TODO: use this to draw a graphic showing the variants and the exon structure.
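A minimal sketch of that ingestion step, using the public VariantValidator REST `gene2transcripts` tool via `requests`; the response field names accessed below (`transcripts`, `reference`, `description`) are assumptions and should be checked against the live API:

```python
import requests

# Public VariantValidator REST endpoint; the query can be a gene symbol.
VV_URL = "https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts/{gene}"


def fetch_gene_transcripts(gene_symbol: str) -> dict:
    """Fetch the VariantValidator JSON payload describing the transcripts of a gene."""
    response = requests.get(
        VV_URL.format(gene=gene_symbol),
        headers={"Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


payload = fetch_gene_transcripts("SURF1")           # example gene
for tx in payload.get("transcripts", []):           # field names are assumptions
    print(tx.get("reference"), tx.get("description"))
```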

ielis commented 7 months ago

I see that you've done a lot of work in _tx_model. However, the key part here is not to develop another data model, but to figure out how to get genophenocorr.model.TranscriptCoordinates for a transcript ID (e.g. NM_12345.6). This is basically the job of genophenocorr.preprocessing.TranscriptCoordinateService. Therefore, we certainly do not need TxRetriever.

I suggest moving the code from _tx_model into genophenocorr.preprocessing._vv and using it to implement a TranscriptCoordinateService that pings VariantValidator and parses the response into TranscriptCoordinates. This is what I want to see in the framework.

Another thing. I see that your VvTranscript has all sorts of properties, such as MANE, RefSeq Select, priority, etc. I do not, however, see these properties on TranscriptCoordinates, because they are not related to coordinates. If you think having these properties is crucial, we need to add them to genophenocorr.model.Transcript and think about how to obtain them (from VEP? Do a VEP call first and then follow up with VariantValidator? ...).
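A rough sketch of the direction suggested above, i.e. a VariantValidator-backed implementation of TranscriptCoordinateService; the `fetch` method name, the way the service would plug into genophenocorr, and the shape of TranscriptCoordinates are assumptions for illustration, not the actual API:

```python
import requests


class VvTranscriptCoordinateService:
    """Sketch: resolve coordinates for a transcript ID (e.g. NM_12345.6) via VariantValidator."""

    # gene2transcripts also appears to accept transcript accessions; verify before relying on it.
    VV_URL = "https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts/{tx_id}"

    def fetch(self, tx_id: str) -> "TranscriptCoordinates":
        response = requests.get(self.VV_URL.format(tx_id=tx_id), timeout=30)
        response.raise_for_status()
        return self._parse(tx_id, response.json())

    def _parse(self, tx_id: str, payload: dict) -> "TranscriptCoordinates":
        # Walk the JSON, find the record for `tx_id`, and translate its exon
        # structure into genomic start/end pairs. The field names depend on the
        # VariantValidator response layout and are deliberately left out of this sketch.
        raise NotImplementedError
```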

pnrobinson commented 7 months ago

This is still preliminary, but this is the idea:

  1. At the start of the notebook, get a transcript model from VV. I am not sure if we need the MANE categories here; at this point I am just retrieving the data from the JSON.
  2. Use the exon structure to plot the variants
  3. Show this to the user so they can decide which variants to test. For now I would like to try 2 and 3 in isolation to see what works, and then we could integrate? I am assuming there are no other classes that do this now?

ielis commented 7 months ago
  1. At the start of the notebook, get a transcript model from VV. I am not sure if we need the MANE categories here; at this point I am just retrieving the data from the JSON.

I don't think we know about the transcript at this point. I think we first need to do the functional annotation and show the transcript table and let the user choose the transcript ID.

  2. Use the exon structure to plot the variants

Yes, we can and should plot the variants after we have done the functional annotation and the user knows which transcript they want. Or they can choose a few transcripts and generate an SVG for each one.
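For the plotting step itself, a minimal standalone matplotlib sketch (independent of the framework code; the exon and variant coordinates below are made-up example values):

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

exons = [(100, 250), (400, 520), (800, 1000)]   # (start, end) example coordinates
variants = [130, 450, 910]                      # example variant positions

fig, ax = plt.subplots(figsize=(8, 1.5))
ax.hlines(0.5, exons[0][0], exons[-1][1], colors="grey", zorder=0)   # intron backbone
for start, end in exons:
    ax.add_patch(Rectangle((start, 0.4), end - start, 0.2, color="steelblue"))  # exon box
for pos in variants:
    ax.plot([pos, pos], [0.65, 0.8], color="firebrick")              # lollipop stem
    ax.plot(pos, 0.8, marker="v", color="firebrick")                 # lollipop head

ax.set_ylim(0, 1)
ax.set_yticks([])
ax.set_xlabel("genomic position")
fig.savefig("variants_on_transcript.svg")       # the thread mentions SVG output
```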

  3. Show this to the user so they can decide which variants to test. For now I would like to try 2 and 3 in isolation to see what works, and then we could integrate? I am assuming there are no other classes that do this now?

We don't yet have a class for getting the transcript coordinates. The VariantValidator-backed service should do that; that is your WIP.

This is the high-level code that we should be able to write easily once we have the lower-level functions:
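A hypothetical sketch of that high-level flow; every helper name below is a placeholder illustrating the intended order of steps, not the real genophenocorr API:

```python
def run_analysis(phenopackets, tx_id: str):
    # 1. Functional annotation of the cohort variants (e.g. via VEP).
    cohort = annotate_cohort(phenopackets)                  # placeholder helper

    # 2. Show the transcript table so the user can choose a transcript ID
    #    (ideally the MANE transcript); `tx_id` stands in for that choice.
    print(cohort.transcript_summary())                      # placeholder method

    # 3. Resolve transcript coordinates via VariantValidator (see the service sketch above).
    tx_coordinates = VvTranscriptCoordinateService().fetch(tx_id)

    # 4. Plot the variants on the exon structure and save an SVG per transcript.
    plot_variants(cohort.variants, tx_coordinates, out=f"{tx_id}.svg")  # placeholder
```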

pnrobinson commented 7 months ago

I think it is surely best not to let the user choose a transcript from a list of options, but instead to require that they make a good decision themselves (usually MANE, etc.). Otherwise we will make it too easy for people not to think about this and to make bad decisions. We need to document this well, but I think the user needs to decide which transcript to use, not us!

ielis commented 7 months ago

I fully agree that we have to maximize the likelihood of users choosing a meaningful transcript. However, this can only be done through tutorials and documentation, and the code must handle bad decisions too.

So, I think the overall strategy which I outlined in my previous post should not be changed.

pnrobinson commented 7 months ago

I mean, there is almost never a reason not to use the MANE or the ClinVar-preferred transcript. It will not help to show a list of transcripts and ask which one to use, since if people do not know to use ClinVar, they are likely to choose at random and get a suboptimal transcript. We should see if we can automatically retrieve the MANE transcript and get its coordinates from VV. The subject of genotype-phenotype correlation is complicated, and we should try to make an iPhone app rather than a Swiss Army knife :-o
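A hedged sketch of what automatic MANE retrieval from VariantValidator could look like; the `transcripts`/`annotations`/`mane_select` field names are assumptions about the gene2transcripts response layout and would need to be verified:

```python
from typing import Optional

import requests

VV_URL = "https://rest.variantvalidator.org/VariantValidator/tools/gene2transcripts/{gene}"


def fetch_mane_transcript_id(gene_symbol: str) -> Optional[str]:
    """Return the accession of the MANE Select transcript for a gene, if one is reported."""
    response = requests.get(VV_URL.format(gene=gene_symbol), timeout=30)
    response.raise_for_status()
    payload = response.json()
    for tx in payload.get("transcripts", []):             # field name is an assumption
        if tx.get("annotations", {}).get("mane_select"):  # field name is an assumption
            return tx.get("reference")
    return None  # no MANE transcript found; fall back to asking the user
```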

ielis commented 6 months ago

Closing, since we moved the visualization code to another PR (#114), which was merged. We will continue work on the visualization on another branch.