Closed pnrobinson closed 6 months ago
I see that you've done a lot of work in _tx_model
. However, the key part here is not to develop another data model, but to figure out how to get genophenocorr.model.TranscriptCoordinates
for a transcript id (e.g. NM_12345.6
). This is basically the job of genophenocorr.preprocessing.TranscriptCoordinateService
. Therefore, we certainly do not need TxRetriever
.
I suggest moving the code from _tx_model
into genophenocrr.preprocessing._vv
and using it to implement TranscriptCoordinateService
that pings variant validator and parses the response into TranscriptCoordinates
. This is what I want to see in the framework.
Another thing. I see that your VvTranscript
has all sorts of properties, such as MANE, RefSeq select, priority, etc. I do not see, however, these properties on TranscriptCoordinates
, because they are not related to coordinates. If you think having these properties is crucial, we need to add them to genophenocorr.model.Transcript
and think about how to obtain them (from VEP? Do first VEP call then follow with Variant validator? ...).
This is still prelim. But this is the idea
- At start of notebook, get a transcript model from VV. I am not sure if we need the MANE categories here, but at this point I am just retrieving the data from the JSON
I don't think we know about the transcript at this point. I think we first need to do the functional annotation and show the transcript table and let the user choose the transcript ID.
- Use the exon structure to plot the variants
Yes, we can and should plot the variants after we did the functional annotation and the user knows which transcript he wants. Or he can choose a few transcripts and generate an SVG for each one.
- Show this to the user so they decide what variants to test for now I would like to try 2 and 3 in isolation to see what works and then we could integrate? I am assuming there are no other classes now that do this?
We don't yet have a class for getting the transcript coordinates. The variant validator should do that - your WIP.
This is a high level code that we should be able to write easily once we have lower-level functions:
I think it is surely best not to allow the user to choose transcript from a list of options,but instead to demand that they make a good decision (usually MANE etc). Otherwise we will make it too easy for people not to think of this and to create bad decisions. We need to document this well, but I think the user needs to decide which transcript to use, not us!
I fully agree that we have to maximize the likelihood of the users choosing a meaningful transcript. However, this can only be done through tutorials, and documentation, and the code must handle bad decisions too.
So, I think the overall strategy which I outlined in my previous post should not be changed.
I mean, there is almost never a reason not to use the MANE or the ClinVar preferred transcript. It will not help to show a list of transcripts and say which do you want to use, since if people do now know to use Clinvar, they are likely to choose at random and get a suboptimal transcript. We should see if we can automatically retrieve the MANE transcript and get its coordinates from VV. The subject of geno pheno is complicated and we should try to make an iPhone app rather than a swiss knife :-o
Closing since we moved the visualization code to another PR #114 which was merged. We will work on the visualization on another branch.
Adding code to ingest VariantValidator JSON into Python datastructure and extract the most relevant transcript automagically. TODO - use this to draw graphic showing variants and exon strcuture.