openvar / variantValidator

Public repository for VariantValidator project
GNU Affero General Public License v3.0

gene2transcripts API: Genome Assembly Filter #515

Open Sophiaj93 opened 1 year ago

Sophiaj93 commented 1 year ago

Is your feature request related to a problem? Please describe.
I am trying to use the API to retrieve exon genomic start/end coordinates (for a specific genome assembly) for MANE transcripts.

Describe the solution you'd like
It would be useful to add the ability to filter on the required genome assembly as part of the API request (and/or to specify which genome assembly the coordinates correspond to in the API response JSON).

Describe alternatives you've considered
Manually searching the transcript IDs in RefSeq to find the associated genome assembly.

Additional context
With this API call I am hoping to generate a list of exon genomic start/end coordinates that I could then write to a BED file.
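A minimal sketch of the BED-writing step, assuming a gene2transcripts-style JSON response. The field names used here ("transcripts", "reference", "genomic_spans", "exon_structure", "exon_number", "genomic_start", "genomic_end") are assumptions about the response layout, not something confirmed in this thread, and the coordinates are assumed to be 1-based and inclusive. The function name `exons_to_bed` and the sample record are hypothetical. BED uses 0-based, half-open intervals, so only the start is shifted.

```python
def exons_to_bed(response, keep_accessions):
    """Turn exon coordinates from a gene2transcripts-style response into BED lines.

    ASSUMPTIONS (hypothetical layout, check against the real API output):
    - `response` is a list of gene records, each with a "transcripts" list;
    - each transcript has a "reference" and a "genomic_spans" dict keyed by NC accession;
    - each span has an "exon_structure" list with "exon_number", "genomic_start"
      and "genomic_end", given as 1-based inclusive coordinates.
    """
    lines = []
    for gene in response:
        for tx in gene.get("transcripts", []):
            for accession, span in tx.get("genomic_spans", {}).items():
                if accession not in keep_accessions:
                    continue  # skip alignments to other assemblies
                for exon in span.get("exon_structure", []):
                    start = min(exon["genomic_start"], exon["genomic_end"])
                    end = max(exon["genomic_start"], exon["genomic_end"])
                    name = f"{tx['reference']}_exon{exon['exon_number']}"
                    # BED is 0-based, half-open: subtract 1 from the start only.
                    lines.append(f"{accession}\t{start - 1}\t{end}\t{name}")
    return lines


if __name__ == "__main__":
    # Hypothetical record for illustration only; not real VariantValidator output.
    example_response = [{
        "transcripts": [{
            "reference": "NM_HYPOTHETICAL.1",
            "genomic_spans": {
                "NC_000017.11": {"exon_structure": [
                    {"exon_number": 1, "genomic_start": 1001, "genomic_end": 1200}]},
                "NC_000017.10": {"exon_structure": [
                    {"exon_number": 1, "genomic_start": 501, "genomic_end": 700}]},
            },
        }],
    }]
    with open("exons.bed", "w") as handle:
        handle.write("\n".join(exons_to_bed(example_response, {"NC_000017.11"})) + "\n")
```

Passing only the accessions of one assembly (here a single GRCh38 chromosome) is what restricts the output to that build.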

ifokkema commented 1 year ago

We (LOVD) solve this by looking up the given NC RefSeq accession in a small dictionary that maps RefSeq accessions to genome builds. That is good enough for us, but since VV has the NC-to-genome-build information, I imagine you'd want it included in the output. Until then, it's easy to work around this using a small dictionary.
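A minimal sketch of the dictionary workaround described above. This is not LOVD's actual code; the table name `NC_TO_BUILD` and the helper `genome_build_for` are hypothetical, and only a few chromosomes are listed for illustration. Each human chromosome has one NC accession version for GRCh37 and a newer one for GRCh38.

```python
# Hypothetical lookup table mapping RefSeq chromosome accessions to genome builds.
# Only chromosomes 1, 13, 17 and X are listed here; a real table would cover
# every NC accession you expect to see in the API output.
NC_TO_BUILD = {
    "NC_000001.10": "GRCh37",
    "NC_000001.11": "GRCh38",
    "NC_000013.10": "GRCh37",
    "NC_000013.11": "GRCh38",
    "NC_000017.10": "GRCh37",
    "NC_000017.11": "GRCh38",
    "NC_000023.10": "GRCh37",  # chrX
    "NC_000023.11": "GRCh38",  # chrX
}


def genome_build_for(accession):
    """Return the genome build for an NC accession, or None if it is not in the table."""
    return NC_TO_BUILD.get(accession)


if __name__ == "__main__":
    print(genome_build_for("NC_000017.11"))
```

Returning None for unknown accessions lets the caller decide whether to skip or flag them, rather than silently guessing a build.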

leicray commented 1 year ago

Your requirement is to use "...the API to retrieve exon genomic start/end coordinates (for a specific genome assembly) for MANE transcripts." As far as I am aware, mappings for MANE transcripts are only comprehensively maintained for GRCh38. Some limited mapping data for old versions of MANE can be found for GRCh37: http://tark.ensembl.org/web/mane_GRCh37_list/.

If we were to implement retrieval of exon start/stop coordinates via the API, it would probably have to be only for GRCh38. Support for GRCh37 might prove to be problematic.

UPDATE: I have looked again at your original request, and it looks like you would like our API to output exon genomic start/end coordinates so that you can use the data for some other purpose. Unless outputting these data would enhance the normal validation of sequence variants, it is unlikely that we would prioritise such a request.

ifokkema commented 1 year ago


Maybe I misunderstand the request, but the gene2transcripts API endpoint already provides genomic start/stop locations of exons for input genes and transcripts. So, it's a feature that already exists? The only issue that I see compared to the request is that the output contains NC IDs instead of genome-build identifiers. That's fine by us, but I assume @Sophiaj93 meant she'd like to see those in the output, too.

Sophiaj93 commented 1 year ago

Thanks both. Yes, being able to see the genome-build identifiers in the output as well as the NC IDs would be useful and would solve my problem. This is something I've discussed briefly with @Peter-J-Freeman as part of an MSc project at Manchester Uni.

Peter-J-Freeman commented 1 year ago

Sorry for the slow responses, @Sophiaj93. As you know, I have been slammed with teaching material development.

I have developed an update to the v2 version of the gene2transcripts API endpoint. The input can now be a "|"-delimited list of genes. You can now also filter by transcript ID or by the key filters described in the Swagger docs. You can also now filter by genome build :)
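A minimal sketch of building a request for a "|"-delimited gene list with a genome build filter. The path layout (`/VariantValidator/tools/gene2transcripts_v2/{genes}/{limit_transcripts}/{transcript_set}/{genome_build}`) and the default values "mane_select", "refseq" and "GRCh38" are assumptions recalled from the Swagger docs, not stated in this thread; check the Swagger UI before relying on them. The helper name `gene2transcripts_v2_url` is hypothetical. Each path segment is percent-encoded, so "|" becomes %7C.

```python
from urllib.parse import quote

BASE_URL = "https://rest.variantvalidator.org"


def gene2transcripts_v2_url(genes, limit_transcripts="mane_select",
                            transcript_set="refseq", genome_build="GRCh38"):
    """Build a gene2transcripts_v2 request URL (hypothetical helper).

    ASSUMPTION: the path layout and default filter values mirror the Swagger
    docs as remembered, not as confirmed in this thread.
    """
    gene_query = "|".join(genes)  # multiple genes are "|" delimited
    segments = [gene_query, limit_transcripts, transcript_set, genome_build]
    # safe="" makes quote() encode every reserved character, including "|" and "/".
    encoded = "/".join(quote(segment, safe="") for segment in segments)
    return f"{BASE_URL}/VariantValidator/tools/gene2transcripts_v2/{encoded}"


if __name__ == "__main__":
    print(gene2transcripts_v2_url(["BRCA1", "BRCA2"]))
```

The helper only builds the URL; fetching it (for example with `urllib.request.urlopen`) is left out so the sketch runs offline.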

Data are returned in list format.

It will be live and ready for testing by the end of the day.

Peter-J-Freeman commented 1 year ago

@Sophiaj93 Code is now live and ready for testing: https://rest.variantvalidator.org/ (gene2transcripts_v2 endpoint).

ifokkema commented 1 year ago

I believe this adds required fields to the API endpoint, right? If so, this breaks existing implementations. Luckily, I'm not using v2 of this function yet, but updates like this require an API with versioning. Have you checked the server logs for calls to gene2transcripts_v2? See https://github.com/openvar/variantValidator/issues/128

Peter-J-Freeman commented 1 year ago

This endpoint is still in dev, so it is not yet fixed. I just haven't had time to maintain a dev server recently, but I will check. I very much doubt it's being used, though. Good point, thanks.

I still need to implement the API versioning; it's on the to-do list.

ifokkema commented 1 year ago

Ah, I see. Is there any documentation or annotation on the Swagger UI on what endpoints are in dev and, therefore, can change at any given moment?

Peter-J-Freeman commented 1 year ago

That's a good idea. No need in this case, because I will fix it by the end of the month (and I think it is already fixed now), but good plan!

Sophiaj93 commented 9 months ago

Hi Pete,

Sorry for the huge delay in looking at this. Thanks again for adding these features!

I've just updated my code based on the new version of the gene2transcripts_v2 endpoint and all is working well. The genome build filter in particular is really helpful.

I haven't properly implemented the "|"-delimited gene list in my code yet, but I have tried it out via the URL and can't see any issues.

Thanks