imgag / ngs-bits

Short-read sequencing tools
MIT License

Improve link to LOVD #509

Closed ifokkema closed 6 months ago

ifokkema commented 6 months ago

The link to LOVD could, perhaps should, be improved: https://github.com/imgag/ngs-bits/blob/46c9f3753123a618ab72aee66ec311f4a7e3bad9/src/GSvar/VariantTable.cpp#L301

This causes a very inefficient action on the LOVD server, triggering two searches instead of one; we have APIs that can be used to determine if variants are found on a certain position. If so, a list of those positions can more efficiently be generated. Also, an API exists for a global LOVD search, allowing you to search in various LOVD instances at the same time.

marc-sturm commented 6 months ago

Hi Ivo,

I'm happy to change the URL if that's better for you.

I only found this API, which seems not to have support for searching a variant: https://api.lovd.nl/swagger/#/

For https://www.lovd.nl/3.0/search I found no documentation on how to create the GET-style URL.

Can you point me to the documentation of the URL to use, optimally with examples for SNP and InDel?

Best, Marc

ifokkema commented 6 months ago

Hi Marc!

> I'm happy to change the URL if that's better for you.

If you wish to keep it as a simple link, there are two options. For hg19/GRCh37 only (at the moment), by far the most efficient search is:

https://databases.lovd.nl/shared/variants/chr15:40699840-40699840

This search uses all table indexes and has the lowest impact on our server's performance. It can be used for both single-nucleotide variants and multi-nucleotide variants.
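A minimal sketch of how such a position link could be assembled, using only the C++ standard library rather than Qt. The function name `lovdPositionUrl` is hypothetical and not part of ngs-bits. The sketch also assumes that a multi-nucleotide variant is expressed by passing different start and end coordinates, which the comment above implies but does not show.

```cpp
#include <string>

// Hypothetical helper, not part of ngs-bits: builds the position-based LOVD
// link described above (hg19/GRCh37 only, according to the comment).
// `chr` is expected to carry the "chr" prefix, e.g. "chr15".
// Assumption: a multi-nucleotide variant uses start != end.
std::string lovdPositionUrl(const std::string& chr, long start, long end)
{
    return "https://databases.lovd.nl/shared/variants/" + chr + ":"
         + std::to_string(start) + "-" + std::to_string(end);
}
```

Calling `lovdPositionUrl("chr15", 40699840, 40699840)` reproduces the example link given above.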

For hg19 and hg38, you can use your existing strategy but with some modifications:

"https://databases.lovd.nl/shared/variants?search_chromosome=%3D%22" + variant.chr().strNormalized(false) + "%22&search_VariantOnGenome/DNA" + (GSvarHelper::build() == GenomeBuild::HG38 ? "/hg38" : "") + "=g." + QString::number(pos)

Note the replacement of "#" with "?" to force a direct query instead of an indirect one by the browser's JavaScript, and the addition of "%3D%22" and "%22" surrounding the chromosome. This forces a search for, e.g., 2 (which also matches 12, 20, 21, and 22) to use ="2", which only matches 2. This way, the table's index is properly used. For insertions and multi-nucleotide deletions, duplications, etc., the position to search for can be, e.g., "g.1_2" instead of "g.1".
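The points above (direct "?" query, exact-match chromosome, build-specific column, and "g.1_2" ranges) can be sketched as one standalone function. This is an illustration with `std::string` instead of Qt types; `lovdSearchUrl` and its parameters are hypothetical names. It assumes the chromosome is passed without the "chr" prefix, as the example value 2 in the text suggests.

```cpp
#include <string>

// Hypothetical helper, not part of ngs-bits: builds the direct LOVD search
// URL described above. `chr` is assumed to have no "chr" prefix (e.g. "2").
std::string lovdSearchUrl(const std::string& chr, long start, long end, bool hg38)
{
    // Single position: "g.1"; range (insertions, multi-nucleotide events): "g.1_2".
    std::string pos = "g." + std::to_string(start);
    if (end != start) pos += "_" + std::to_string(end);

    // %3D%22 ... %22 is the URL-encoded form of ="...", forcing an exact
    // chromosome match so that "2" does not also match 12, 20, 21 and 22.
    return "https://databases.lovd.nl/shared/variants?search_chromosome=%3D%22" + chr + "%22"
         + "&search_VariantOnGenome/DNA" + (hg38 ? "/hg38" : "") + "=" + pos;
}
```

For example, `lovdSearchUrl("2", 1, 2, true)` yields a URL ending in `search_VariantOnGenome/DNA/hg38=g.1_2`.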

For APIs, we have the worldwide API, e.g., http://www.lovd.nl/search.php?build=hg19&position=chr15:40699840 or http://www.lovd.nl/search.php?build=hg19&position=chr13:32936732_32936735
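As a rough sketch, the two worldwide-search URLs above follow one pattern that could be generated like this. The function name `lovdGlobalSearchUrl` is hypothetical, and the accepted values for `build` beyond the "hg19" shown above are not confirmed by this thread.

```cpp
#include <string>

// Hypothetical helper: builds a worldwide LOVD search URL in the form shown
// above. `chr` carries the "chr" prefix; a range is written as start_end.
std::string lovdGlobalSearchUrl(const std::string& build, const std::string& chr,
                                long start, long end)
{
    std::string position = chr + ":" + std::to_string(start);
    if (end != start) position += "_" + std::to_string(end);

    return "http://www.lovd.nl/search.php?build=" + build + "&position=" + position;
}
```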

For searching in our LOVD alone, the API is gene-specific so the above would be: https://databases.lovd.nl/shared/api/rest/variants/IVD?search_position=g.40699840&format=application/json with the gene symbol, or https://databases.lovd.nl/shared/api/rest/variants/6186?search_position=g.40699840&format=application/json with the gene's HGNC ID.

It also supports ranges and some more features. See the documentation for more info.
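A small sketch of building the gene-specific REST URL from the two examples above. `lovdGeneVariantsUrl` is a hypothetical name. Only the single-position form is covered, because the range syntax is described in the LOVD documentation rather than in this thread.

```cpp
#include <string>

// Hypothetical helper: builds the gene-specific LOVD REST URL shown above.
// `gene` is either the gene symbol (e.g. "IVD") or its HGNC ID (e.g. "6186").
std::string lovdGeneVariantsUrl(const std::string& gene, long pos)
{
    return "https://databases.lovd.nl/shared/api/rest/variants/" + gene
         + "?search_position=g." + std::to_string(pos)
         + "&format=application/json";
}
```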

marc-sturm commented 6 months ago

Cheers, I updated the URL as you have suggested. It will take effect when we re-deploy GSvar, which is roughly every week.

ifokkema commented 6 months ago

Excellent, thanks!