openvar / vv_hgvs

Python library to parse, format, validate, normalize, and map sequence variants. `pip install hgvs`
https://hgvs.readthedocs.io/
Apache License 2.0

Feature request: Search function to return all transcript mappings spanned by a query reference region #5

Open John-F-Wagstaff opened 2 years ago

John-F-Wagstaff commented 2 years ago

Feature request description, and associated problem

vv_hgvs is currently the main interface to VVTA databases, and VariantValidator uses it for this purpose. Users expect to be able to query VariantValidator with a genomic variant and receive all affected transcripts in response. This is not currently the case: the HVNC has yet to decide on a consistent HGVS nomenclature for variants that extend beyond the bounds of a transcript, so vv_hgvs lacks features for querying mapped transcripts in these cases. This has caused issues in VariantValidator such as https://github.com/openvar/variantValidator/issues/399. A query that lets users detect such transcripts would help meet these expectations.

Current proposed solution

vv_hgvs already has a number of related functions, so adding a similar one to handle this case should be reasonably straightforward. The underlying SQL should look something like `SELECT * FROM current_valid_mapped_transcript_spans_mv WHERE alt_ac=$target_acc AND start_i > $query_start AND end_i < $query_end` for total overlap (the transcript span lies entirely within the query region), or `SELECT * FROM current_valid_mapped_transcript_spans_mv WHERE alt_ac=$target_acc AND end_i > $query_start AND start_i < $query_end` for any overlap. Relevant tests will also need to be added.
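To make the difference between the two queries concrete, here is a minimal sketch that runs both against an in-memory SQLite table. The table is only a stand-in for the real PostgreSQL materialized view, the accessions and coordinates are made up, and `find_transcripts` is a hypothetical helper, not an existing vv_hgvs function.

```python
import sqlite3

# Stand-in for the VVTA view; the real object is a PostgreSQL materialized view.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE current_valid_mapped_transcript_spans_mv "
    "(tx_ac TEXT, alt_ac TEXT, start_i INTEGER, end_i INTEGER)"
)
conn.executemany(
    "INSERT INTO current_valid_mapped_transcript_spans_mv VALUES (?, ?, ?, ?)",
    [
        ("TX_INSIDE", "NC_TEST.1", 150, 250),   # fully inside the query region
        ("TX_PARTIAL", "NC_TEST.1", 50, 120),   # overlaps the left edge only
        ("TX_OUTSIDE", "NC_TEST.1", 400, 500),  # no overlap at all
        ("TX_OTHER", "NC_OTHER.1", 150, 250),   # different reference sequence
    ],
)

# Transcript span lies entirely within the query region.
CONTAINED_SQL = """
    SELECT tx_ac FROM current_valid_mapped_transcript_spans_mv
    WHERE alt_ac = ? AND start_i > ? AND end_i < ?
    ORDER BY tx_ac
"""

# Transcript span shares at least part of the query region.
OVERLAP_SQL = """
    SELECT tx_ac FROM current_valid_mapped_transcript_spans_mv
    WHERE alt_ac = ? AND end_i > ? AND start_i < ?
    ORDER BY tx_ac
"""


def find_transcripts(sql, alt_ac, query_start, query_end):
    """Hypothetical helper: run one of the span queries and return tx_ac values."""
    rows = conn.execute(sql, (alt_ac, query_start, query_end)).fetchall()
    return [row[0] for row in rows]


contained = find_transcripts(CONTAINED_SQL, "NC_TEST.1", 100, 300)
overlapping = find_transcripts(OVERLAP_SQL, "NC_TEST.1", 100, 300)
```

With the query region 100 to 300, the containment query returns only `TX_INSIDE`, while the overlap query also returns `TX_PARTIAL`. Using bound parameters rather than string interpolation keeps the accession and coordinates out of the SQL text.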

Alternatives

We could simply expect users to query the VVTA directly, but this would complicate usage of the VVTA by breaking through the expected layering.

Additional context

We need to decide and specify whether the spans are exclusive or inclusive, document the choice, and test for it.
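The inclusive/exclusive choice only changes the result at the boundaries, which is where the tests should focus. A small sketch of the two conventions follows; the function names and the `inclusive` flag are illustrative only, and which convention the view actually follows is exactly what still needs to be confirmed.

```python
def overlaps(span_start, span_end, query_start, query_end, inclusive=False):
    """Return True if the span and the query region share at least one position.

    inclusive=False treats both intervals as half-open [start, end);
    inclusive=True treats both intervals as closed [start, end].
    """
    if inclusive:
        return span_start <= query_end and query_start <= span_end
    return span_start < query_end and query_start < span_end


def contains(span_start, span_end, query_start, query_end):
    """Return True if the query region covers the whole span (bounds may coincide)."""
    return query_start <= span_start and span_end <= query_end


# A span that only touches the query boundary is where the conventions disagree.
touching_half_open = overlaps(100, 200, 200, 300)               # half-open: no overlap
touching_closed = overlaps(100, 200, 200, 300, inclusive=True)  # closed: overlap
```

Note that `contains` here allows the span bounds to coincide with the query bounds, whereas the strict `>` and `<` in the proposed SQL would exclude that case. That difference is the kind of detail the documentation and tests should pin down.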

mashok-acog commented 1 year ago

Hi John, I get the error below when running this vv_hgvs test script:

```python
import vvhgvs.parser
import vvhgvs.dataproviders.uta
import vvhgvs.assemblymapper

# Parse a genomic variant and inspect it
hp = vvhgvs.parser.Parser()
hgvs_g = 'NC_000007.13:g.36561662C>T'
hgvs_c = 'NM_001637.3:c.1582G>A'
var_g = hp.parse_hgvs_variant(hgvs_g)
var_g
var_g.posedit.pos.start
str(var_g)

# Connect to the data provider and build the assembly mapper
hdp = vvhgvs.dataproviders.uta.connect()
am = vvhgvs.assemblymapper.AssemblyMapper(hdp, assembly_name='GRCh37', alt_aln_method='splign', replace_reference=True)
```

```
ERROR:  relation "current_valid_mapped_transcript_spans_mv" does not exist at character 9
select tx_ac,alt_ac,alt_strand,alt_aln_method,start_i,end_i
from current_valid_mapped_transcript_spans_mv
where alt_ac='NC_000007.13' and alt_aln_method='splign' and start_i < 36561662 and 36561662 <= end_i
```

The materialized view "current_valid_mapped_transcript_spans_mv" does not exist.

Please help.

John-F-Wagstaff commented 1 year ago

@mashok-acog Sorry for not replying to your other bug. It is unclear what you mean in that report, compared to this much clearer post, and I am also not currently full time on this project. If you need further help with this issue, please move back to the original bug, fill in the extra detail, and '@' me. This thread is a feature request and is not associated with your problem, so please do not reply here.

You probably just need to install the VVTA (and its own Seqrepo release) instead of the UTA, and it should work fine. The "current_valid_mapped_transcript_spans_mv" view is one of the first views used when searching for relevant transcripts at an input chromosomal location, so this complaint is characteristic of a missing, outdated, or misconfigured database.

As noted at the top of the readme, however, this project is mainly used by the VariantValidator pipeline and is not recommended for standalone use. In some respects it represents a snapshot of an older hgvs version, upgraded to work with the newer VVTA database. This is required to work with the existing VariantValidator code base, which then tweaks the output to improve it. If you need an end-user recommended project, you should normally use either VariantValidator or mainline hgvs. For that reason the documentation has not been updated for this project as a standalone system.

The actual documentation for installing this project is here (VariantValidator install docs). If you want to install this code standalone, you would need to follow the "Setting up Seqrepo" and "Setting up VVTA database" sections from it, as well as installing Seqrepo and the vvhgvs code. The configuration for vvhgvs should point at these data sources, not at the UTA versions of either. But again, this is not the recommended usage method, so please consider whether the other options would be better for your use case.