monarch-initiative / gpsea

A Python library for discovery of genotype-phenotype associations
https://monarch-initiative.github.io/gpsea/stable
MIT License

6_43519367_43519367_A_T not correctly parsed #275

Open pnrobinson opened 3 days ago

pnrobinson commented 3 days ago

6_43519367_43519367_A_T

gets shown as None in gpsea for POLR1A

Variant Validator (using GRCh38:6:43519367:A:T) shows

NM_203290.4:c.176A>T

BUT this is Homo sapiens RNA polymerase I and III subunit C (POLR1C), transcript variant 1, mRNA (not POLR1A)

NP_976035.1:p.(Asn59Ile)

There is some error, possibly in the upstream data, but GPSEA should probably emit a warning here? I will try to figure this out.
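A minimal sketch of the kind of warning meant here, assuming we know the gene symbol the cohort is about and can read a gene symbol off the transcript annotation. `check_gene_symbol` and its parameters are hypothetical names for illustration, not part of the GPSEA API:

```python
import warnings
from typing import Optional


def check_gene_symbol(
    variant_key: str,
    expected_symbol: str,
    annotated_symbol: Optional[str],
) -> bool:
    """Return True if the annotation agrees with the expected gene, warn otherwise.

    Hypothetical helper: GPSEA does not necessarily expose the annotated
    gene symbol in this form.
    """
    if annotated_symbol is None:
        warnings.warn(
            f"{variant_key}: no gene symbol in the annotation "
            f"(expected {expected_symbol})"
        )
        return False
    # Compare case-insensitively so 'polr1a' and 'POLR1A' count as the same gene.
    if annotated_symbol.upper() != expected_symbol.upper():
        warnings.warn(
            f"{variant_key}: annotated on {annotated_symbol}, "
            f"but the expected gene is {expected_symbol}"
        )
        return False
    return True
```

For the variant above, `check_gene_symbol("6_43519367_43519367_A_T", "POLR1A", "POLR1C")` would warn and return `False` instead of silently yielding `None`.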

pnrobinson commented 3 days ago

Fetching the POLR1A MANE transcript also leads to a crash

POLR1A_MANE_transcript = 'NM_015425.6' # Homo sapiens RNA polymerase I subunit A (POLR1A), mRNA
(...)
tx_coordinates = txc_service.fetch(POLR1A_MANE_transcript)

leads to

ValueError                                Traceback (most recent call last)
Cell In[8], line 5
      3 txc_service = VVMultiCoordinateService(genome_build=GRCh38)
      4 pms = configure_default_protein_metadata_service()
----> 5 tx_coordinates = txc_service.fetch(POLR1A_MANE_transcript)
      6 #protein_meta = pms.annotate(POLR1A_protein_id)

File ~/GIT/gpsea/src/gpsea/preprocessing/_vv.py:164, in VVMultiCoordinateService.fetch(self, tx)
    162 tx_id = self._parse_tx(tx)
    163 response_json = self.get_response(tx_id)
--> 164 return self.parse_response(tx_id, response_json)

File ~/GIT/gpsea/src/gpsea/preprocessing/_vv.py:195, in VVMultiCoordinateService.parse_response(self, tx_id, response)
    193     raise ValueError(error_string)
    194 if 'transcripts' not in transcript_response:
--> 195     VVMultiCoordinateService._handle_missing_field(
    196         response=response,
    197         field='transcripts',
    198     )
    199 tx_data = self._find_tx_data(tx_id, transcript_response['transcripts'])
    200 if 'genomic_spans' not in tx_data:

File ~/GIT/gpsea/src/gpsea/preprocessing/_vv.py:259, in VVMultiCoordinateService._handle_missing_field(response, field)
    257 json_formatted_str = json.dumps(response, indent=2)
...
ValueError: A required `transcripts` field is missing in the response from Variant Validator API: 
{
  "error": "Unable to recognise gene symbol LOC90784",
  "requested_symbol": "NM_015425.6"
}
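
One possible way to make this failure easier to read is to look for the `error` key before checking for `transcripts`. This is only a sketch: it assumes the payload is a dict shaped like the one printed above, and `TranscriptLookupError` and `raise_for_api_error` are hypothetical names, not existing GPSEA code:

```python
from typing import Any, Mapping


class TranscriptLookupError(ValueError):
    """Hypothetical error type for a lookup the remote API rejected."""


def raise_for_api_error(tx_id: str, response: Mapping[str, Any]) -> None:
    """Raise a focused error if the payload carries an `error` field.

    Assumes the error payload looks like the one in the traceback above:
    {"error": "...", "requested_symbol": "..."}.
    """
    error = response.get("error")
    if error is not None:
        raise TranscriptLookupError(
            f"Variant Validator could not resolve {tx_id}: {error}"
        )
```

Subclassing `ValueError` would keep any caller that already catches `ValueError` working, while the message names the transcript and the upstream reason directly.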