Open pnrobinson opened 1 month ago
Fetching the coordinates of the POLR1A MANE transcript also leads to a crash:
POLR1A_MANE_transcript = 'NM_015425.6' # Homo sapiens RNA polymerase I subunit A (POLR1A), mRNA
(...)
tx_coordinates = txc_service.fetch(POLR1A_MANE_transcript)
leads to
ValueError Traceback (most recent call last)
Cell In[8], line 5
      3 txc_service = VVMultiCoordinateService(genome_build=GRCh38)
      4 pms = configure_default_protein_metadata_service()
----> 5 tx_coordinates = txc_service.fetch(POLR1A_MANE_transcript)
      6 #protein_meta = pms.annotate(POLR1A_protein_id)
File ~/GIT/gpsea/src/gpsea/preprocessing/_vv.py:164, in VVMultiCoordinateService.fetch(self, tx)
    162 tx_id = self._parse_tx(tx)
    163 response_json = self.get_response(tx_id)
--> 164 return self.parse_response(tx_id, response_json)
File ~/GIT/gpsea/src/gpsea/preprocessing/_vv.py:195, in VVMultiCoordinateService.parse_response(self, tx_id, response)
    193 raise ValueError(error_string)
    194 if 'transcripts' not in transcript_response:
--> 195 VVMultiCoordinateService._handle_missing_field(
    196 response=response,
    197 field='transcripts',
    198 )
    199 tx_data = self._find_tx_data(tx_id, transcript_response['transcripts'])
    200 if 'genomic_spans' not in tx_data:
File ~/GIT/gpsea/src/gpsea/preprocessing/_vv.py:259, in VVMultiCoordinateService._handle_missing_field(response, field)
    257 json_formatted_str = json.dumps(response, indent=2)
...
ValueError: A required `transcripts` field is missing in the response from Variant Validator API:
{
"error": "Unable to recognise gene symbol LOC90784",
"requested_symbol": "NM_015425.6"
}
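The payload above carries an explicit `error` key, so one option would be to check for it before looking for `transcripts` and report the upstream message directly. A minimal sketch of that idea, assuming the response is a plain dict shaped like the one above; `check_vv_response` is a hypothetical helper, not part of the GPSEA API:

```python
import json


def check_vv_response(tx_id: str, response: dict) -> dict:
    """Hypothetical helper: fail early with a focused message.

    Assumes `response` is the decoded JSON payload for `tx_id`.
    """
    # Surface the upstream error message first, if there is one.
    if "error" in response:
        raise ValueError(
            f"Variant Validator could not resolve {tx_id}: {response['error']}"
        )
    # Otherwise fall back to the generic missing-field report.
    if "transcripts" not in response:
        raise ValueError(
            f"A required `transcripts` field is missing for {tx_id}:\n"
            + json.dumps(response, indent=2)
        )
    return response


# The payload reported in this issue.
payload = {
    "error": "Unable to recognise gene symbol LOC90784",
    "requested_symbol": "NM_015425.6",
}

try:
    check_vv_response("NM_015425.6", payload)
    message = None
except ValueError as e:
    message = str(e)

print(message)
```

This does not fix the lookup itself; it only makes the failure point at the upstream error instead of at the missing field.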
In addition, the variant 6_43519367_43519367_A_T
gets shown as None in gpsea for POLR1A.
VariantValidator (using GRCh38:6:43519367:A:T) shows
NM_203290.4:c.176A>T
BUT this is Homo sapiens RNA polymerase I and III subunit C (POLR1C), transcript variant 1, mRNA (not POLR1A)
NP_976035.1:p.(Asn59Ile)
There is some error, possibly in the upstream data, but GPSEA should probably emit a warning when a variant has no annotation for the transcript of interest. I will try to figure this out.
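A rough sketch of the kind of warning I have in mind, using only the standard `warnings` module. The function name and the `annotations` mapping (transcript ID to HGVS string) are assumptions made for illustration; they do not reflect how GPSEA stores transcript annotations:

```python
import warnings
from typing import Mapping, Optional


def hgvs_for_transcript(
    variant_key: str,
    tx_id: str,
    annotations: Mapping[str, str],
) -> Optional[str]:
    """Hypothetical lookup: return the HGVS string for `tx_id`, or warn and return None."""
    hgvs = annotations.get(tx_id)
    if hgvs is None:
        # Tell the user which transcripts the variant *was* annotated on,
        # which hints at a gene mismatch in the input data.
        warnings.warn(
            f"Variant {variant_key} has no annotation for {tx_id}; "
            f"annotated transcripts: {sorted(annotations)}"
        )
    return hgvs


# Values taken from this issue: the variant is annotated on a POLR1C
# transcript, while the transcript of interest is the POLR1A MANE transcript.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = hgvs_for_transcript(
        "6_43519367_43519367_A_T",
        "NM_015425.6",
        {"NM_203290.4": "c.176A>T"},
    )

for w in caught:
    print(w.message)
```

With a message like this, the silent None would at least come with a hint that the variant was annotated on a different gene's transcript.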