Closed ielis closed 3 months ago
Are you all seeing this error on your end? I'm on branch fetch-mutations-from-gdc
commit 0c807e90c3fa9e1f1005e5fe758835df4025af7b
(venv) ~/PythonProject/oncoexporter fetch-mutations-from-gdc $ python3 scripts/run_bone.py
Creating cached dataframe as /Users/jtr4v/PythonProject/oncoexporter/.oncoexporter_cache/Bone_mutation_df.pkl
individual dataframe: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 743/743 [00:00<00:00, 9890.31it/s]
merged diagnosis dataframe: 0%| | 0/670 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/Users/jtr4v/PythonProject/oncoexporter/scripts/run_bone.py", line 11, in <module>
p = table_importer.get_ga4gh_phenopackets(Tsite, cohort_name=cohort_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jtr4v/PythonProject/oncoexporter/venv/lib/python3.11/site-packages/oncoexporter/cda/cda_table_importer.py", line 174, in get_ga4gh_phenopackets
disease_message = self._disease_factory.to_ga4gh(row)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jtr4v/PythonProject/oncoexporter/venv/lib/python3.11/site-packages/oncoexporter/cda/cda_disease_factory.py", line 105, in to_ga4gh
primary_site = self._uberon_mapper.get_ontology_term(row)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jtr4v/PythonProject/oncoexporter/venv/lib/python3.11/site-packages/oncoexporter/cda/mapper/op_uberon_mapper.py", line 56, in get_ontology_term
raise ValueError(f"Could not find UBERON term for primary_site=\"{primary_site}\"")
ValueError: Could not find UBERON term for primary_site="Bones, joints and articular cartilage of other and unspecified sites"
Were you able to resolve the above issue you were running into @justaddcoffee?
@sujaypatil96 yes, this problem went away. Possibly it was a problem with a stale cache file, b/c it went away when I deleted the old cache .pkl files
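For anyone hitting the same error, here is a minimal sketch of clearing the cached pickles before re-running. The `.oncoexporter_cache` directory name comes from the log above; the helper name `clear_stale_cache` and its `dry_run` flag are hypothetical and not part of oncoexporter.

```python
from pathlib import Path


def clear_stale_cache(cache_dir, pattern="*.pkl", dry_run=True):
    """List cached pickle files under cache_dir and delete them unless dry_run is set.

    Hypothetical helper, not part of oncoexporter.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        # Nothing cached yet, so there is nothing to clear.
        return []
    stale = sorted(cache_dir.glob(pattern))
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale


if __name__ == "__main__":
    # Assumes the script is run from the repository root, as in the log above.
    for path in clear_stale_cache(".oncoexporter_cache", dry_run=True):
        print(f"would delete {path}")
```

With `dry_run=True` the function only reports what it would remove; pass `dry_run=False` to actually delete the files, after which the next run should rebuild the cache.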
@sujaypatil96 @ielis btw, can we merge this PR?
Awesome!! this PR is ready to be merged then 🚀
The PR creates GdcMutationService to retrieve variants of CDA subjects from GDC. See the test for example usage.
The logic is based on the gist written by @sujaypatil96 here.
There are still some TODOs left. The mapping of the VCF coordinates and functional annotations should be complete; however, we may still need to explore GDC to add the read depths, gene, and mutation status.
When ready, we can use GdcMutationService, e.g. within CdaTableImporter, to get the variants for the subjects.