monarch-initiative / oncoexporter

Cancer data to GA4GH phenopacket
https://monarch-initiative.github.io/oncoexporter
MIT License

Pull mutation data from GDC #81

Closed (ielis closed 3 months ago)

ielis commented 4 months ago

The PR creates GdcMutationService to retrieve variants of CDA subjects from GDC.

See the test for example usage.

The logic is based on the gist written by @sujaypatil96 here.

There are still some TODOs left. The mapping of the VCF coordinates and functional annotations should be complete. However, we may still need to explore GDC to add the read depths, the gene, and the mutation status.

When ready, we can use GdcMutationService, e.g. within CdaTableImporter, to get the variants for the subjects.
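
For readers unfamiliar with GDC, here is a minimal sketch of how a per-subject request for simple somatic mutations could be assembled for the public GDC API. This is not the code in this PR: `build_ssm_occurrence_query` is a hypothetical helper, and the `ssm_occurrences` endpoint, the `case.submitter_id` filter field, and the `ssm.*` field names are assumptions that should be checked against the GDC API documentation. The sketch only builds the query parameters and performs no network I/O.

```python
import json

# Assumption: public GDC endpoint for simple somatic mutation occurrences.
GDC_SSM_OCCURRENCES_URL = "https://api.gdc.cancer.gov/ssm_occurrences"

# Assumption: field names carrying VCF-style coordinates; verify against the GDC docs.
DEFAULT_FIELDS = (
    "ssm.chromosome",
    "ssm.start_position",
    "ssm.reference_allele",
    "ssm.tumor_allele",
)


def build_ssm_occurrence_query(subject_id, fields=DEFAULT_FIELDS, page_size=100):
    """Build query parameters for one subject (hypothetical helper, no network I/O)."""
    # GDC-style filter object: match occurrences whose case has the given submitter ID.
    filters = {
        "op": "in",
        "content": {"field": "case.submitter_id", "value": [subject_id]},
    }
    return {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(page_size),
    }


# "SUBJECT-0001" is a placeholder, not a real subject identifier.
params = build_ssm_occurrence_query("SUBJECT-0001")
```

The resulting dictionary could then be passed to an HTTP client, for example `requests.get(GDC_SSM_OCCURRENCES_URL, params=params)`, with the response mapped to variant messages by the service.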

80

justaddcoffee commented 4 months ago

Are you all seeing this error on your end? I'm on branch fetch-mutations-from-gdc, commit 0c807e90c3fa9e1f1005e5fe758835df4025af7b.

(venv) ~/PythonProject/oncoexporter fetch-mutations-from-gdc $ python3 scripts/run_bone.py 

Creating cached dataframe as /Users/jtr4v/PythonProject/oncoexporter/.oncoexporter_cache/Bone_mutation_df.pkl
individual dataframe: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 743/743 [00:00<00:00, 9890.31it/s]
merged diagnosis dataframe:   0%|                                                                                                  | 0/670 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/jtr4v/PythonProject/oncoexporter/scripts/run_bone.py", line 11, in <module>
    p = table_importer.get_ga4gh_phenopackets(Tsite, cohort_name=cohort_name)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jtr4v/PythonProject/oncoexporter/venv/lib/python3.11/site-packages/oncoexporter/cda/cda_table_importer.py", line 174, in get_ga4gh_phenopackets
    disease_message = self._disease_factory.to_ga4gh(row)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jtr4v/PythonProject/oncoexporter/venv/lib/python3.11/site-packages/oncoexporter/cda/cda_disease_factory.py", line 105, in to_ga4gh
    primary_site = self._uberon_mapper.get_ontology_term(row)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jtr4v/PythonProject/oncoexporter/venv/lib/python3.11/site-packages/oncoexporter/cda/mapper/op_uberon_mapper.py", line 56, in get_ontology_term
    raise ValueError(f"Could not find UBERON term for primary_site=\"{primary_site}\"")
ValueError: Could not find UBERON term for primary_site="Bones, joints and articular cartilage of other and unspecified sites"

sujaypatil96 commented 3 months ago

Were you able to resolve the above issue you were running into @justaddcoffee?

justaddcoffee commented 3 months ago

@sujaypatil96 yes, this problem went away. It was possibly caused by a stale cache file, because the error disappeared once I deleted the old .pkl cache files.
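
For anyone hitting the same error, the cache cleanup can be sketched as follows. The directory name `.oncoexporter_cache` is taken from the log output above; `clear_pickle_cache` is a hypothetical helper and not part of oncoexporter, and the script assumes it is run from the repository root.

```python
from pathlib import Path


def clear_pickle_cache(cache_dir):
    """Delete every .pkl file directly inside cache_dir and return the removed paths."""
    cache_path = Path(cache_dir)
    if not cache_path.is_dir():
        # Nothing to clean if the cache directory does not exist.
        return []
    removed = []
    for pkl_file in sorted(cache_path.glob("*.pkl")):
        pkl_file.unlink()
        removed.append(pkl_file)
    return removed


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    for path in clear_pickle_cache(".oncoexporter_cache"):
        print(f"removed {path}")
```

The cached dataframes are rebuilt on the next run, so the only cost of clearing them is the time needed to recreate them.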

justaddcoffee commented 3 months ago

@sujaypatil96 @ielis btw, can we merge this PR?

sujaypatil96 commented 3 months ago

Awesome!! This PR is ready to be merged then 🚀