monarch-initiative / pyphetools

Python Phenopacket Tools
https://monarch-initiative.github.io/pyphetools/
MIT License

Allinone #90

Closed: pnrobinson closed this 3 months ago

pnrobinson commented 3 months ago

@ielis Hi Daniel, can we touch base on Monday or Tuesday about this? I added two functions to streamline processing the template. Everything seems to work, but there is one error. One question: I think you removed the HPO ontology getter from the HpoCr class, which is reasonable, but maybe we could add back an HPO-version function? This would simplify things, and the HPO version is arguably an important attribute of the CR.

Here is how the new functions would be used:

```python
from pyphetools.creation import import_phenopackets_from_template, create_hpoa_from_phenopackets
template = "../phenopacket-store/notebooks/SPTAN1/input/SPTAN1_DEVEP_individuals.xlsx"
hpjson = "../phenopacket-store/notebooks/hp.json"

# individuals, cvalidator = import_phenopackets_from_template(template, hpjson)
# The above line will initially lead to a warning that we need to add four deletions manually
# The following lines do this -- note the additional argument
deletions = {
    "arr{hg38}9q34.11(128,609,213-128,613,675)x1",
    "arr{hg38}9q34.11(128591376-128600369)x1",
    "arr{hg38}9q34.11 (128,582,754-128,587,726)x1",
    "arr{hg38}9q34.11 (128,587,422-128,600,316)x1",
}
individuals, cvalidator = import_phenopackets_from_template(template, hpjson, deletions=deletions)

### Here, we begin QC and display
from pyphetools.visualization import IndividualTable, QcVisualizer
from IPython.display import display, HTML
qc = QcVisualizer(cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))
# new cell
individuals = cvalidator.get_error_free_individual_list()
## For some reason, the following does not work
#table = IndividualTable(individuals)
#display(HTML(table.to_html()))
# new cell
df = create_hpoa_from_phenopackets(moi="Autosomal dominant", pmid="PMID:29050398")
## all done or check the dataframe
```
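The four deletion strings in the example are written inconsistently: two use thousands separators and two do not, and two have a space before the opening parenthesis. The sketch below is a hypothetical helper (`parse_arr_interval`, `normalize_arr`, and `CnvInterval` are not part of pyphetools) that parses this `arr{assembly}band(start-end)xN` pattern, which may help catch typos or reversed coordinates before running the import. It assumes every string follows exactly that pattern.

```python
import re
from typing import NamedTuple


class CnvInterval(NamedTuple):
    """A parsed copy-number interval, e.g. from 'arr{hg38}9q34.11(1-2)x1' (hypothetical type)."""
    assembly: str
    band: str
    start: int
    end: int
    copy_number: int


# Tolerates an optional space before '(' and thousands separators in the coordinates.
_ARR_PATTERN = re.compile(
    r"^arr\{(?P<assembly>[^}]+)\}(?P<band>\S+?)\s*"
    r"\((?P<start>[\d,]+)-(?P<end>[\d,]+)\)x(?P<cn>\d+)$"
)


def parse_arr_interval(text: str) -> CnvInterval:
    """Parse one interval string; raise ValueError if it does not fit the assumed pattern."""
    match = _ARR_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end: {text!r}")
    return CnvInterval(match["assembly"], match["band"], start, end, int(match["cn"]))


def normalize_arr(text: str) -> str:
    """Re-emit the interval without separators or stray spaces."""
    iv = parse_arr_interval(text)
    return f"arr{{{iv.assembly}}}{iv.band}({iv.start}-{iv.end})x{iv.copy_number}"
```

It is not clear from the thread whether `import_phenopackets_from_template` matches `deletions` against the template text verbatim, so the safer assumption is to pass the original strings and use the normalized form only for checks, for example `{normalize_arr(d) for d in deletions}` to spot duplicates.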

I hope this is getting to a place where a broader audience can use this tool. Also, for phenopacket-store, maybe it is enough to just use a Python script for some of the genes. I want to use this for all new annotations to the HPO and want to make things as efficient as possible!
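For running the template workflow over many genes from a plain Python script, a minimal sketch could look like the following. It assumes the phenopacket-store layout seen in the example path (`notebooks/<GENE>/input/*_individuals.xlsx`) and only calls `import_phenopackets_from_template` with the two argument forms shown in the example. The helpers `find_templates` and `run_batch`, and the per-template `deletions_by_template` mapping, are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional, Set


def find_templates(notebooks_dir: Path) -> List[Path]:
    """Collect templates laid out as <notebooks_dir>/<GENE>/input/*_individuals.xlsx (assumed layout)."""
    return sorted(notebooks_dir.glob("*/input/*_individuals.xlsx"))


def run_batch(
    templates: Iterable[Path],
    hp_json: str,
    importer: Optional[Callable] = None,
    deletions_by_template: Optional[Dict[str, Set[str]]] = None,
) -> Dict[str, object]:
    """Run the template import once per file and collect the results keyed by file name."""
    if importer is None:
        # Imported lazily so the loop can also be exercised with a stub importer.
        from pyphetools.creation import import_phenopackets_from_template as importer
    deletions_by_template = deletions_by_template or {}
    results: Dict[str, object] = {}
    for template in templates:
        deletions = deletions_by_template.get(template.name)
        if deletions:
            # Same call shape as the notebook example, with the extra deletions argument.
            results[template.name] = importer(str(template), hp_json, deletions=deletions)
        else:
            results[template.name] = importer(str(template), hp_json)
    return results
```

The importer is injectable mainly so the loop can be tested without real Excel files. With pyphetools installed, `run_batch(find_templates(Path("../phenopacket-store/notebooks")), hpjson)` would return whatever the import function returns for each template (in the example above, the `individuals, cvalidator` pair).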

ielis commented 3 months ago

Hi @pnrobinson, I think the workflow is OK at a high level.

Regarding the removal of HPO from the concept recognizer: having the CR hand out the HPO is an example of what I consider a code smell. It makes the CR do two things: 1) recognize concepts and 2) provide the HPO. I think the CR should only do the first, and we must update the code that relied on the second to simply ask for the HPO via its constructor.

I think using the HPO version in the CR is OK as long as the CR does not expose it as a property. This should be fine because we should be using just one HPO version during phenopacket creation anyway.
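The separation described above can be sketched as follows: the concept recognizer receives the HPO through its constructor and keeps the version in a private attribute that it only uses to stamp its results. Any code that needs the ontology itself gets it from its own constructor rather than from the CR. All class names here (`Ontology`, `Match`, `ConceptRecognizer`, `TemplateImporter`) are hypothetical stand-ins, not the pyphetools or hpo-toolkit classes, and the substring matching is only a placeholder for real concept recognition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Ontology:
    """Hypothetical stand-in for a loaded HPO: a version plus a label -> term-id lookup."""
    version: str
    labels: Dict[str, str]


@dataclass(frozen=True)
class Match:
    """One recognized term, stamped with the HPO version used to find it."""
    term_id: str
    hpo_version: str


class ConceptRecognizer:
    """Recognizes concepts only; it does not hand out the ontology or its version."""

    def __init__(self, hpo: Ontology) -> None:
        self._labels = {label.lower(): tid for label, tid in hpo.labels.items()}
        self._hpo_version = hpo.version  # private: used for provenance, no public getter

    def recognize(self, text: str) -> List[Match]:
        # Placeholder matching: a label counts as recognized if it occurs as a substring.
        lowered = text.lower()
        term_ids = sorted({tid for label, tid in self._labels.items() if label in lowered})
        return [Match(tid, self._hpo_version) for tid in term_ids]


class TemplateImporter:
    """Code that needs the HPO asks for it via its own constructor, not via the CR."""

    def __init__(self, hpo: Ontology, cr: ConceptRecognizer) -> None:
        self._hpo = hpo
        self._cr = cr

    @property
    def hpo_version(self) -> str:
        return self._hpo.version

    def annotate(self, text: str) -> List[Match]:
        return self._cr.recognize(text)
```

Building both objects from the same `Ontology` instance is what keeps a single HPO version in play during phenopacket creation, without the CR needing a public HPO or version accessor.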

Last, regarding the HPOA, I haven't got there yet, so I honestly do not know what the best course of action should be. I'll get there eventually and we will work it out. I'm happy to discuss this early this week.

pnrobinson commented 3 months ago

Merging this despite the create_hpoa_from_phenopackets / Individual issue. It is some weird mangling bug, but we should remove the import * statements now that the API is stable.
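The `import *` point can be illustrated with a contrived example. It does not diagnose the actual pyphetools bug; it only shows one way wildcard imports produce confusing behavior. When two modules define the same public name, whichever `from ... import *` runs last silently replaces the earlier binding, whereas explicit imports make the source of each name obvious. The module names `demo_creation` and `demo_visualization` are hypothetical.

```python
import sys
import types


def make_module(name: str, source: str) -> types.ModuleType:
    """Create an in-memory module and register it so import statements can find it."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


# Two hypothetical submodules that both happen to define a public name `Individual`.
make_module("demo_creation", "class Individual:\n    origin = 'creation'\n")
make_module(
    "demo_visualization",
    "class Individual:\n    origin = 'visualization'\n"
    "class IndividualTable:\n    pass\n",
)

# Namespace built with wildcard imports: the later import silently wins.
star_ns: dict = {}
exec("from demo_creation import *\nfrom demo_visualization import *", star_ns)

# Namespace built with explicit imports: each public name has one visible source.
explicit_ns: dict = {}
exec(
    "from demo_creation import Individual\n"
    "from demo_visualization import IndividualTable",
    explicit_ns,
)
```

After this runs, `star_ns["Individual"]` is the class from `demo_visualization`, while `explicit_ns["Individual"]` is the one from `demo_creation`. Defining `__all__` in each module also limits what a wildcard import exports, but explicit imports in the package `__init__.py` make the origin of every name visible at a glance.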