lifeomic / phc-sdk-py

The phc-sdk-py is a developer kit for interfacing with the PHC API on Python 3.8 and above.
https://lifeomic.github.io/phc-sdk-py/index.html
MIT License

Add summary APIs #150

Closed rcdilorenzo closed 3 years ago

rcdilorenzo commented 3 years ago

Here are some examples (taken from sample BRCA data):

# Get all summaries in one pass
# NOTE: This call is not currently paginated. Use `SummaryItemCounts` for the paginated, per-summary version.
phc.SummaryCounts.get_data_frame().info()

# Data columns (total 15 columns):
#  #   Column                   Non-Null Count  Dtype  
# ---  ------                   --------------  -----  
#  0   summary                  372 non-null    object 
#  1   code                     160 non-null    object 
#  2   display                  155 non-null    object 
#  3   patient_count            169 non-null    float64
#  4   system                   160 non-null    object 
#  5   count                    368 non-null    float64
#  6   media_type               4 non-null      object 
#  7   media_type_count         4 non-null      float64
#  8   clinvar_significance     100 non-null    object 
#  9   gene                     200 non-null    object 
#  10  population_pct           203 non-null    float64
#  11  population_sample_count  203 non-null    float64
#  12  sample_count             203 non-null    float64
#  13  status                   3 non-null      object 
#  14  test_type                3 non-null      object

phc.SummaryOmicsCounts.get_data_frame()

#                 summary clinvar_significance    gene  population_pct  population_sample_count  sample_count         status  patient_count    test_type
# 0  clinvar_significance           Pathogenic  PIK3CA        0.238384                    990.0         236.0            NaN            NaN          NaN
# 1  clinvar_significance           Pathogenic    TP53        0.118182                    990.0         117.0            NaN            NaN          NaN
# 2     copynumber_status                  NaN     NaN        1.000000                      2.0           2.0  amplification            NaN          NaN
# 3     copynumber_status                  NaN     NaN        0.500000                      2.0           1.0           loss            NaN          NaN
# 4          gene_variant                  NaN    TP53        0.346465                    990.0         343.0            NaN            NaN          NaN
# 5          gene_variant                  NaN  PIK3CA        0.332323                    990.0         329.0            NaN            NaN          NaN
# 6              sequence                  NaN     NaN             NaN                      NaN           NaN            NaN         1097.0          NaN
# 7              sequence                  NaN     NaN             NaN                      NaN           NaN            NaN            1.0          NaN
# 8                  test                  NaN     NaN             NaN                      NaN           NaN            NaN         1090.0  TCGA RNAseq
# 9                  test                  NaN     NaN             NaN                      NaN           NaN            NaN            1.0    GEM ExTra
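
In the sample output above, `population_pct` is consistent with `sample_count / population_sample_count` (for example, 236 / 990 ≈ 0.238384). This is an observation drawn from the printed rows, not a documented definition of the column. A minimal check in plain Python, using only values copied from that output (the row labels and the `pct_matches` helper are illustrative, not part of the SDK):

```python
# Rows copied from the SummaryOmicsCounts output above:
# (label, sample_count, population_sample_count, population_pct as printed)
rows = [
    ("PIK3CA pathogenic", 236.0, 990.0, 0.238384),
    ("TP53 pathogenic", 117.0, 990.0, 0.118182),
    ("amplification", 2.0, 2.0, 1.000000),
    ("loss", 1.0, 2.0, 0.500000),
    ("TP53 variant", 343.0, 990.0, 0.346465),
    ("PIK3CA variant", 329.0, 990.0, 0.332323),
]


def pct_matches(sample_count, population_sample_count, printed_pct, tol=1e-6):
    """Return True if the ratio agrees with the printed value to display precision."""
    return abs(sample_count / population_sample_count - printed_pct) < tol


# Collect any rows where the ratio does NOT reproduce the printed percentage.
mismatches = [label for label, n, total, pct in rows if not pct_matches(n, total, pct)]
print(mismatches)  # prints []
```

Every row with a non-null `population_pct` reproduces the printed value, so the ratio reading holds for this sample at least.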

phc.SummaryClinicalCounts.get_data_frame(match="fuzzy", system=["snomed.info", "loinc.org"])

#        summary       code                           display  patient_count                  system   count media_type  media_type_count
# 0    procedure  406505007       modified radical mastectomy          322.0  http://snomed.info/sct   322.0        NaN               NaN
# 1    procedure  392090004                             other          272.0  http://snomed.info/sct   272.0        NaN               NaN
# 2  observation    21975-8              Date of Last Contact         1094.0        http://loinc.org  1094.0        NaN               NaN
# 3   medication  387420009                           cytoxan          514.0  http://snomed.info/sct   523.0        NaN               NaN
# 4   medication  372817009       doxorubicin+cyclophosphamid          364.0  http://snomed.info/sct   371.0        NaN               NaN
# 5    condition  254837009                              None         1086.0  http://snomed.info/sct  1086.0        NaN               NaN
# 6    condition   82711006  Infiltrating duct carcinoma, NOS          778.0  http://snomed.info/sct   778.0        NaN               NaN
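
The call above passes short system names (`"snomed.info"`, `"loinc.org"`), while the returned rows carry full URIs (`http://snomed.info/sct`, `http://loinc.org`). That is consistent with `match="fuzzy"` behaving like a substring match, although the actual matching rule is not shown in this description. The sketch below illustrates that assumed behavior in plain Python; `fuzzy_system_match` is a hypothetical helper, not an SDK function.

```python
def fuzzy_system_match(system_uri, patterns):
    """Assumed behavior: a row matches if any pattern occurs inside its system URI."""
    return any(pattern in system_uri for pattern in patterns)


# System URIs as they appear in the output above.
systems = ["http://snomed.info/sct", "http://loinc.org"]
# Filter values as passed in the call above.
patterns = ["snomed.info", "loinc.org"]

matched = [s for s in systems if fuzzy_system_match(s, patterns)]
print(matched)  # prints ['http://snomed.info/sct', 'http://loinc.org']
```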

# NOTE: The plain string "condition" also works here, as with other options in the SDK.
phc.SummaryItemCounts.get_data_frame(summary=phc.Option.SummaryClinicalType.CONDITION)

#         code  code_count                           display  patient_count                  system
# 0  254837009        1086                              None           1086  http://snomed.info/sct
# 1   82711006         778  Infiltrating duct carcinoma, NOS            778  http://snomed.info/sct
# 2   89740008         201            Lobular carcinoma, NOS            201  http://snomed.info/sct

# A summary-backed version of `get_codes` is available on Observation, Condition, and Procedure.
# Other entities fall back to the FHIR search service implementation.
phc.Observation.get_codes(query="receptor")

#       code  code_count                       display  patient_count            system
# 0  85337-4        1048      Estrogen Receptor Status           1048  http://loinc.org
# 1  85339-0        1047  Progesterone Receptor Status           1047  http://loinc.org
# 2  49683-6         919      HER2/neu receptor status            919  http://loinc.org