ohsu-comp-bio / g2p-aggregator

Associations of genomic features, drugs and diseases

SOID parents misleading #103

Closed: ahwagner closed this issue 6 years ago

ahwagner commented 6 years ago

SOID terms report the Sequence Ontology root concept as their "parent" rather than the immediate parent term. Is this the intended behavior?

It looks like this was implemented by @mayfielg here.
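
A minimal sketch of the mismatch, using the `sequence_ontology` values from the ERBB2 Loss copy-number feature in this issue. It assumes the `hierarchy` list is ordered root-first, so its last entry is the immediate parent; that ordering is inferred from the expected value reported here, not confirmed from the aggregator code. The helper name `immediate_parent` is hypothetical.

```python
def immediate_parent(sequence_ontology):
    """Return the last ID in 'hierarchy', or '' when no hierarchy is present.

    Assumes 'hierarchy' is ordered root-first, nearest ancestor last.
    """
    hierarchy = sequence_ontology.get('hierarchy') or []
    return hierarchy[-1] if hierarchy else ''


# 'sequence_ontology' values copied from the ERBB2 Loss copy-number feature.
cnv = {
    'hierarchy': ['SO:0000110', 'SO:0002072', 'SO:0001059', 'SO:0000248'],
    'name': 'copy_number_variation',
    'parent_name': 'sequence_feature',
    'parent_soid': 'SO:0000110',
    'soid': 'SO:0001019',
}

reported = cnv['parent_soid']                    # what the record currently carries
expected = immediate_parent(cnv)                 # nearest ancestor under the assumption
reports_root = reported == cnv['hierarchy'][0]   # True when the root is used as parent

print('reported parent_soid:', reported)
print('last hierarchy entry:', expected)
print('reported parent is the hierarchy root:', reports_root)
```

For this feature the reported parent is the first (root) entry of `hierarchy`, while the last entry matches the expected `SO:0000248`. Features without a `hierarchy` key, such as the 'Uncategorized' one, yield an empty string.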

Example association below ('soid': 'SO:0001019', 'parent_soid': 'SO:0000110'; expected 'parent_soid': 'SO:0000248'):

{'association': {'description': 'PIK3CA H1047R,ER/PR positive and ERBB2 Loss confers sensitivity to Exemestane, Everolimus and Everolimus + Exemestane in patients with Neoplasm of breast',
  'drug_labels': 'EXEMESTANE,EVEROLIMUS,EVEROLIMUS,EXEMESTANE',
  'environmentalContexts': [{'approved_countries': ['Canada', 'US'],
    'description': 'EXEMESTANE',
    'id': 'CID60198',
    'source': 'http://rdf.ncbi.nlm.nih.gov/pubchem/compound',
    'taxonomy': {'class': 'Steroids and steroid derivatives',
     'direct-parent': 'Androgens and derivatives',
     'kingdom': 'Organic compounds',
     'subclass': 'Androstane steroids',
     'superclass': 'Lipids and lipid-like molecules'},
    'term': 'EXEMESTANE',
    'toxicity': 'Convulsions',
    'usan_stem': 'antineoplastics (aromatase inhibitors)'},
   {'approved_countries': ['Canada', 'US'],
    'description': 'EVEROLIMUS',
    'id': 'CID6442177',
    'source': 'http://rdf.ncbi.nlm.nih.gov/pubchem/compound',
    'taxonomy': {'class': 'Macrolide lactams',
     'direct-parent': 'Macrolide lactams',
     'kingdom': 'Organic compounds',
     'superclass': 'Phenylpropanoids and polyketides'},
    'term': 'EVEROLIMUS',
    'toxicity': 'IC50 of 0.63 nM.',
    'usan_stem': 'immunosuppressives: immunosuppressant, rapamycin derivatives'},
   {'approved_countries': ['Canada', 'US'],
    'description': 'EVEROLIMUS',
    'id': 'CID6442177',
    'source': 'http://rdf.ncbi.nlm.nih.gov/pubchem/compound',
    'taxonomy': {'class': 'Macrolide lactams',
     'direct-parent': 'Macrolide lactams',
     'kingdom': 'Organic compounds',
     'superclass': 'Phenylpropanoids and polyketides'},
    'term': 'EVEROLIMUS',
    'toxicity': 'IC50 of 0.63 nM.',
    'usan_stem': 'immunosuppressives: immunosuppressant, rapamycin derivatives'},
   {'approved_countries': ['Canada', 'US'],
    'description': 'EXEMESTANE',
    'id': 'CID60198',
    'source': 'http://rdf.ncbi.nlm.nih.gov/pubchem/compound',
    'taxonomy': {'class': 'Steroids and steroid derivatives',
     'direct-parent': 'Androgens and derivatives',
     'kingdom': 'Organic compounds',
     'subclass': 'Androstane steroids',
     'superclass': 'Lipids and lipid-like molecules'},
    'term': 'EXEMESTANE',
    'toxicity': 'Convulsions',
    'usan_stem': 'antineoplastics (aromatase inhibitors)'}],
  'evidence': [{'description': 'PIK3CA H1047R,ER/PR positive and ERBB2 Loss confers sensitivity to Exemestane, Everolimus and Everolimus + Exemestane in patients with Neoplasm of breast',
    'evidenceType': {'sourceName': 'molecularmatch'},
    'info': {'publications': ['https://www.ncbi.nlm.nih.gov/pubmed/28183140']}}],
  'evidence_label': 'C',
  'evidence_level': 3,
  'phenotype': {'description': 'breast cancer',
   'family': 'thoracic cancer',
   'type': {'id': 'DOID:1612',
    'source': 'http://purl.obolibrary.org/obo/doid',
    'term': 'breast cancer'}},
  'publication_url': 'https://www.ncbi.nlm.nih.gov/pubmed/28183140',
  'response_type': '2C',
  'variant_name': ['H1047R']},
 'dev_tags': ['no-so'],
 'feature_names': 'ERBB2 Loss',
 'features': [{'alt': 'G',
   'biomarker_type': 'Synonymous,Missense',
   'chromosome': '3',
   'description': 'PIK3CA ERBB2 Loss',
   'end': 178952085,
   'geneSymbol': 'PIK3CA',
   'links': ['http://myvariant.info/v1/variant/chr3:g.178952085A>G?assembly=hg19',
    'http://myvariant.info/v1/variant/chr3:g.179234297A>G?assembly=hg38',
    'http://reg.genome.network/refseq/RS000027',
    'http://reg.genome.network/allele/CA123326',
    'http://gnomad.broadinstitute.org/variant/3-178952085-A-G',
    'http://www.ncbi.nlm.nih.gov/snp/121913279',
    'http://www.ncbi.nlm.nih.gov/clinvar/?term=28691[alleleid]',
    'http://exac.broadinstitute.org/variant/3-178952085-A-G',
    'http://reg.genome.network/refseq/RS000051',
    'http://reg.genome.network/refseq/RS002196',
    'http://reg.genome.network/refseq/RS000003',
    'http://www.ncbi.nlm.nih.gov/clinvar/variation/13652',
    'http://cancer.sanger.ac.uk/cosmic/mutation/overview?id=775'],
   'name': 'ERBB2 Loss',
   'provenance': ['http://reg.genome.network/allele?hgvs=NC_000003.11%3Ag.178952085A%3EG'],
   'provenance_rule': 'from_source',
   'ref': 'A',
   'referenceName': 'GRCh37',
   'sequence_ontology': {'hierarchy': ['SO:0000400',
     'SO:0001761',
     'SO:0001814'],
    'name': 'synonymous',
    'parent_name': 'sequence_attribute',
    'parent_soid': 'SO:0000400',
    'soid': 'SO:0001815'},
   'start': 178952085,
   'synonyms': ['chr3:g.179234297A>G',
    'NC_000003.12:g.179234297A>G',
    'CM000665.1:g.178952085A>G',
    '3-178952085-A-G',
    'NC_000003.11:g.178952085A>G',
    'LRG_310:g.90775A>G',
    'COSM775',
    'NC_000003.10:g.180434779A>G',
    'chr3:g.178952085A>G',
    'NG_012113.2:g.90775A>G',
    'CM000665.2:g.179234297A>G']},
  {'alt': 'G',
   'biomarker_type': 'Parent Mutation',
   'chromosome': '17',
   'description': 'PIK3CA ERBB2 Loss',
   'end': 37883691,
   'geneSymbol': 'PIK3CA',
   'links': ['http://myvariant.info/v1/variant/chr17:g.37883691C>G?assembly=hg19',
    'http://reg.genome.network/refseq/RS000041',
    'http://reg.genome.network/allele/CA499671674',
    'http://reg.genome.network/refseq/RS000017',
    'http://reg.genome.network/refseq/RS000622',
    'http://reg.genome.network/refseq/RS000065'],
   'name': 'ERBB2 Loss',
   'provenance': ['http://myvariant.info/v1/query?q=PIK3CA ERBB2 Loss',
    'http://reg.genome.network/allele?hgvs=NC_000017.10%3Ag.37883691C%3EG'],
   'provenance_rule': 'default_feature',
   'ref': 'C',
   'referenceName': 'GRCh37',
   'sequence_ontology': {'name': 'Uncategorized',
    'parent_name': 'Uncategorized',
    'parent_soid': '',
    'soid': ''},
   'start': 37883691,
   'synonyms': ['CM000679.2:g.39727438C>G',
    'CM000679.1:g.37883691C>G',
    'NC_000017.10:g.37883691C>G',
    'NG_007503.1:g.44299C>G',
    'NC_000017.9:g.35137217C>G',
    'chr17:g.37883691C>G',
    'NC_000017.11:g.39727438C>G',
    'LRG_724:g.44299C>G']},
  {'alt': 'G',
   'biomarker_type': 'Copy Number Variant',
   'chromosome': '17',
   'description': 'PIK3CA ERBB2 Loss',
   'end': 37883691,
   'geneSymbol': 'PIK3CA',
   'links': ['http://myvariant.info/v1/variant/chr17:g.37883691C>G?assembly=hg19',
    'http://reg.genome.network/refseq/RS000041',
    'http://reg.genome.network/allele/CA499671674',
    'http://reg.genome.network/refseq/RS000017',
    'http://reg.genome.network/refseq/RS000622',
    'http://reg.genome.network/refseq/RS000065'],
   'name': 'ERBB2 Loss',
   'provenance': ['http://myvariant.info/v1/query?q=PIK3CA ERBB2 Loss',
    'http://reg.genome.network/allele?hgvs=NC_000017.10%3Ag.37883691C%3EG'],
   'provenance_rule': 'default_feature',
   'ref': 'C',
   'referenceName': 'GRCh37',
   'sequence_ontology': {'hierarchy': ['SO:0000110',
     'SO:0002072',
     'SO:0001059',
     'SO:0000248'],
    'name': 'copy_number_variation',
    'parent_name': 'sequence_feature',
    'parent_soid': 'SO:0000110',
    'soid': 'SO:0001019'},
   'start': 37883691,
   'synonyms': ['CM000679.2:g.39727438C>G',
    'CM000679.1:g.37883691C>G',
    'NC_000017.10:g.37883691C>G',
    'NG_007503.1:g.44299C>G',
    'NC_000017.9:g.35137217C>G',
    'chr17:g.37883691C>G',
    'NC_000017.11:g.39727438C>G',
    'LRG_724:g.44299C>G']}],
 'genes': ['PIK3CA', 'ESR1', 'ERBB2'],
 'raw': {'_score': 3,
  'ampcap': '2C',
  'ast': {'left': {'left': {'left': {'raw': '"PIK3CA H1047R"',
      'type': 'Literal',
      'value': 'PIK3CA H1047R'},
     'operator': '&&',
     'right': {'raw': '"Neoplasm of breast"',
      'type': 'Literal',
      'value': 'Neoplasm of breast'},
     'type': 'LogicalExpression'},
    'operator': '&&',
    'right': {'raw': '"ER/PR positive"',
     'type': 'Literal',
     'value': 'ER/PR positive'},
    'type': 'LogicalExpression'},
   'operator': '&&',
   'right': {'raw': '"ERBB2 Loss"', 'type': 'Literal', 'value': 'ERBB2 Loss'},
   'type': 'LogicalExpression'},
  'autoGenerateNarrative': True,
  'biomarkerClass': 'predictive',
  'civic': 'B',
  'classifications': [{'Alt': [],
    'COSMIC_ID': [],
    'Chr': [],
    'End': [],
    'Exon': [],
    'ExonicFunc': [],
    'NucleotideChange': [],
    'PopFreqMax': [],
    'Ref': [],
    'Start': [],
    'alias': 'PIK3CA H1047R',
    'classification': 'actionable',
    'classificationOverride': None,
    'copyNumberType': None,
    'dbSNP': [],
    'description': 'PIK3CA H1047R is one of the most recurrent mutations in cancer, especially breast cancer. Of PIK3CA mutant breast cancers, over half harbor this mutation. Meta-analyses have shown that patients harboring this mutation may have worse overall survival, but other studies have shown no difference between H1047R and other PIK3CA mutants from a prognostic standpoint. While very prevalent, targeted therapies for this particular mutation are still in early clinical trial phases. Source: CIViC',
    'drugsApprovedOffLabelCount': 1,
    'drugsApprovedOnLabelCount': 0,
    'drugsExperimentalCount': 25,
    'geneSymbol': 'PIK3CA',
    'name': 'PIK3CA H1047R',
    'parents': [],
    'pathology': ['Pathogenic/Likely pathogenic'],
    'priority': 2,
    'publicationCount': 56,
    'rootTerm': 'PIK3CA H1047R',
    'sources': ['COSMIC', 'CIViC', 'ClinVar', 'DoCM', 'cBioPortal'],
    'transcript': None,
    'transcripts': [],
    'trialCount': 87},
   {'Alt': [],
    'COSMIC_ID': [],
    'Chr': [],
    'End': [],
    'Exon': [],
    'ExonicFunc': [],
    'NucleotideChange': [],
    'PopFreqMax': [],
    'Ref': [],
    'Start': [],
    'alias': 'ER/PR positive',
    'classification': 'actionable',
    'classificationOverride': None,
    'copyNumberType': None,
    'dbSNP': [],
    'description': '',
    'drugsApprovedOffLabelCount': 4,
    'drugsApprovedOnLabelCount': 4,
    'drugsExperimentalCount': 10,
    'geneSymbol': 'ESR1',
    'name': 'ER/PR positive',
    'parents': [],
    'pathology': [],
    'priority': 1,
    'publicationCount': 4816,
    'rootTerm': 'ER/PR positive',
    'sources': [],
    'transcript': None,
    'transcripts': [],
    'trialCount': 352},
   {'Alt': [],
    'COSMIC_ID': [],
    'Chr': [],
    'End': [],
    'Exon': [],
    'ExonicFunc': [],
    'NucleotideChange': [],
    'PopFreqMax': [],
    'Ref': [],
    'Start': [],
    'alias': 'ERBB2 Loss',
    'classification': 'actionable',
    'classificationOverride': None,
    'copyNumberType': None,
    'dbSNP': [],
    'description': '',
    'drugsApprovedOffLabelCount': 0,
    'drugsApprovedOnLabelCount': 3,
    'drugsExperimentalCount': 0,
    'geneSymbol': 'ERBB2',
    'name': 'ERBB2 Loss',
    'parents': [],
    'pathology': [],
    'priority': 1,
    'publicationCount': 984,
    'rootTerm': 'ERBB2 Loss',
    'sources': ['MolecularMatch'],
    'transcript': None,
    'transcripts': [],
    'trialCount': 488}],
  'clinicalSignificance': 'sensitive',
  'criteriaMet': [],
  'criteriaUnmet': [{'compositeKey': 'PIK3CA H1047RMUTATIONinclude',
    'facet': 'MUTATION',
    'filterType': 'include',
    'generatedBy': '',
    'generatedByTerm': '',
    'isNew': True,
    'manualSuppress': 0,
    'primary': False,
    'priority': 1,
    'suppress': False,
    'term': 'PIK3CA H1047R',
    'transcript': '',
    'valid': True},
   {'compositeKey': 'Neoplasm of breastCONDITIONinclude',
    'facet': 'CONDITION',
    'filterType': 'include',
    'generatedBy': '',
    'generatedByTerm': '',
    'isNew': True,
    'manualSuppress': 0,
    'primary': True,
    'priority': 1,
    'suppress': False,
    'term': 'Neoplasm of breast',
    'transcript': '',
    'valid': True},
   {'compositeKey': 'ER/PR positiveMUTATIONinclude',
    'facet': 'MUTATION',
    'filterType': 'include',
    'generatedBy': '',
    'generatedByTerm': '',
    'isNew': True,
    'manualSuppress': 0,
    'primary': False,
    'priority': 1,
    'suppress': False,
    'term': 'ER/PR positive',
    'transcript': '',
    'valid': True},
   {'compositeKey': 'ERBB2 LossMUTATIONinclude',
    'facet': 'MUTATION',
    'filterType': 'include',
    'generatedBy': '',
    'generatedByTerm': '',
    'isNew': True,
    'manualSuppress': 0,
    'primary': False,
    'priority': 1,
    'suppress': False,
    'term': 'ERBB2 Loss',
    'transcript': '',
    'valid': True}],
  'customer': 'MolecularMatch',
  'direction': 'does_not_support',
  'expression': '"PIK3CA H1047R" && "Neoplasm of breast" && "ER/PR positive" && "ERBB2 Loss"',
  'external_id': [],
  'guidelineBody': '',
  'guidelineVersion': '',
  'hashKey': '33e0daf3ab99e261a91f0ba3375892f0',
  'id': '9f692fc4-a770-4c89-894f-b31e510ac7eb',
  'includeCondition0': ['Neoplasia',
   'Malignant neoplastic disease',
   'Solid tumor'],
  'includeCondition1': ['Neoplasm of breast'],
  'includeDrug1': ['Exemestane', 'Everolimus', 'Everolimus + Exemestane'],
  'includeFinding1': ['Menopause'],
  'includeGene0': ['ESR1', 'PGR', 'PIK3CA'],
  'includeMutation0': ['PIK3CA exon 21 mutation'],
  'includeMutation1': ['PIK3CA H1047R', 'ER/PR positive', 'ERBB2 Loss'],
  'institution': [],
  'mutations': [{'GRCh37_location': [{'alt': 'G',
      'chr': '3',
      'compositeKey': '483a75e977e8caae09fcf9ee29988d0b',
      'ref': 'A',
      'start': 178952085,
      'stop': 178952085,
      'strand': '+',
      'transcript_consequences': [{'amino_acid_change': 'p.H1047R',
        'cdna': 'c.3140A>G',
        'exonNumber': '21',
        'intronNumber': '',
        'transcript': 'NM_006218.3',
        'txSites': []},
       {'amino_acid_change': 'p.H1047R',
        'cdna': 'c.3140A>G',
        'exonNumber': '21',
        'intronNumber': '',
        'transcript': 'ENST00000263967',
        'txSites': []},
       {'amino_acid_change': 'p.H1047R',
        'cdna': 'c.3140A>G',
        'exonNumber': '20',
        'intronNumber': '',
        'transcript': 'CCDS43171',
        'txSites': []}],
      'validated': 'wgsa'}],
    '_src': 1,
    'cdna': ['c.3140A>G'],
    'description': 'PIK3CA H1047R is one of the most recurrent mutations in cancer, especially breast cancer. Of PIK3CA mutant breast cancers, over half harbor this mutation. Meta-analyses have shown that patients harboring this mutation may have worse overall survival, but other studies have shown no difference between H1047R and other PIK3CA mutants from a prognostic standpoint. While very prevalent, targeted therapies for this particular mutation are still in early clinical trial phases. Source: CIViC',
    'geneSymbol': 'PIK3CA',
    'id': '300f2922da251886ba373fb67939e874',
    'longestTranscript': 'NM_006218.2',
    'mutation_type': ['Missense', 'Synonymous'],
    'name': 'PIK3CA H1047R',
    'parents': [],
    'pathology': ['Pathogenic/Likely pathogenic'],
    'sources': ['COSMIC', 'CIViC', 'ClinVar', 'DoCM', 'cBioPortal'],
    'synonyms': [],
    'transcript': 'NM_006218.2',
    'transcriptConsequence': [{'alt': 'G',
      'amino_acid_change': 'H1047R',
      'cdna': 'c.3140A>G',
      'chr': '3',
      'compositeKey': '782a74b381d5eb3c9ef858659e373b2c',
      'custom': False,
      'exonNumber': '21',
      'intronNumber': None,
      'ref': 'A',
      'referenceGenome': 'grch37_hg19',
      'start': 178952085,
      'stop': 178952085,
      'strand': '+',
      'suppress': False,
      'transcript': 'NM_006218.3',
      'validated': 'wgsa'},
     {'alt': 'G',
      'amino_acid_change': 'p.H1047R',
      'cdna': 'c.3140A>G',
      'chr': '3',
      'compositeKey': 'e509d2182dc0ecd8c0cbbcf7f6cd926a',
      'custom': False,
      'exonNumber': '21',
      'intronNumber': '',
      'ref': 'A',
      'referenceGenome': 'grch37_hg19',
      'start': '178952085',
      'stop': '178952085',
      'strand': '+',
      'suppress': False,
      'transcript': 'ENST00000263967',
      'validated': 'transvar'},
     {'alt': 'G',
      'amino_acid_change': 'p.H1047R',
      'cdna': 'c.3140A>G',
      'chr': '3',
      'compositeKey': 'c36a76cb5fcd59eb3bc343579d7b3e2c',
      'custom': False,
      'exonNumber': '21',
      'intronNumber': '',
      'ref': 'A',
      'referenceGenome': 'grch37_hg19',
      'start': '178952085',
      'stop': '178952085',
      'strand': '+',
      'suppress': False,
      'transcript': 'NM_006218',
      'validated': 'transvar'},
     {'alt': 'G',
      'amino_acid_change': 'p.H1047R',
      'cdna': 'c.3140A>G',
      'chr': '3',
      'compositeKey': '6bec3e88de828c4da9fb30c3031b8d81',
      'custom': False,
      'exonNumber': '20',
      'intronNumber': '',
      'ref': 'A',
      'referenceGenome': 'grch37_hg19',
      'start': '178952085',
      'stop': '178952085',
      'strand': '+',
      'suppress': False,
      'transcript': 'CCDS43171',
      'validated': 'transvar'}],
    'uniprotTranscript': 'NM_006218.2'},
   {'GRCh37_location': [],
    '_src': 1,
    'cdna': [],
    'description': '',
    'geneSymbol': 'ESR1',
    'id': 'cdea1230df85587a78fbcce80c9eefae',
    'mutation_type': ['Parent Mutation'],
    'name': 'ER/PR positive',
    'parents': [],
    'pathology': [],
    'sources': [],
    'synonyms': [],
    'transcriptConsequence': []},
   {'GRCh37_location': [],
    '_src': 1,
    'cdna': [],
    'description': '',
    'exonsInfo': {'cdsEnd': 37884297,
     'cdsStart': 37856491,
     'chr': '17',
     'exonBoundaries': {'1': {'start': 37856491, 'stop': 37856564},
      '10': {'start': 37871538, 'stop': 37871612},
      '11': {'start': 37871698, 'stop': 37871789},
      '12': {'start': 37871992, 'stop': 37872192},
      '13': {'start': 37872553, 'stop': 37872686},
      '14': {'start': 37872767, 'stop': 37872858},
      '15': {'start': 37873572, 'stop': 37873733},
      '16': {'start': 37876039, 'stop': 37876087},
      '17': {'start': 37879571, 'stop': 37879710},
      '18': {'start': 37879790, 'stop': 37879913},
      '19': {'start': 37880164, 'stop': 37880263},
      '2': {'start': 37863242, 'stop': 37863394},
      '20': {'start': 37880978, 'stop': 37881164},
      '21': {'start': 37881301, 'stop': 37881457},
      '22': {'start': 37881579, 'stop': 37881655},
      '23': {'start': 37881959, 'stop': 37882106},
      '24': {'start': 37882814, 'stop': 37882912},
      '25': {'start': 37883067, 'stop': 37883256},
      '26': {'start': 37883547, 'stop': 37883800},
      '27': {'start': 37883941, 'stop': 37884297},
      '3': {'start': 37864573, 'stop': 37864787},
      '4': {'start': 37865570, 'stop': 37865705},
      '5': {'start': 37866065, 'stop': 37866134},
      '6': {'start': 37866338, 'stop': 37866454},
      '7': {'start': 37866592, 'stop': 37866734},
      '8': {'start': 37868180, 'stop': 37868300},
      '9': {'start': 37868574, 'stop': 37868701}},
     'transcript': 'NM_004448',
     'txEnd': 37884915,
     'txStart': 37856230},
    'geneSymbol': 'ERBB2',
    'id': '8264062dcf781a6ea7faf090c9b7962d',
    'longestTranscript': 'NM_004448.3',
    'mutation_type': ['Copy Number Variant'],
    'name': 'ERBB2 Loss',
    'parents': [],
    'pathology': [],
    'sources': ['MolecularMatch'],
    'synonyms': [],
    'transcript': 'NM_004448.3',
    'transcriptConsequence': [],
    'uniprotTranscript': 'NM_004448.3'}],
  'mvld': '2',
  'narrative': 'PIK3CA H1047R,ER/PR positive and ERBB2 Loss confers sensitivity to Exemestane, Everolimus and Everolimus + Exemestane in patients with Neoplasm of breast',
  'noTherapyAvailable': False,
  'prevalence': [{'count': 424,
    'percent': 17.9,
    'samples': 2369,
    'studyId': 'PAN CANCER MAX'},
   {'count': 4384,
    'percent': 6.83547,
    'samples': 64136,
    'studyId': 'PAN CANCER AVG'},
   {'condition': '',
    'count': 424,
    'molecular': '',
    'percent': 17.9,
    'samples': 2369,
    'studyId': 'PAN CANCER MAX'},
   {'condition': '',
    'count': 424,
    'molecular': '',
    'percent': 17.9,
    'samples': 2369,
    'studyId': 'PAN CANCER MAX'},
   {'condition': '',
    'count': 424,
    'molecular': '',
    'percent': 17.9,
    'samples': 2369,
    'studyId': 'brca_metabric'},
   {'condition': '',
    'count': 75,
    'molecular': '',
    'percent': 14.79,
    'samples': 507,
    'studyId': 'brca_tcga_pub'},
   {'condition': '',
    'count': 116,
    'molecular': '',
    'percent': 14.2,
    'samples': 817,
    'studyId': 'brca_tcga_pub2015'},
   {'condition': '',
    'count': 13,
    'molecular': '',
    'percent': 13,
    'samples': 100,
    'studyId': 'brca_sanger'},
   {'condition': '',
    'count': 11,
    'molecular': '',
    'percent': 10.68,
    'samples': 103,
    'studyId': 'brca_broad'},
   {'condition': '',
    'count': 1970,
    'molecular': '',
    'percent': 8.00455,
    'samples': 24611,
    'studyId': 'PAN CANCER AVG'},
   {'condition': '',
    'count': 5,
    'molecular': '',
    'percent': 7.69,
    'samples': 65,
    'studyId': 'brca_bccrc'},
   {'condition': '',
    'count': 773,
    'molecular': '',
    'percent': 6.95081,
    'samples': 11121,
    'studyId': 'PAN CANCER AVG'},
   {'condition': '',
    'count': 4,
    'molecular': '',
    'percent': 6.56,
    'samples': 61,
    'studyId': 'lgg_ucsf_2014'},
   {'condition': '',
    'count': 4,
    'molecular': '',
    'percent': 5.56,
    'samples': 72,
    'studyId': 'coadread_genentech'},
   {'condition': '',
    'count': 2,
    'molecular': '',
    'percent': 5,
    'samples': 40,
    'studyId': 'hnsc_mdanderson_2013'},
   {'condition': '',
    'count': 3,
    'molecular': '',
    'percent': 5,
    'samples': 60,
    'studyId': 'cellline_nci60'},
   {'condition': '',
    'count': 12,
    'molecular': '',
    'percent': 4.84,
    'samples': 248,
    'studyId': 'ucec_tcga_pub'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 4.55,
    'samples': 22,
    'studyId': 'stad_uhongkong'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 4.55,
    'samples': 22,
    'studyId': 'ucs_jhu_2014'},
   {'condition': '',
    'count': 6,
    'molecular': '',
    'percent': 4.35,
    'samples': 138,
    'studyId': 'coadread_mskcc'},
   {'condition': '',
    'count': 12,
    'molecular': '',
    'percent': 4.15,
    'samples': 289,
    'studyId': 'stad_tcga_pub'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 4,
    'samples': 25,
    'studyId': 'skcm_broad_dfarber'},
   {'condition': '',
    'count': 8,
    'molecular': '',
    'percent': 3.57,
    'samples': 224,
    'studyId': 'coadread_tcga_pub'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 3.13,
    'samples': 32,
    'studyId': 'hnsc_jhu'},
   {'condition': '',
    'count': 15,
    'molecular': '',
    'percent': 2.42,
    'samples': 619,
    'studyId': 'coadread_dfci_2016'},
   {'condition': '',
    'count': 20,
    'molecular': '',
    'percent': 2.21,
    'samples': 905,
    'studyId': 'cellline_ccle_broad'},
   {'condition': '',
    'count': 6,
    'molecular': '',
    'percent': 2.15,
    'samples': 279,
    'studyId': 'hnsc_tcga_pub'},
   {'condition': '',
    'count': 2,
    'molecular': '',
    'percent': 2.06,
    'samples': 97,
    'studyId': 'blca_mskcc_solit_2012'},
   {'condition': '',
    'count': 2,
    'molecular': '',
    'percent': 1.65,
    'samples': 121,
    'studyId': 'skcm_broad'},
   {'condition': '',
    'count': 2,
    'molecular': '',
    'percent': 1.52,
    'samples': 132,
    'studyId': 'hnc_mskcc_2016'},
   {'condition': '',
    'count': 2,
    'molecular': '',
    'percent': 1.46,
    'samples': 137,
    'studyId': 'escc_ucla_2014'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 1.35,
    'samples': 74,
    'studyId': 'hnsc_broad'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 1.28,
    'samples': 78,
    'studyId': 'ccrcc_irc_2014'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 1.14,
    'samples': 88,
    'studyId': 'escc_icgc'},
   {'condition': '',
    'count': 2,
    'molecular': '',
    'percent': 1.12,
    'samples': 178,
    'studyId': 'lusc_tcga_pub'},
   {'condition': 'Malignant neoplastic disease',
    'count': 11,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 1.11,
    'samples': 995,
    'studyId': 'cellline_ccle_broad'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 1.02,
    'samples': 98,
    'studyId': 'kirc_bgi'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 1.01,
    'samples': 99,
    'studyId': 'blca_bgi'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 1,
    'samples': 100,
    'studyId': 'stad_pfizer_uhongkong'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 0.97,
    'samples': 103,
    'studyId': 'prad_mskcc'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 0.92,
    'samples': 109,
    'studyId': 'paad_utsw_2015'},
   {'condition': 'Adenocarcinoma of prostate',
    'count': 3,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0.9,
    'samples': 333,
    'studyId': 'prad_tcga_pub'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 0.89,
    'samples': 112,
    'studyId': 'prad_broad'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 0.85,
    'samples': 117,
    'studyId': 'thyroid_mskcc_2016'},
   {'condition': '',
    'count': 2,
    'molecular': '',
    'percent': 0.69,
    'samples': 291,
    'studyId': 'gbm_tcga_pub2013'},
   {'condition': '',
    'count': 5,
    'molecular': '',
    'percent': 0.62,
    'samples': 812,
    'studyId': 'lgggbm_tcga_pub'},
   {'condition': '',
    'count': 2,
    'molecular': '',
    'percent': 0.52,
    'samples': 383,
    'studyId': 'paad_qcmg_uq_2016'},
   {'condition': 'Acute myeloid leukaemia, disease',
    'count': 1,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0.52,
    'samples': 191,
    'studyId': 'laml_tcga_pub'},
   {'condition': 'Adenocarcinoma of prostate',
    'count': 1,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0.52,
    'samples': 194,
    'studyId': 'prad_mskcc'},
   {'condition': '',
    'count': 2,
    'molecular': '',
    'percent': 0.45,
    'samples': 449,
    'studyId': 'prad_cpcg_2017'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 0.43,
    'samples': 230,
    'studyId': 'luad_tcga_pub'},
   {'condition': '',
    'count': 1,
    'molecular': '',
    'percent': 0.32,
    'samples': 316,
    'studyId': 'ov_tcga_pub'},
   {'condition': 'Glioblastoma multiforme',
    'count': 1,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0.18,
    'samples': 563,
    'studyId': 'gbm_tcga_pub2013'},
   {'condition': 'Infiltrating duct carcinoma of breast',
    'count': 1,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0.13,
    'samples': 778,
    'studyId': 'brca_tcga_pub'},
   {'condition': 'Infiltrating duct carcinoma of breast',
    'count': 1,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0.12,
    'samples': 816,
    'studyId': 'brca_tcga_pub2015'},
   {'condition': 'Astrocytoma/Oligodendroglioma,Low grade oligodendroglioma of brain,Glioblastoma multiforme,Anaplastic Oligoastrocytoma,Oligodendroglioma of brain,Low grade glioma of brain',
    'count': 1,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0.09,
    'samples': 1084,
    'studyId': 'lgggbm_tcga_pub'},
   {'condition': 'Grawitz tumor',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 62,
    'studyId': 'urcc_mskcc_2016'},
   {'condition': 'Metastatic adenocarcinoma of prostate',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 61,
    'studyId': 'prad_mich'},
   {'condition': 'Adenocarcinoma of prostate',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 149,
    'studyId': 'prad_fhcrc'},
   {'condition': 'Adenocarcinoma of prostate',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 56,
    'studyId': 'prad_broad_2013'},
   {'condition': 'Adenocarcinoma of prostate',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 109,
    'studyId': 'prad_broad'},
   {'condition': 'CA - Cancer of pancreas',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 109,
    'studyId': 'paad_utsw_2015'},
   {'condition': 'Serous papillary cystadenocarcinoma ovary',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 489,
    'studyId': 'ov_tcga_pub'},
   {'condition': 'Neuroendocrine prostate cancer',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 107,
    'studyId': 'nepc_wcm_2016'},
   {'condition': 'Malignant peripheral nerve sheath tumour',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 15,
    'studyId': 'mpnst_mskcc'},
   {'condition': 'Squamous cell carcinoma of lung',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 178,
    'studyId': 'lusc_tcga_pub'},
   {'condition': 'Adenocarcinoma of lung',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 230,
    'studyId': 'luad_tcga_pub'},
   {'condition': 'Adenocarcinoma of lung',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 182,
    'studyId': 'luad_broad'},
   {'condition': 'Hepatocarcinoma',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 231,
    'studyId': 'lihc_amc_prv'},
   {'condition': 'Adenocarcinoma of stomach',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 293,
    'studyId': 'stad_tcga_pub'},
   {'condition': 'PTC - Papillary thyroid carcinoma',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 496,
    'studyId': 'thca_tcga_pub'},
   {'condition': 'Grawitz tumor',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 436,
    'studyId': 'kirc_tcga_pub'},
   {'condition': 'Chromophobe carcinoma',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 66,
    'studyId': 'kich_tcga_pub'},
   {'condition': 'Head and Neck Squamous Cell Carcinoma',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 279,
    'studyId': 'hnsc_tcga_pub'},
   {'condition': 'Cancer of salivary gland,Neoplasm of head and neck',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 151,
    'studyId': 'hnc_mskcc_2016'},
   {'condition': 'Glioblastoma multiforme',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 206,
    'studyId': 'gbm_tcga_pub'},
   {'condition': 'Sarcoma',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 207,
    'studyId': 'sarc_mskcc'},
   {'condition': 'Neoplasm of colorectum',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 257,
    'studyId': 'coadread_tcga_pub'},
   {'condition': 'Malignant neoplastic disease',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 60,
    'studyId': 'cellline_nci60'},
   {'condition': 'Adenocarcinoma of uterus,Neoplasm of endometrium',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 363,
    'studyId': 'ucec_tcga_pub'},
   {'condition': 'Metastatic adenocarcinoma of prostate',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 150,
    'studyId': 'prad_su2c_2015'},
   {'condition': 'Adenocarcinoma of prostate',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 104,
    'studyId': 'prad_mskcc_2014'},
   {'condition': 'Neoplasm of breast',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 2173,
    'studyId': 'brca_metabric'},
   {'condition': 'Infiltrating duct carcinoma of breast',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 30,
    'studyId': 'brca_bccrc_xenograft_2014'},
   {'condition': 'Transitional cell carcinoma of bladder',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 128,
    'studyId': 'blca_tcga_pub'},
   {'condition': 'Plasmacytoid urothelial carcinoma',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 33,
    'studyId': 'blca_plasmacytoid_mskcc_2016'},
   {'condition': 'Tumour of urinary bladder',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 109,
    'studyId': 'blca_mskcc_solit_2014'},
   {'condition': 'Adenoid cystic carcinoma of salivary gland',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 60,
    'studyId': 'acyc_mskcc_2013'},
   {'condition': 'Adenoid cystic carcinoma of breast',
    'count': 0,
    'molecular': 'ERBB2 Copy Number Loss',
    'percent': 0,
    'samples': 12,
    'studyId': 'acbc_mskcc_2015'}],
  'regulatoryBody': 'FDA',
  'regulatoryBodyApproved': False,
  'sixtier': '5',
  'sources': [{'functionalConsequence': '',
    'id': '1510069321057',
    'link': 'https://www.ncbi.nlm.nih.gov/pubmed/28183140',
    'name': 'PUBMED',
    'pubId': '28183140',
    'subType': 'retrospective',
    'suppress': False,
    'trialId': 'NCT00863655',
    'trialPhase': '',
    'trustRating': 0,
    'type': 'trial',
    'valid': True,
    'year': ''}],
  'tags': [{'compositeKey': 'PIK3CA H1047RMUTATIONinclude',
    'facet': 'MUTATION',
    'filterType': 'include',
    'generatedBy': '',
    'generatedByTerm': '',
    'isNew': True,
    'manualSuppress': 0,
    'primary': False,
    'priority': 1,
    'suppress': False,
    'term': 'PIK3CA H1047R',
    'transcript': '',
    'valid': True},
   {'compositeKey': 'MenopauseFINDINGinclude',
    'facet': 'FINDING',
    'filterType': 'include',
    'generatedBy': '',
    'generatedByTerm': '',
    'isNew': True,
    'manualSuppress': 0,
    'primary': False,
    'priority': 1,
    'suppress': False,
    'term': 'Menopause',
    'transcript': '',
    'valid': True},
   {'compositeKey': 'Neoplasm of breastCONDITIONinclude',
    'facet': 'CONDITION',
    'filterType': 'include',
    'generatedBy': '',
    'generatedByTerm': '',
    'isNew': True,
    'manualSuppress': 0,
    'primary': True,
    'priority': 1,
    'suppress': False,
    'term': 'Neoplasm of breast',
    'transcript': '',
    'valid': True},
   {'compositeKey': 'ER/PR positiveMUTATIONinclude',
    'facet': 'MUTATION',
    'filterType': 'include',
    'generatedBy': '',
    'generatedByTerm': '',
    'isNew': True,
    'manualSuppress': 0,
    'primary': False,
    'priority': 1,
    'suppress': False,
    'term': 'ER/PR positive',
    'transcript': '',
    'valid': True},
   {'compositeKey': 'ERBB2 LossMUTATIONinclude',
    'facet': 'MUTATION',
    'filterType': 'include',
    'generatedBy': '',
    'generatedByTerm': '',
    'isNew': True,
    'manualSuppress': 0,
    'primary': False,
    'priority': 1,
    'suppress': False,
    'term': 'ERBB2 Loss',
    'transcript': '',
    'valid': True},
   {'custom': False,
    'facet': 'GENE',
    'filterType': 'include',
    'generatedBy': 'MUTATION',
    'generatedByTerm': 'ER/PR positive',
    'priority': 0,
    'suppress': False,
    'term': 'ESR1'},
   {'custom': False,
    'facet': 'GENE',
    'filterType': 'include',
    'generatedBy': 'MUTATION',
    'generatedByTerm': 'ER/PR positive',
    'priority': 0,
    'suppress': False,
    'term': 'PGR'},
   {'custom': False,
    'facet': 'GENE',
    'filterType': 'include',
    'generatedBy': 'MUTATION',
    'generatedByTerm': 'PIK3CA H1047R',
    'priority': 0,
    'suppress': False,
    'term': 'PIK3CA'},
   {'custom': False,
    'facet': 'MUTATION',
    'filterType': 'include',
    'generatedBy': 'MUTATION',
    'generatedByTerm': 'PIK3CA H1047R',
    'priority': 0,
    'suppress': False,
    'term': 'PIK3CA exon 21 mutation'},
   {'custom': False,
    'facet': 'CONDITION',
    'filterType': 'include',
    'generatedBy': 'CONDITION',
    'generatedByTerm': 'Neoplasm of breast',
    'priority': 0,
    'suppress': False,
    'term': 'Neoplasia'},
   {'custom': False,
    'facet': 'CONDITION',
    'filterType': 'include',
    'generatedBy': 'CONDITION',
    'generatedByTerm': 'Neoplasm of breast',
    'priority': 0,
    'suppress': False,
    'term': 'Malignant neoplastic disease'},
   {'custom': False,
    'facet': 'CONDITION',
    'filterType': 'include',
    'generatedBy': 'CONDITION',
    'generatedByTerm': 'Neoplasm of breast',
    'priority': 0,
    'suppress': False,
    'term': 'Solid tumor'},
   {'custom': False,
    'facet': 'SITE',
    'filterType': 'include',
    'generatedBy': 'CONDITION',
    'generatedByTerm': 'Neoplasm of breast',
    'priority': 0,
    'suppress': False,
    'term': 'Mammary gland'}],
  'therapeuticContext': [{'facet': 'DRUG',
    'name': 'Exemestane',
    'suppress': False,
    'valid': True},
   {'facet': 'DRUG', 'name': 'Everolimus', 'suppress': False, 'valid': True},
   {'facet': 'DRUG',
    'name': 'Everolimus + Exemestane',
    'suppress': False,
    'valid': True}],
  'tier': '2C',
  'tierExplanation': [{'message': 'FDA Approved',
    'step': 1,
    'success': False,
    'tier': '1A'},
   {'message': 'Guideline established',
    'step': 2,
    'success': False,
    'tier': '1A'},
   {'message': 'Clinical Expert Opinion',
    'step': 3,
    'success': False,
    'tier': '1B'},
   {'message': 'Phase 2,3 or 4 Trial Found',
    'step': 4,
    'success': False,
    'tier': '1B'},
   {'message': 'Prospective Trial found',
    'step': 5,
    'success': False,
    'tier': '1B'},
   {'message': 'Phase 1 Trial Found',
    'step': 6,
    'success': False,
    'tier': '2C'},
   {'message': 'Retrospective Institutional Study found',
    'step': 7,
    'success': False,
    'tier': '2C'},
   {'message': 'Retrospective Trial(NCT00863655) found',
    'step': 8,
    'success': True,
    'tier': '2C'}],
  'uniqueKey': 'f9c13fa7e75af5f3ef508599bfdea73a',
  'variantInfo': [{'COSMIC_ID': '',
    'classification': 'actionable',
    'consequences': ['Parent Mutation'],
    'fusions': [],
    'gene': 'ESR1',
    'geneFusionPartner': '',
    'locations': [],
    'name': 'ER/PR positive',
    'popFreqMax': '',
    'transcript': ''},
   {'COSMIC_ID': '',
    'classification': 'actionable',
    'consequences': ['Missense', 'Synonymous'],
    'fusions': [],
    'gene': 'PIK3CA',
    'geneFusionPartner': '',
    'locations': [{'alt': 'G',
      'amino_acid_change': 'H1047R',
      'cdna': 'c.3140A>G',
      'chr': '3',
      'exonNumber': '21',
      'intronNumber': '',
      'ref': 'A',
      'referenceGenome': 'grch37_hg19',
      'start': '178952085',
      'stop': '178952085',
      'strand': '+'}],
    'name': 'PIK3CA H1047R',
    'popFreqMax': '',
    'transcript': 'NM_006218.2'},
   {'COSMIC_ID': '',
    'classification': 'actionable',
    'consequences': ['Copy Number Variant'],
    'fusions': [],
    'gene': 'ERBB2',
    'geneFusionPartner': '',
    'locations': [{'chr': '17', 'start': 37856230, 'stop': 37884915}],
    'name': 'ERBB2 Loss',
    'popFreqMax': '',
    'transcript': 'NM_004448.3'}],
  'version': 2},
 'source': 'molecularmatch',
 'tags': []}
grmayfie commented 6 years ago

@ahwagner

Yes, in short, this is intended behavior. The 'heirarchy' field would show the id number you expect, as well as the rest of the tree to the root feature classification. The 'parent_soid' refers to the highest level of the tree, or the 'root' classification.

In general, the behavior of the mutation classifier needs some cleanup so that it does not traverse the tree all the way to the root in all cases, because root classifications of 'sequence_feature' don't really help.

Also, the name 'parent_soid' could be changed to something more descriptive, if desired.
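
To make the distinction concrete, here is a minimal sketch of the difference between an immediate parent and the root of the tree. It is not the aggregator's actual classifier code: the `IS_A` map and the helper names `lineage`, `immediate_parent` and `root` are hypothetical, and the map is a toy fragment built only from the three IDs quoted in this issue (the real Sequence Ontology may place other terms between them).

```python
# Toy is_a map (child -> immediate parent). Assumption: only the three IDs
# quoted in this issue are used; the real ontology may contain more levels.
IS_A = {
    "SO:0001019": "SO:0000248",
    "SO:0000248": "SO:0000110",
}


def lineage(soid, is_a=IS_A):
    """Return the path [soid, parent, grandparent, ..., root]."""
    path = [soid]
    # Keep climbing while the last term on the path still has a parent.
    while path[-1] in is_a:
        path.append(is_a[path[-1]])
    return path


def immediate_parent(soid, is_a=IS_A):
    """Return the direct parent, or None if soid is already the root."""
    return is_a.get(soid)


def root(soid, is_a=IS_A):
    """Return the last term on the lineage, i.e. the top of the tree."""
    return lineage(soid, is_a)[-1]


print(lineage("SO:0001019"))           # full path up to the root
print(immediate_parent("SO:0001019"))  # what the issue expected in 'parent_soid'
print(root("SO:0001019"))              # what 'parent_soid' currently holds
```

With this toy map, the immediate parent of 'SO:0001019' is 'SO:0000248', while the root is 'SO:0000110', which matches the example association in the issue.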

ahwagner commented 6 years ago

Thank you for clarifying, @mayfielg. We should then change the field name to 'root' or something similar; I think the atypical use of the term 'parent' may be misleading here. I also agree that we will need to reconsider our strategy for comparing terms within and between ontologies. If there is time, I will bring the topic up on our VICC call tomorrow.