ncbo / ontologies_api

Hypermedia API for NCBO's ontology-related projects
http://data.bioontology.org

Search and classes endpoints return different sets of synonyms for same class #51

Open jvendetti opened 6 years ago

jvendetti commented 6 years ago

Originally reported on the support list by an end user.

/classes endpoint

Issuing a REST call against the classes endpoint for CPT term with ID: "http://purl.bioontology.org/ontology/CPT/99213", i.e.:

http://data.bioontology.org/ontologies/CPT/classes/http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FCPT%2F99213

... returns a result set with a single synonym:

[screenshot 2018-08-02 14 55 41]
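
For readers reproducing the request above, here is a minimal Ruby sketch of how the class URL can be assembled. The class IRI has to be percent-encoded because it is embedded as a single path segment. The constant `API_ROOT` and the helper name `class_url` are illustrative choices, not part of the API.

```ruby
require "uri"

API_ROOT = "http://data.bioontology.org" # host taken from the URL above

# Hypothetical helper: builds the /classes URL for one class.
# encode_www_form_component escapes ":" and "/" in the IRI; it would encode
# a space as "+", so that case is rewritten to "%20" for use in a path.
def class_url(acronym, class_iri)
  encoded = URI.encode_www_form_component(class_iri).gsub("+", "%20")
  "#{API_ROOT}/ontologies/#{acronym}/classes/#{encoded}"
end

puts class_url("CPT", "http://purl.bioontology.org/ontology/CPT/99213")
```

The printed URL should match the one quoted in the report.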

/search endpoint

Issuing a REST call against the search endpoint with search query "99213", i.e.:

http://data.bioontology.org/search?q=99213

... returns a result set that contains CPT term with ID: "http://purl.bioontology.org/ontology/CPT/99213" (same class as above). In this particular result set, the term is listed with 6 synonyms instead of one:

[screenshot 2018-08-02 15 01 07]

I manually reindexed CPT and flushed the application caches to rule out a problem with indexing of the latest version. The indexing process completed successfully, but the same behavior occurs.
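
The discrepancy described above can be checked mechanically once both responses have been saved. The sketch below assumes that the search response carries its hits in a "collection" array, that each hit is identified by "@id", and that both payloads expose a "synonym" array. Those field names are assumptions based on the screenshots, the helper name `extra_search_synonyms` is hypothetical, and the sample payloads are abbreviated stand-ins rather than real API output.

```ruby
require "json"

# Returns the synonyms that the search hit lists for class_iri but the
# /classes response does not. Both arguments are raw JSON strings.
def extra_search_synonyms(class_json, search_json, class_iri)
  class_doc = JSON.parse(class_json)
  hits = JSON.parse(search_json).fetch("collection", []) # assumed field name
  hit = hits.find { |h| h["@id"] == class_iri } || {}
  Array(hit["synonym"]) - Array(class_doc["synonym"])
end

# Abbreviated stand-in payloads (not real API output).
iri = "http://purl.bioontology.org/ontology/CPT/99213"
class_json = JSON.generate({ "@id" => iri, "synonym" => ["A"] })
search_json = JSON.generate({ "collection" => [{ "@id" => iri, "synonym" => ["A", "B", "C"] }] })

p extra_search_synonyms(class_json, search_json, iri) # prints ["B", "C"]
```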

CloudCray commented 6 years ago

I would also add that in the existing UI (http://bioportal.bioontology.org/ontologies/CPT/?p=classes&conceptid=99213), "synonyms" currently displays a single result, while the rest appear under "altLabel":

synonyms: [image]

altLabel: [image]

jvendetti commented 6 years ago

@CloudCray - thanks for pointing that out about the altLabel property. It may be that the search endpoint looks for properties like altLabel and concatenates the values into synonym. Not sure - will need to check with @mdorf when he returns from break.

In the meantime, though, you can set the include parameter to "all" to see those altLabel values when using the classes endpoint. Issuing a REST call like the following:

http://data.bioontology.org/ontologies/CPT/classes/http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FCPT%2F99213?include=all

... will return a result set that includes property-value pairs:

[screenshot 2018-08-02 17 59 38]

The include parameter is explained in more detail in the Common Parameters section of the API documentation. Essentially, you can use it to customize the set of attributes returned.
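
As a small illustration of the workaround above, the following Ruby sketch appends the include parameter to an existing endpoint URL using only the standard library. The helper name `with_include` is hypothetical, and only the single value "all" from the example above is exercised here.

```ruby
require "uri"

# Hypothetical helper: returns url with an include=<value> query parameter
# appended, preserving any query parameters that are already present.
def with_include(url, value)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query.to_s) # [] when there is no query
  params << ["include", value]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

base = "http://data.bioontology.org/ontologies/CPT/classes/" \
       "http%3A%2F%2Fpurl.bioontology.org%2Fontology%2FCPT%2F99213"
puts with_include(base, "all")
```

The result should be the same URL as in the call shown above, ending in ?include=all.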

mdorf commented 6 years ago

The plot thickens here because in trying to replicate the data that gets put into the Solr index in this case, I only get one synonym:

[11] pry(#<Sinatra::Application>)> cls.get_index_doc
=> {:resource_id=>"http://purl.bioontology.org/ontology/CPT/99213",
 :ontologyId=>"http://data.bioontology.org/ontologies/CPT/submissions/12",
 :submissionAcronym=>"CPT",
 :submissionId=>12,
 :ontologyType=>"ONTOLOGY",
 :obsolete=>"false",
 :childCount=>0,
 :id=>"http://purl.bioontology.org/ontology/CPT/99213",
 :prefLabel=>
  "Office or other outpatient visit for the evaluation and management of an established patient, which requires at least 2 of these 3 key components: An expanded problem focused history; An expanded problem focused examination; Medical decision making of low complexity. Counseling and coordination of care with other physicians, other qualified health care professionals, or agencies are provided consistent with the nature of the problem(s) and the patient's and/or family's needs. Usually, the presenting problem(s) are of low to moderate severity. Typically, 15 minutes are spent face-to-face with the patient and/or family.",
 :synonym=>
  ["Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded history and medical decision making of low complexity - typical time with patient and/or family 15 minutes"],
 :cui=>["C4052041", "C3517372", "C4052038", "C4052039", "C4052040", "C0374559"],
 :semanticType=>["T058"],
 :property=>
  ["002: (Do not report 92227 in conjunction with 92002-92014, 92133, 92134, 92250, 92228 or with the evaluation and management of the single organ system, the eye, 99201-99350)",

get_index_doc is a method we call during the indexing process to get the indexable data for a given class. Yet, clearly, the multiple synonyms ARE in the current index (below is a result of searching for this class directly in Solr):

{
  "resource_id":"http://purl.bioontology.org/ontology/CPT/99213",
  "ontologyId":"http://data.bioontology.org/ontologies/CPT/submissions/12",
  "submissionAcronym":"CPT",
  "submissionId":12,
  "ontologyType":"ONTOLOGY",
  "obsolete":false,
  "childCount":0,
  "id":"http://purl.bioontology.org/ontology/CPT/99213_CPT_12",
  "prefLabel":"Office or other outpatient visit for the evaluation and management of an established patient, which requires at least 2 of these 3 key components: An expanded problem focused history; An expanded problem focused examination; Medical decision making of low complexity. Counseling and coordination of care with other physicians, other qualified health care professionals, or agencies are provided consistent with the nature of the problem(s) and the patient's and/or family's needs. Usually, the presenting problem(s) are of low to moderate severity. Typically, 15 minutes are spent face-to-face with the patient and/or family.",
  "prefLabelExact":"Office or other outpatient visit for the evaluation and management of an established patient, which requires at least 2 of these 3 key components: An expanded problem focused history; An expanded problem focused examination; Medical decision making of low complexity. Counseling and coordination of care with other physicians, other qualified health care professionals, or agencies are provided consistent with the nature of the problem(s) and the patient's and/or family's needs. Usually, the presenting problem(s) are of low to moderate severity. Typically, 15 minutes are spent face-to-face with the patient and/or family.",
  "prefLabelSuggest":"Office or other outpatient visit for the evaluation and management of an established patient, which requires at least 2 of these 3 key components: An expanded problem focused history; An expanded problem focused examination; Medical decision making of low complexity. Counseling and coordination of care with other physicians, other qualified health care professionals, or agencies are provided consistent with the nature of the problem(s) and the patient's and/or family's needs. Usually, the presenting problem(s) are of low to moderate severity. Typically, 15 minutes are spent face-to-face with the patient and/or family.",
  "prefLabelSuggestEdge":"Office or other outpatient visit for the evaluation and management of an established patient, which requires at least 2 of these 3 key components: An expanded problem focused history; An expanded problem focused examination; Medical decision making of low complexity. Counseling and coordination of care with other physicians, other qualified health care professionals, or agencies are provided consistent with the nature of the problem(s) and the patient's and/or family's needs. Usually, the presenting problem(s) are of low to moderate severity. Typically, 15 minutes are spent face-to-face with the patient and/or family.",
  "prefLabelSuggestNgram":"Office or other outpatient visit for the evaluation and management of an established patient, which requires at least 2 of these 3 key components: An expanded problem focused history; An expanded problem focused examination; Medical decision making of low complexity. Counseling and coordination of care with other physicians, other qualified health care professionals, or agencies are provided consistent with the nature of the problem(s) and the patient's and/or family's needs. Usually, the presenting problem(s) are of low to moderate severity. Typically, 15 minutes are spent face-to-face with the patient and/or family.",
  "notation":"99213",
  "synonym":["Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded history and medical decision making of low complexity - typical time with patient and/or family 15 minutes",
    "Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded physical examination and medical decision making of low complexity- typical time with patient and/or family 15 minutes",
    "OFFICE OUTPATIENT VISIT 15 MINUTES",
    "Established patient office or other outpatient visit, typically 15 minutes",
    "Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded history and physical examination, and medical decision making of low complexity - typical time with patient and/or family 15 minutes",
    "Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded history and physical examination - typical time with patient and/or family 15 minutes"],
  "synonymExact":["Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded history and medical decision making of low complexity - typical time with patient and/or family 15 minutes",
    "Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded physical examination and medical decision making of low complexity- typical time with patient and/or family 15 minutes",
    "OFFICE OUTPATIENT VISIT 15 MINUTES",
    "Established patient office or other outpatient visit, typically 15 minutes",
    "Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded history and physical examination, and medical decision making of low complexity - typical time with patient and/or family 15 minutes",
    "Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded history and physical examination - typical time with patient and/or family 15 minutes"],

The only reasonable explanation here is that the data in Solr isn't current, which is also unlikely because the re-indexing process first deletes all records corresponding to a given ontology before proceeding with re-indexing.

CloudCray commented 6 years ago

Looks like the list of altLabel is located in the properties object, under http://www.w3.org/2004/02/skos/core#altLabel, not in the synonym.

Adding include=all or include=properties yields the data; however, I would still expect this data to be returned in the synonym array.

So for my purposes, case closed. However, it would be nice to have some clearer documentation.

Thank you!

CloudCray commented 6 years ago

Adding to this: unlike other ontology-specific properties, altLabel will often contain important or relevant text not found in the description, so it makes sense to append altLabel values to the synonyms so that their text is searchable. It's definitely a "feature", not a bug ;)

It's just surprising that the same key, "synonym", is used for two different purposes in different places.
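
For clients that want the /classes response to look like the search result, one option is to merge the two lists on the client side. The Ruby sketch below assumes a class document fetched with include=all or include=properties, with the altLabel values stored in its "properties" object under the SKOS predicate mentioned earlier in the thread. The helper name `merged_synonyms` and the sample document are illustrative only, and this is not a description of what the search endpoint does internally.

```ruby
ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"

# Hypothetical helper: combines "synonym" values with any skos:altLabel
# values found under "properties", dropping duplicates and keeping order.
def merged_synonyms(class_doc)
  properties = class_doc["properties"] || {}
  (Array(class_doc["synonym"]) + Array(properties[ALT_LABEL])).uniq
end

# Abbreviated stand-in document (not real API output).
doc = {
  "synonym" => ["A"],
  "properties" => { ALT_LABEL => ["A", "B"] }
}
p merged_synonyms(doc) # prints ["A", "B"]
```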

mdorf commented 6 years ago

More detail... There is a call executed during indexing:

LinkedData::Models::Class.map_attributes(c, paging.equivalent_predicates)

That call fills in properties for a class:

[9] pry(#<LinkedData::Models::OntologySubmission>)> LinkedData::Models::Class.map_attributes(c, paging.equivalent_predicates)
=> [:submission, :label, :prefLabel, :synonym, :definition, :obsolete, :notation, :prefixIRI, :parents, :subClassOf, :semanticType, :cui, :xref]

Once that call is made, the indexable document for this class appears with multiple synonyms instead of a single one. I'm assuming that this call is NOT made in other instances, where only a single synonym value is displayed:

[10] pry(#<LinkedData::Models::OntologySubmission>)> c.get_index_doc
=> {:resource_id=>"http://purl.bioontology.org/ontology/CPT/99213",
 :ontologyId=>"http://data.bioontology.org/ontologies/CPT/submissions/12",
 :submissionAcronym=>"CPT",
 :submissionId=>12,
 :ontologyType=>"ONTOLOGY",
 :obsolete=>"false",
 :childCount=>0,
 :id=>"http://purl.bioontology.org/ontology/CPT/99213",
 :prefLabel=>
  "Office or other outpatient visit for the evaluation and management of an established patient, which requires at least 2 of these 3 key components: An expanded problem focused history; An expanded problem focused examination; Medical decision making of low complexity. Counseling and coordination of care with other physicians, other qualified health care professionals, or agencies are provided consistent with the nature of the problem(s) and the patient's and/or family's needs. Usually, the presenting problem(s) are of low to moderate severity. Typically, 15 minutes are spent face-to-face with the patient and/or family.",
 :notation=>"99213",
 :synonym=>
  ["Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded history and medical decision making of low complexity - typical time with patient and/or family 15 minutes",
   "Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded physical examination and medical decision making of low complexity- typical time with patient and/or family 15 minutes",
   "OFFICE OUTPATIENT VISIT 15 MINUTES",
   "Established patient office or other outpatient visit, typically 15 minutes",
   "Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded history and physical examination, and medical decision making of low complexity - typical time with patient and/or family 15 minutes",
   "Level 3 outpatient visit for evaluation and management of established patient with problem of low to moderate severity, including expanded history and physical examination - typical time with patient and/or family 15 minutes"],
 :cui=>["C4052041", "C3517372", "C4052038", "C4052039", "C4052040", "C0374559"],
 :semanticType=>["T058"],
 :property=>
  ["043: CPT Changes: An Insider's View 2017",
   "034: CPT Assistant Sep 10: 4",
   "024: CPT Assistant Sep 06: 8",