hetio / medline

Computing term cooccurrence in MEDLINE
https://doi.org/10.15363/thinklab.d67

PubMed search matches subset of term name when there are no matches #3

Closed dhimmel closed 3 years ago

dhimmel commented 3 years ago

C566272 is Townes-Brocks-Branchiootorenal-Like Syndrome. According to the MeSH Browser, this term has a frequency of 0, meaning it has never been tagged as the topic of a MEDLINE record.

We're currently using the following PubMed search: Townes-Brocks-Branchiootorenal-Like Syndrome [MeSH Terms:noexp], which returns 117,315 results and has the message:

The following term was not found in PubMed: Townes-Brocks-Branchiootorenal-Like

Note how this doesn't include "Syndrome". So I think what's happening is that PubMed can't find the entire search term and falls back to searching just Syndrome [MeSH Terms:noexp], which matches 117,315 records.

So we probably have to quote the search term.
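A minimal sketch of what quoting could look like, assuming the query is assembled as a plain string before being sent to PubMed. The function name `build_mesh_query` and its `field` parameter are hypothetical and not taken from this repository. Dropping embedded double quotes is also an assumption, since I haven't found a documented way to escape them.

```python
def build_mesh_query(term_name: str, field: str = "MeSH Terms:noexp") -> str:
    """Wrap a MeSH term name in double quotes and append a PubMed field tag.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Assumption: a double quote inside the term would end the phrase early,
    # so replace it with a space rather than trying to escape it.
    cleaned = term_name.replace('"', " ")
    # Collapse runs of whitespace left over from the replacement above.
    cleaned = " ".join(cleaned.split())
    # Quoting forces a phrase search, so PubMed should not fall back to
    # matching only part of the name (such as "Syndrome").
    return f'"{cleaned}" [{field}]'


query = build_mesh_query("Townes-Brocks-Branchiootorenal-Like Syndrome")
print(query)
```

With quoting, a term that has no matches should return zero results rather than the results for a trailing word.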

dhimmel commented 3 years ago

Looking up PubMed citations by the MeSH ID like C566272 [MeSH Terms:noexp] would be far preferable since it would reduce the parsing and encoding complexity while guarding against any name changes to a MeSH term. However, I don't think this is supported. I don't see it mentioned in Searching PubMed with MeSH.

It's also possible we could download bulk data at https://www.nlm.nih.gov/databases/download/pubmed_medline.html.

dhimmel commented 3 years ago

I think the correct search would be "Townes-Brocks-Branchiootorenal-Like Syndrome" [Supplementary Concept], which quotes the MeSH label and uses the [Supplementary Concept] field tag as per #4. This query returns 0 results, which is expected, and provides the following message:

No results were found. Your search was processed without automatic term mapping because it retrieved zero results. The following term was ignored: Townes-Brocks-Branchiootorenal-Like Syndrome
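For running this kind of query programmatically, here is a rough sketch that builds a request URL for the NCBI E-utilities `esearch` endpoint. Whether this project queries PubMed through E-utilities is an assumption on my part, and the helper name `esearch_url` is hypothetical. The point is only that the quotes, spaces, and brackets in the query need URL encoding, which `urllib.parse.urlencode` handles.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint (assumed transport, see note above).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term_name: str, field: str = "Supplementary Concept") -> str:
    """Return an esearch URL for a quoted, field-tagged PubMed query.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Quote the full label so it is searched as a single phrase.
    query = f'"{term_name}" [{field}]'
    params = {
        "db": "pubmed",     # search PubMed citations
        "term": query,      # the quoted query with its field tag
        "retmode": "json",  # ask for a JSON response
        "retmax": 0,        # only the hit count is needed, not the ID list
    }
    # urlencode percent-encodes quotes and brackets and turns spaces into "+".
    return f"{ESEARCH_BASE}?{urlencode(params)}"


print(esearch_url("Townes-Brocks-Branchiootorenal-Like Syndrome"))
```

The sketch only constructs the URL and does not send a request, so rate limits and API keys are left out.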

dhimmel commented 3 years ago

I followed up on my contact with Ryan Cohen from the NCBI Helpdesk in https://github.com/hetio/medline/issues/4#issuecomment-818100154:

I had another issue where I didn't properly quote the MeSH term name: https://github.com/hetio/medline/issues/3. Do you know whether it's possible to search by MeSH ID? For example, is there a way to have a query like "C006737 [Supplementary Concept]"?

Any documentation on how to quote / escape whitespace or special characters in search terms?

Ryan replied:

For "Townes-Brocks-Branchiootorenal-Like Syndrome" [Supplementary Concept], this can be used in another NCBI database such as GTR. There are currently no citation records in MEDLINE/PubMed indexed with this MeSH SCR.

No, there is not a way to search by MeSH ID. The MeSH translation table does not include the MeSH ID.

Please see the PubMed help for a list of PubMed character conversions.

From the character conversions docs:

double quotes " - used to force a phrase search