Open peetucket opened 5 years ago
See https://github.com/sul-dlss/sul_pub/pull/1060 for the work that added Pubmed query editing.
For example, in addition to what we do in the https://github.com/sul-dlss/sul_pub/blob/master/lib/pubmed/query_author.rb class, we already appear to have code doing something similar for the WoS search in this class: https://github.com/sul-dlss/sul_pub/blob/master/lib/agent/author_institution.rb
It strips things like "and" and "university". It is used here to construct a list of institutions to add to the query:
https://github.com/sul-dlss/sul_pub/blob/master/lib/web_of_science/query_author.rb#L40-L42
We also end up creating name variants in https://github.com/sul-dlss/sul_pub/blob/master/lib/agent/author_name.rb. These are used in the WoS queries, but we don't take advantage of them in the Pubmed queries.
It would be nice to use the classes in lib/agent for both WoS and Pubmed for consistency.
Thoughts on re-using this logic? Note that I believe we ended up stripping "University", "Institution" and "College" in WoS queries for a similar reason (the query was picking up extra results), which is perhaps not a problem for Pubmed. But I wanted to acknowledge a bit of duplication here for consideration. For example, the WoS institution list for one author:
```ruby
author = Author.find(37959)
WebOfScience::QueryAuthor.new(author).send(:institutions)
=> ["stanford", "oregon health & science", "washington"]
```
Currently the alternate institution lists need to be edited before crafting the query (to remove things like "&", "university", etc.). This is now done differently for WoS and Pubmed. We also have different ways of creating (or not creating) alternate name variants to send to the query. We may want to create methods that do this consistently for both harvesters if possible.
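
To make the proposal concrete, here is a minimal sketch of what a single shared normalizer could look like. The module name `InstitutionNormalizer`, its methods and its stop-word list are all hypothetical. It is not the existing `Agent::AuthorInstitution` code, and the input names are assumed for illustration only. The idea is that both the WoS and Pubmed query builders would call the same method rather than each stripping words in their own way.

```ruby
# Hypothetical shared helper (NOT the existing Agent::AuthorInstitution code).
# The stop-word list is an assumption; the real list would need to be agreed
# on for both harvesters.
module InstitutionNormalizer
  STOP_WORDS = %w[university institution college of the and].freeze

  # Lowercase a single institution name and drop the stop words.
  def self.normalize(name)
    name.downcase
        .split # splits on whitespace and ignores leading/trailing spaces
        .reject { |word| STOP_WORDS.include?(word) }
        .join(' ')
  end

  # Normalize a list of names, dropping blanks and duplicates.
  def self.normalize_all(names)
    names.map { |name| normalize(name) }.reject(&:empty?).uniq
  end
end

# Assumed input names, for illustration only.
names = [
  'Stanford University',
  'Oregon Health & Science University',
  'University of Washington'
]
institutions = InstitutionNormalizer.normalize_all(names)
```

Note that this sketch keeps "&" in place, as the WoS list above does. If Pubmed needs it removed, that could be an option on the shared method rather than a second implementation.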