PubMed search - formatting query string

sul-dlss / sul_pub

SUL system for harvest and managing publications for Stanford CAP, with controlled API access.

http://cap.stanford.edu

PubMed search - formatting query string #125

Open dazza-codes opened 7 years ago

dazza-codes commented 7 years ago

At https://github.com/sul-dlss/sul_pub/blob/master/lib/pubmed_client.rb#L15, the code builds the id parameters like this:

pmid_list = Array(pmids)
pmidValuesForPost = pmid_list.collect { |pmid| "&id=#{pmid}" }.join

Some PubMed docs indicate that the id param takes a comma-separated list, so maybe this code should be:

pmid_list = Array(pmids)
pmidValuesForPost = "&id=#{pmid_list.join(',')}"
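To make the difference concrete, here is a minimal sketch that prints the string each variant builds. The two method names are hypothetical and exist only for this comparison; they are not part of sul_pub.

```ruby
# Hypothetical helpers (not in sul_pub) that isolate the two formatting styles.

# Current style: one "&id=" parameter per PMID.
def repeated_id_params(pmids)
  Array(pmids).collect { |pmid| "&id=#{pmid}" }.join
end

# Suggested style: a single "&id=" parameter holding a comma-separated list.
def csv_id_param(pmids)
  "&id=#{Array(pmids).join(',')}"
end

pmids = %w[25277988 26488913]
puts repeated_id_params(pmids) # &id=25277988&id=26488913
puts csv_id_param(pmids)       # &id=25277988,26488913
```

The CSV form is shorter by four characters per additional ID, which only starts to matter for long lists.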

peetucket commented 7 years ago

I just tested, and the request works as coded: PubMed returns the expected response. For example:

pmids = ['25277988', '26488913']
pmclient = PubmedClient.new
response = pmclient.fetch_records_for_pmid_list(pmids)

This produces an XML response with both docs listed, as expected. In fact, you can even pass a single comma-delimited string and it works:

pmids = '25277988,26488913'
pmclient = PubmedClient.new
response = pmclient.fetch_records_for_pmid_list(pmids)

As implemented, our code allows you to send in either an array or a comma-delimited string, and it appears to get the correct response from PubMed either way.
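A likely reason the string form works: Ruby's `Array()` wraps a String in a one-element array, so the existing formatting code emits a single `&id=` parameter whose value is already comma-separated. The sketch below reproduces just that step; the method name is hypothetical and stands in for the two lines quoted from pubmed_client.rb.

```ruby
# Hypothetical stand-in for the formatting lines in pubmed_client.rb.
def id_params_as_coded(pmids)
  pmid_list = Array(pmids) # a String becomes a one-element array
  pmid_list.collect { |pmid| "&id=#{pmid}" }.join
end

p Array('25277988,26488913')                  # ["25277988,26488913"]
puts id_params_as_coded('25277988,26488913')  # &id=25277988,26488913
puts id_params_as_coded(%w[25277988 26488913]) # &id=25277988&id=26488913
```

So the two call styles send differently formatted requests, and PubMed appears to accept both.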

dazza-codes commented 7 years ago

The suggested change applies to the code inside pmclient.fetch_records_for_pmid_list, not to how that method is called. As tested above, it will result in the same requests. (Good to know that they seem to be working.)

peetucket commented 7 years ago

Yup, but since it is all working as is, I am not sure there is a need to change it.

dazza-codes commented 7 years ago

What happens when the list of PubMed IDs gets larger and larger? Is there a limit beyond which it breaks? Testing with 2 IDs is a small request that runs quickly. What about, say, 500 IDs? There might be a case where the harvester pulls in SW records for a new professor who could have hundreds of pubs.
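One way to sidestep any per-request limit is to split the list into batches and issue one request per batch. The following is a minimal sketch, not the sul_pub implementation: `fetch_in_batches` is a hypothetical helper, and the batch size of 200 is an assumption, not a documented PubMed limit (NCBI's E-utilities documentation, as far as I recall, recommends HTTP POST for requests with more than about 200 UIDs). The block stands in for the real fetch so the example runs without network access.

```ruby
# Assumed batch size; tune it against whatever limit PubMed actually enforces.
BATCH_SIZE = 200

# Hypothetical helper: yields each batch of PMIDs to the caller's block
# and collects the block's return values, one per batch.
def fetch_in_batches(pmids, batch_size = BATCH_SIZE)
  Array(pmids).each_slice(batch_size).map do |batch|
    yield batch
  end
end

pmids = (1..500).map(&:to_s)

# Stub fetch: builds the id parameter instead of calling PubMed.
responses = fetch_in_batches(pmids) { |batch| "&id=#{batch.join(',')}" }
puts responses.length # 3
```

In real use the block would call something like `pmclient.fetch_records_for_pmid_list(batch)` and the caller would merge the XML responses.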