muschellij2 / rscopus

Scopus Database API Interface to R

Question regarding scopus_search() #23

Closed: jrosen48 closed this issue 4 years ago

jrosen48 commented 5 years ago

Hi, thanks for an awesome package! Very excited to see this.

When I run scopus_search(), I seem to run into an error at the same point (around 65% of the way through searching all of the results) regardless of how I set the wait time, rate, etc.

Here is the query:

scopus_search(query = "ISSN(0022-0663)", max_count = 8000, count = 25, wait_time = 7)

Here is the error that is returned:

Error in get_results(query, start = init_start, count = count, verbose = verbose, : Bad Request (HTTP 400).

Any idea what is causing this? Is it possible to handle this error and to proceed to the rest of the articles?

muschellij2 commented 5 years ago

Does it work when you lower max_count? I would reduce max_count, check that it works, and then increment the start parameter.

-- John
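A minimal sketch of that advice: fetch in smaller batches by stepping the start offset, and wrap each batch in tryCatch() so a failing batch (such as the HTTP 400 above) is recorded and skipped instead of aborting the run. It assumes the rscopus::scopus_search() arguments used in this thread (start, count, max_count), that max_count caps how many records are fetched from start onward, and that the returned list has an `entries` element. batch_starts(), scopus_batch(), and fetch_in_batches() are hypothetical names, and the fetch function is a parameter so the loop can be checked without an API key.

```r
# Hypothetical helper: 0-based start offsets for batches of `batch_size`
# results that together cover `total` results.
batch_starts <- function(total, batch_size) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = batch_size)
}

# Default fetcher: one rscopus::scopus_search() call per batch.
# Assumes `max_count` caps how many records are fetched from `start` on,
# and that the result list has an `entries` element.
scopus_batch <- function(query, start, batch_size, count = 25, ...) {
  res <- rscopus::scopus_search(query = query, start = start, count = count,
                                max_count = batch_size, ...)
  res$entries
}

# Hypothetical helper: fetch `total` results batch by batch. A batch that
# errors (e.g. HTTP 400) is logged and skipped, and its start is recorded.
fetch_in_batches <- function(query, total, batch_size = 1000,
                             fetch = scopus_batch, ...) {
  entries <- list()
  failed_starts <- numeric(0)
  for (s in batch_starts(total, batch_size)) {
    batch <- tryCatch(
      fetch(query, start = s, batch_size = batch_size, ...),
      error = function(e) {
        message("Batch starting at ", s, " failed: ", conditionMessage(e))
        NULL
      }
    )
    if (is.null(batch)) {
      failed_starts <- c(failed_starts, s)
    } else {
      entries <- c(entries, batch)
    }
  }
  list(entries = entries, failed_starts = failed_starts)
}
```

For the query in this issue, something like `fetch_in_batches("ISSN(0022-0663)", total = 8000, wait_time = 7)` would return the entries from the batches that succeeded plus the offsets of the ones that failed, which can then be retried or narrowed.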

lucasxteixeira commented 5 years ago

I ran into the same error. It happens when there are a lot of results, but I couldn't pin down exactly where. Below is a minimal example:

rscopus::scopus_search(paste0("SRCID('25674') AND PUBYEAR = 2015"), view = "COMPLETE", start = 5000, count = 25, max_count = 25)

The error is the same:

Error in get_results(query, start = init_start, count = count, verbose = verbose, : Bad Request (HTTP 400).

If I reduce start slightly, it works:

rscopus::scopus_search(paste0("SRCID('25674') AND PUBYEAR = 2015"), view = "COMPLETE", start = 4975, count = 25, max_count = 25)

muschellij2 commented 5 years ago

SRCID doesn't seem to be a field: https://dev.elsevier.com/tips/ScopusSearchTips.htm

Searches the following fields: ABS, AFFIL, ARTNUM, AUTH, AUTHCOLLAB, CHEM, CODEN, CONF, DOI, EDITOR, ISBN, ISSN, ISSUE, KEY, LANGUAGE, MANUFACTURER, PUBLISHER, PUBYEAR, REF, SEQBANK, SEQNUMBER, SRCTITLE, VOLUME, and TITLE.

muschellij2 commented 5 years ago

Also, it may be a 5,000-result API limit: https://buildmedia.readthedocs.org/media/pdf/scopus/latest/scopus.pdf

According to https://dev.elsevier.com/documentation/ScopusSearchAPI.wadl, you may have to customize the call to use a cursor:

Under normal circumstances, when using the 'start' parameter (results offset), access to the total result set is limited to a predefined maximum number of results. By using the cursor in place of the 'start' the user can iterate to the very end of the result set, with the restriction that results can only be accessed by iterating forward sequentially (there will be no 'prev' or 'last' links available). 
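A minimal sketch of forward-only cursor paging, calling the Scopus Search API directly with httr rather than assuming rscopus exposes a cursor argument. Several details are assumptions to check against the documentation linked above: the endpoint https://api.elsevier.com/content/search/scopus, the API key sent in an X-ELS-APIKey header, an initial `cursor=*`, the next cursor arriving under `search-results` → `cursor` → `@next`, and an empty result set possibly coming back as a single entry with an "error" field. scopus_cursor_page() and fetch_with_cursor() are hypothetical names; the page fetcher is swappable so the loop can be tested offline.

```r
# Hypothetical: fetch one page from the Scopus Search API using a cursor.
# Assumes the JSON has search-results$entry and search-results$cursor$`@next`.
scopus_cursor_page <- function(query, cursor, count = 25, api_key) {
  resp <- httr::GET(
    "https://api.elsevier.com/content/search/scopus",
    httr::add_headers(`X-ELS-APIKey` = api_key, Accept = "application/json"),
    query = list(query = query, cursor = cursor, count = count)
  )
  httr::stop_for_status(resp)
  body <- httr::content(resp, as = "parsed", type = "application/json")
  sr <- body[["search-results"]]
  list(entries = sr[["entry"]], next_cursor = sr[["cursor"]][["@next"]])
}

# Iterate forward until a short page, a missing or unchanged cursor,
# or `max_results` entries have been collected.
fetch_with_cursor <- function(query, count = 25, max_results = Inf,
                              fetch_page = scopus_cursor_page, ...) {
  cursor <- "*"
  entries <- list()
  repeat {
    page <- fetch_page(query, cursor, count = count, ...)
    # Assumption: drop placeholder entries that carry an "error" field.
    got <- Filter(function(e) is.null(e[["error"]]), page$entries)
    entries <- c(entries, got)
    nxt <- page$next_cursor
    done <- length(got) < count || is.null(nxt) ||
      identical(nxt, cursor) || length(entries) >= max_results
    if (done) break
    cursor <- nxt
  }
  entries[seq_len(min(length(entries), max_results))]
}
```

Usage would look like `fetch_with_cursor("ISSN(0022-0663)", api_key = "<your key>")`. Stopping on a short page is a heuristic; the `max_results` cap guards against looping if the API behaves differently than assumed.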

muschellij2 commented 5 years ago

I think running this will show something about the limits:

res = rscopus::scopus_search(paste0("SRCID('25674') AND PUBYEAR = 2015"), view = "COMPLETE", start = 4975, count = 25, max_count = 25)

x = httr::content(res$get_statements, as = "text")
df = jsonlite::fromJSON(x, flatten = TRUE)
df$`search-results`$link

muschellij2 commented 5 years ago

Also, the documentation for the start parameter states:

Numeric value representing the results offset (i.e. starting position for the search results). The maximum for this value is a system-level default (varies with search cluster) minus the number of results requested. If not specified the offset will be set to zero (i.e. first search result) 

This maximum may be 5,000 for the Scopus Search API.
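Taking the quoted rule literally (largest allowed start = system default minus the number of results requested) and assuming that default is 5,000, the two calls from the minimal example check out: start = 4975 with count = 25 sits exactly at the limit, while start = 5000 is over it. The helpers below are hypothetical, and `limit = 5000` is the value suggested in this thread, not a documented constant.

```r
# Hypothetical helper: is a `start` offset allowed under the quoted rule?
# `limit` = 5000 is an assumption taken from this thread.
start_allowed <- function(start, count, limit = 5000) {
  start <= limit - count
}

# Highest usable start offset for a given page size.
max_start <- function(count, limit = 5000) {
  limit - count
}

start_allowed(4975, 25)  # TRUE: 4975 <= 5000 - 25
start_allowed(5000, 25)  # FALSE: the call that returned HTTP 400
max_start(25)            # 4975
```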

lucasxteixeira commented 5 years ago

You are entirely correct, @muschellij2: the problem is the 5,000 limit. The odd thing is that the limit only applies when I try to fetch results starting after the 5,000th. I split the query into multiple queries (i.e. one query per month), and that works just fine.
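A minimal sketch of the split-and-merge workaround, shown with one sub-query per year using the `PUBYEAR = ` syntax that already works in this thread. yearly_queries() and combine_entries() are hypothetical names, and deduplicating on `dc:identifier` assumes each entry carries its Scopus ID under that name; entries without one are kept as-is.

```r
# Hypothetical helper: one sub-query per publication year.
yearly_queries <- function(base, years) {
  sprintf("%s AND PUBYEAR = %d", base, as.integer(years))
}

# Hypothetical helper: merge entry lists from several sub-queries and drop
# duplicates by `dc:identifier` (assumed to hold the Scopus ID).
combine_entries <- function(entry_lists) {
  all_entries <- do.call(c, entry_lists)
  ids <- vapply(all_entries, function(e) {
    id <- e[["dc:identifier"]]
    if (is.null(id)) NA_character_ else as.character(id)
  }, character(1))
  keep <- is.na(ids) | !duplicated(ids)
  all_entries[keep]
}

# Example use (requires an API key, so not run here):
# queries <- yearly_queries("ISSN(0022-0663)", 2000:2019)
# results <- lapply(queries, function(q) rscopus::scopus_search(q, count = 25)$entries)
# entries <- combine_entries(results)
```

Each sub-query just needs to return fewer results than the offset limit; if a single year is still too large, it has to be split further.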

ureber commented 2 years ago

Sorry, I don't want to reopen this issue, but I keep running into the same problem: how can I narrow the search to months, as @lucasxteixeira suggested? I get an HTTP 400 if I add the month to the PUBYEAR parameter, e.g. 1995-10. Thanks!
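PUBYEAR appears to accept only a year, which would explain the HTTP 400 for 1995-10. One possible route, offered as an unverified assumption: Scopus advanced search appears to offer a PUBDATETXT field code that matches the publication date as text, e.g. PUBDATETXT("October 1995"). The hypothetical monthly_queries() helper below builds one such sub-query per month. Records whose date text has no month may not match any monthly query, so comparing the summed monthly totals with the yearly total is a sensible check.

```r
# Hypothetical helper: one sub-query per month of `year`.
# ASSUMPTION: Scopus accepts PUBDATETXT("<Month> <Year>"); verify against
# the Scopus search tips before relying on it.
monthly_queries <- function(base, year) {
  sprintf('%s AND PUBDATETXT("%s %d")', base, month.name, as.integer(year))
}
```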