Closed: Gerwi closed this issue 5 years ago.
Thanks - I can't really work around that quota for now. Also, make sure your API key is registered for research use if that's your purpose, as I believe Scopus mandates this.
Can you show me an example of the query or what you want to download? I can't debug or help without specifics.
OK, I hope they will provide this in the future, but I am not that optimistic.
In my case I was trying to retrieve all publications of a small country via the scopus_search API.
For now I have changed my strategy and plan to use PubMed IDs to circumvent this issue.
Here's some basic code that I think gets you most of the way:
library(rscopus)
au_ids = c(23480260200, 8708052900, 54896131300, 55570070100,
           55479219200, 7409391345, 55500593700, 39362440900)
# get all the data for the authors (including all co-authors)
res = lapply(au_ids, author_data)
names(res) = au_ids
# get co-authors
all_authors = lapply(res, function(x) {
  x$full_data$author
})
# get unique IDs for those authors
unique_authors = lapply(all_authors, function(x) {
  unique(x$authid)
})
# collapse all authors together
combined_authors = unlist(unique_authors)
combined_authors = unique(combined_authors)
# don't need the original authors in there
combined_authors = setdiff(combined_authors, au_ids)
# just doing first 5 due to API limits (but you can run these in chunks)
run_authors = combined_authors[1:5]
all_author_res = lapply(
  run_authors,
  author_data,
  count = 200, view = "STANDARD")
names(all_author_res) = run_authors
all_author_res[[1]]$df
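For the chunking mentioned in the comment above, a rough sketch (untested; the chunk size of 5 is arbitrary and you would still need to respect the weekly quota between chunks):
# sketch: split the remaining co-author IDs into chunks of 5 and fetch each chunk
chunk_size = 5
chunks = split(combined_authors,
               ceiling(seq_along(combined_authors) / chunk_size))
chunk_res = list()
for (i in seq_along(chunks)) {
  chunk_res[[i]] = lapply(chunks[[i]], author_data,
                          count = 200, view = "STANDARD")
  names(chunk_res[[i]]) = chunks[[i]]
}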
Thanks a lot for this suggestion.
For now I am trying the following approach, starting with a csv file containing PubMed IDs:
# Cut pubmed_list into small parts (large search requests are not handled by the API,
# and smaller ones also make better use of the weekly quotas). A search string is then
# built in the format "PMID(123456 OR 123457)", which scopus_search can handle. The
# resulting output objects are stored in a list.
chunk <- 5
n <- nrow(pubmed_list)
r <- rep(1:ceiling(n/chunk),each=chunk)[1:n]
d <- split(pubmed_list,r)
res_list <- list()
for (number in 1:4) {
  # take the number-th chunk of PubMed IDs
  string = data.frame(d[[number]][[1]])
  string$OR = " OR "
  names(string) = c("pmid", "OR")
  # collapse the chunk into one "123456 OR 123457 OR ..." string
  string = paste0(string$pmid, string$OR, collapse = "")
  string = substr(string, 1, nchar(string) - 4)  # drop the trailing " OR "
  string = paste0("PMID(", string, ")")
  res = scopus_search(query = string, view = "COMPLETE", max_count = chunk)
  res_list[[number]] <- res
}
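The stored responses can then be turned into data frames with rscopus's gen_entries_to_df() helper; a minimal sketch:
# e.g. convert the first stored response into data frames
entries = gen_entries_to_df(res_list[[1]]$entries)
head(entries$df)           # publication-level data
head(entries$author)       # author-level data
head(entries$affiliation)  # affiliation-level data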
OT: not related to this package, but nevertheless worth mentioning: some articles are included twice in Scopus/Elsevier. For example: PMID(30428293)
Please follow up on the duplicates with Elsevier/Scopus.
You seem to have changed your goal; I gave the solution I feel you requested. I have provided the tools, but I don't have any other information on these points. You can open another issue for the quota limits, but otherwise this is a scripting question rather than a development question, so I am closing.
OK - where are you seeing the 80k limit?
I think PubMed IDs may cause some problems, as I've seen them not return results. This may be due to the permissions on my API key ("API key in this example was setup with authorized CORS domains."), which I've tried on the interactive APIs: https://dev.elsevier.com/interactive.html
For example, PMID(30391859) exists in PubMed (https://www.ncbi.nlm.nih.gov/pubmed/30391859), but searching PMID(30391859) in Scopus returns nothing.
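A minimal check (a sketch; relying on the total_results field that scopus_search returns):
library(rscopus)
# check whether a single PubMed ID returns anything from Scopus
res = scopus_search(query = "PMID(30391859)", max_count = 1)
res$total_results   # 0 would mean the PMID is not linked in Scopus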
Just as clarification, since I am not encountering issues anymore (so this can remain closed): my current strategy is to search PubMed for a particular disease, for example "Heart Defects, Congenital"[Mesh], download the list of PubMed IDs, save them in a csv, and cut them into chunks. A for loop then transforms the IDs into a search string, and the output data frames are stored in lists, which are rbind-ed into data frames (a sketch of that last step follows the code below). The for loop breaks automatically when the quota is reached.
pubmed_list <- read.csv("pubmed_list_diabetes.csv")
chunk <- 1
n <- nrow(pubmed_list)
r <- rep(1:ceiling(n/chunk), each = chunk)[1:n]
d <- split(pubmed_list, r)
publications_list = list()
affiliations_list = list()
authors_list = list()
remaining = 10
for (number in 1:20) {
  # stop once the weekly quota is (nearly) exhausted
  if (remaining < chunk) {
    break
  }
  # take the number-th chunk of PubMed IDs and build the PMID(...) query
  string = data.frame(d[[number]][[1]])
  string$OR = " OR "
  names(string) = c("pmid", "OR")
  string = paste0(string$pmid, string$OR, collapse = "")
  string = substr(string, 1, nchar(string) - 4)  # drop the trailing " OR "
  string = paste0("PMID(", string, ")")
  res = scopus_search(query = string, view = "COMPLETE", max_count = chunk)
  # split the response into publication, affiliation and author data frames
  entries = gen_entries_to_df(res$entries)
  entries$df$entry_number2 = paste0(number, ".", entries$df$entry_number)
  publications_list[[number]] = entries$df
  entries$affiliation$entry_number2 = paste0(number, ".", entries$affiliation$entry_number)
  affiliations_list[[number]] = entries$affiliation
  entries$author$entry_number2 = paste0(number, ".", entries$author$entry_number)
  authors_list[[number]] = entries$author
  # quota left this week, taken from the response headers
  remaining = as.numeric(res$get_statements$headers$`x-ratelimit-remaining`)
}
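To get from those lists to the combined data frames mentioned above, something like this should work once the loop finishes (a sketch; dplyr::bind_rows is used instead of base rbind because chunks from the COMPLETE view can have differing columns):
library(dplyr)
# combine the per-chunk results into single data frames
publications = bind_rows(publications_list)
affiliations = bind_rows(affiliations_list)
authors      = bind_rows(authors_list)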
Such as PMID(30391859): https://www.ncbi.nlm.nih.gov/pubmed/30391859, but PMID(30391859) in Scopus search gets nothing.
The searches returning no articles can be due to two reasons:
Either the article is not indexed in Scopus at all, or it is indexed but its PubMed ID is not linked to the Scopus record. The first reason is not solvable, but the second one can be corrected quite easily by downloading from PubMed a csv linking titles to PubMed IDs, and searching by title for the articles that return no result when searched by PubMed ID.
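A rough sketch of that title fallback, assuming a data frame pmid_title with columns pmid and title, and a vector missing_pmids of IDs whose PMID() search came back empty (all three names are placeholders for whatever the PubMed csv export gives you):
# retry by title for the PubMed IDs that returned nothing from Scopus
missing = pmid_title[pmid_title$pmid %in% missing_pmids, ]
title_res = lapply(missing$title, function(ttl) {
  # titles containing quotes or braces may need extra escaping
  scopus_search(query = paste0('TITLE("', ttl, '")'),
                view = "COMPLETE", max_count = 1)
})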
Thanks for developing this package; it has been functioning perfectly so far. However, I have the following issue: my current search via the Scopus search function indicates that there are about 80k hits. Since the quota for this API is 20,000 publications per week, I can't download them all at once. I was wondering whether there is a way to continue the download next week (when Elsevier resets the quotas) from publication 20,001 to 40,000, and, after waiting another week, to download 40,001-60,000.
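For what it's worth, scopus_search takes a start offset (the index of the first record), so a heavily hedged sketch of resuming after a quota reset could look like the following; whether Scopus allows start offsets that deep for this API, and whether the result ordering stays stable from week to week, are assumptions you would need to verify (my_query is a placeholder):
library(rscopus)
my_query = "AFFILCOUNTRY(Luxembourg)"  # placeholder: your actual Scopus query

# week 1: records 1-20,000
week1 = scopus_search(query = my_query, view = "COMPLETE",
                      start = 0, max_count = 20000)
# week 2, after the quota resets: records 20,001-40,000
week2 = scopus_search(query = my_query, view = "COMPLETE",
                      start = 20000, max_count = 20000)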