ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

HTTP failure: 429 #132

Closed anusurendra closed 5 years ago

anusurendra commented 5 years ago

Hi,

I am getting the following error:

Error in { : task 1 failed - "HTTP failure: 429
{"error":"API rate limit exceeded","api-key":"132.246.3.117","count":"26","limit":"3"}

I have pasted both

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=6060535&retmote=rsr
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=rentrez

into my browser and have no issue.

Below is the code I am using:

 nrpl.data.m2 <- foreach (i=1:dim(nrpl.data.m)[1],.packages = c("rentrez"), .combine=cbind) %dopar% {

    res <- entrez_search(db = "protein", term = paste(as.character(nrpl.data.m$sacc[i]),"[ACCN]",sep=""))
    esums <- entrez_summary(db = "protein", id = res$ids)

    res1 <- entrez_search(db = "taxonomy", term = paste(esums$taxid,"[uid]",sep=""))
    esums1 <- entrez_summary(db = "taxonomy", id = res1$ids)

    nrpl.data.m[i,c(13:15)] <- c(esums$taxid,paste(esums1$genus,esums1$species,sep=" "),esums$organism)

  }

Thanks in advance and I appreciate any help.

dwinter commented 5 years ago

Hi @anusurendra ,

The error message here is telling you that you are making too many requests to the NCBI. Without an API key you can't make more than three requests per second; with a key, the limit is ten per second. rentrez handles this rate-limiting for you, but running its functions in parallel, as you have here, circumvents that protection.

Check out the vignette section on "API Keys" for details about how to register and use a key. Closing the issue for now, but feel free to ask more questions on this.
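
For reference, a minimal sketch of registering a key for the session (the key string below is a placeholder, not a real key):

```r
library(rentrez)

# Register your NCBI API key for this session; rentrez stores it in the
# ENTREZ_KEY environment variable and sends it with every request.
set_entrez_key("0123456789abcdef0123456789abcdef0123")

# Subsequent calls now run under the 10-requests-per-second limit.
res <- entrez_search(db = "pubmed", term = "rentrez")
```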

anusurendra commented 5 years ago

@dwinter , Thank you for the prompt reply. I have added the API key and am still getting the error.

Below is my entire code:

library(rentrez)
library(doParallel)
library(foreach)

file_dir <- "/home"
nrlp.files <- list.files(file_dir, full.names=F, include.dirs=F)

cl <- makePSOCKcluster(8)
registerDoParallel(cl)

for( nrlp.file in nrlp.files[1]){

  nrpl.data.m <- as.data.frame(read.delim(paste(file_dir,nrlp.file,sep="/"),
                                     header=F ,as.is=T,quote="",sep="\t"),stringsAsFactors=F);
  colnames(nrpl.data.m) <- c("qacc","sacc","pident","length","mismatch","gapopen","qstart","qend","sstart","send","evalue","bitscore")
  #nrpl.data.m <- cbind(nrpl.data.m,NA,NA,NA)
  #colnames(nrpl.data.m)[13:15] <- c("taxaid","taxa","full_organism")

  nrpl.data.m2 <- foreach (i=1:dim(nrpl.data.m)[1],.packages = c("rentrez"), .combine=cbind) %dopar% {

    res <- entrez_search(db = "protein", term = paste(as.character(nrpl.data.m$sacc[i]),"[ACCN]",sep=""))
    esums <- entrez_summary(db = "protein", id = res$ids)

    res1 <- entrez_search(db = "taxonomy", term = paste(esums$taxid,"[uid]",sep=""))
    esums1 <- entrez_summary(db = "taxonomy", id = res1$ids)

    nrpl.data.m[i,c(13:15)] <- c(esums$taxid,paste(esums1$genus,esums1$species,sep=" "),esums$organism)

  }
  colnames(nrpl.data.m2)[13:15] <- c("taxaid","taxa","full_organism")
  write.table(nrpl.data.m2,file=paste(file_dir,"taxainfo",paste(nrlp.file,"taxaInfo",sep="_"),sep="/"),row.names=F,col.names=T,quote=F,sep="\t");
}

stopCluster(cl)
dwinter commented 5 years ago

Adding the API key only raises the limit to 10 requests per second; because this code still runs 8 requests in parallel, it will still hit this limit.

If you have a good reason to need a lot of data quickly, you can write an email to the NCBI helpdesk (if it is staffed during the US government shutdown...) and ask that your key be given a higher rate limit.
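
One way to work within the limit, sketched below: register the key on every worker (PSOCK workers are fresh R sessions, so a key set on the master is not visible to them) and throttle each iteration so all eight workers together stay under the 10-per-second cap. The sleep length here is a rough guess, not a tuned value, and `"YOUR_API_KEY"` is a placeholder:

```r
library(rentrez)
library(doParallel)
library(foreach)

cl <- makePSOCKcluster(8)
registerDoParallel(cl)

# Register the API key on each worker; clusterCall (from the parallel
# package, attached by doParallel) runs the function once per node.
clusterCall(cl, function(key) rentrez::set_entrez_key(key), "YOUR_API_KEY")

results <- foreach(i = 1:nrow(nrpl.data.m), .packages = "rentrez") %dopar% {
  # Each loop body makes several Entrez requests; with 8 workers and a
  # 10 req/s cap, sleeping ~1 s per iteration keeps the combined request
  # rate safely below the limit.
  Sys.sleep(1)
  entrez_search(db = "protein",
                term = paste0(nrpl.data.m$sacc[i], "[ACCN]"))
}

stopCluster(cl)
```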

dwinter commented 5 years ago

(BTW, there are no great secrets being protected by your API key, but it is still something that is meant to be unique to your user account at the NCBI. You might want to remove it from public posts like this)

anusurendra commented 5 years ago

@dwinter,

Thanks for your help :).