ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

Error in vapply(elements, encode, character(1)) #109

Closed ghost closed 7 years ago

ghost commented 7 years ago

Hi,

I wrote a function as in the tutorial and then passed it a list of genera, so that I can get the IDs for each genus in a specific db.

upload <- entrez_post(db="nucleotide", id=12345)
search_genus <- function(genera){
  query <- paste(genera, " [ORGN]")
  entrez_search(db="pubmed", term=query, retmax=0, web_history=upload)$ids
}
sapply(genera, search_genus, USE.NAMES=FALSE)

Then I got this error:

Error in vapply(elements, encode, character(1)) : values must be length 1, but FUN(X[[6]]) result is length 2

Then I tried something else:

search_genus <- function(genera){
  query <- paste(genera, " [ORGN]")
  entrez_search(db="pubmed", term=query, retmax=0, use_history=TRUE)$ids
}
sapply(genera, search_genus, USE.NAMES=FALSE)
[[1]]
list()

Any hints are appreciated!

Miao

dwinter commented 7 years ago

Hi Miao,

Not sure you need the upload here? use_history is an argument for entrez_search that creates a web history on the NCBI server and returns a reference to that history. Some functions (like entrez_fetch) have a web_history argument so you can pass that history in place of a set of unique IDs.
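A minimal sketch of that workflow, assuming the rentrez package is installed and the NCBI is reachable. The helper names (build_orgn_query, search_genus_history, fetch_first_records) are hypothetical and only for illustration. Note that with retmax=0 the result object holds no IDs, so $ids comes back empty; the count and the web history are what carry the information.

```r
# Sketch only: running the searches needs the rentrez package and network
# access to the NCBI. The helper names below are hypothetical.

# Build an organism query such as "Catalpa[ORGN]".
build_orgn_query <- function(genus) {
  paste0(genus, "[ORGN]")
}

# Search with use_history = TRUE so the NCBI stores the result set and the
# returned object carries a $web_history reference.
search_genus_history <- function(genus, db = "nucleotide") {
  rentrez::entrez_search(db = db, term = build_orgn_query(genus),
                         retmax = 0, use_history = TRUE)
}

# Pass the stored history on instead of a vector of IDs.
fetch_first_records <- function(search_result, db = "nucleotide", n = 5) {
  rentrez::entrez_fetch(db = db, web_history = search_result$web_history,
                        rettype = "fasta", retmax = n)
}

# Example (not run):
# res <- search_genus_history("Catalpa")
# res$count
# cat(fetch_first_records(res))
```

The search and the fetch use the same database here, since a web history refers to records in the database that was searched.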

(This error arises because the search function will try to pass arbitrary arguments on to the NCBI, but here the argument is a list and the machinery for making requests doesn't know how to deal with that.)
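The failure can be reproduced in base R without touching the NCBI. This is an illustration under assumptions: encode below is a hypothetical stand-in for whatever encoder the request machinery uses, and fake_history only imitates a two-element web history object.

```r
# `encode` is a hypothetical stand-in for the encoder applied to each
# request argument; the real one is internal to the request machinery.
encode <- function(x) as.character(x)

# A two-element list imitating a web history object (field names assumed).
fake_history <- list(WebEnv = "NCID_example", QueryKey = "1")

args <- list(db = "pubmed", term = "Catalpa[ORGN]", retmax = 0,
             web_history = fake_history)

# vapply() insists that every result has length 1, so the list-valued
# argument makes the whole call fail.
msg <- tryCatch(
  {
    vapply(args, encode, character(1))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Scalar arguments such as db and retmax encode to a single string each; only the list-valued element yields two values, which is what vapply rejects.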

ghost commented 7 years ago

@dwinter Thanks for your quick response! I only know the genus names, so I think I first need to get the IDs and go from there. I have tried just one genus, but got no ID:

entrez_search(db="sra", term="Catalpa [ORGN]",retmax=0)
Entrez search result with 1 hits (object contains 0 IDs and no web_history object)
 Search term (as translated): "Catalpa"[Organism]

I have run into another issue:

If I use db="genome" and search for "Magnolia", I get a count of 0, but on the website I get a lot more. So is my query wrong, or how can I get a summary report of how many genome sequences are out there for a specific genus?

search_genus_G <- function(genera){
  query <- paste(genera, " [ORGN]", "OR ALL")
  entrez_search(db="genome", term=query, retmax=0)$count
}
Gg <- sapply(genera, search_genus_G, USE.NAMES=FALSE)
Records <- cbind(genera, Gg)
Records
      genera         Gg
 [1,] "Acer"         "0"
 [2,] "Aesculus"     "0"
 [3,] "Carpinus"     "0"
 [4,] "Cartrema"     "0"
 [5,] "Carya"        "0"
......
[26,] "Liriodendron" "0"
[27,] "Lyonia"       "0"
[28,] "Magnolia"     "0"
[29,] "Nyssa"        "0"

Please help me.

Thank you!

dwinter commented 7 years ago

Hi Miao,

I can't give too much general advice on what terms are available in which NCBI database. Using the web interface is often the best place to start (btw, I don't see any records for Magnolia in the genome database, but get a note about mitochondrial and chloroplast sequences).

You may have better luck finding the taxonomy ID for each genus that you are interested in, then using entrez_link to find records in linked databases.
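A rough sketch of that suggestion, assuming the rentrez package is installed and the NCBI is reachable. The helper names (first_id_or_na, genus_taxid, genus_links) are hypothetical, and taking the first hit from the taxonomy search is a simplification that should be checked by hand for ambiguous names.

```r
# Sketch only: the two lookup helpers need the rentrez package and network
# access to the NCBI. All helper names are hypothetical.

# Return the first ID of a search, or NA when nothing was found.
first_id_or_na <- function(ids) {
  if (length(ids) == 0) NA_character_ else as.character(ids[[1]])
}

# Look up a taxonomy ID for a genus name (first hit only).
genus_taxid <- function(genus) {
  res <- rentrez::entrez_search(db = "taxonomy", term = genus)
  first_id_or_na(res$ids)
}

# List records in other databases that are linked to a taxonomy ID.
genus_links <- function(taxid, db = "all") {
  rentrez::entrez_link(dbfrom = "taxonomy", id = taxid, db = db)$links
}

# Example (not run):
# taxid <- genus_taxid("Magnolia")
# links <- genus_links(taxid)
# names(links)   # which linked databases have records for this taxon
```

Inspecting names(links) first shows which link sets exist before pulling IDs from any one of them.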

ghost commented 7 years ago

I mean when you search "Magnolia" and "genome" on the webpage (https://www.ncbi.nlm.nih.gov/nuccore), there are already 20 records for chloroplast complete genomic data on the first page. Looks like a bug.

dwinter commented 7 years ago

OK, well if you think the chloroplast genomes should be coming up in a search against the genome database then you might want to contact the NCBI. There is nothing rentrez can do to change what the NCBI returns, so I will close this issue now.

ghost commented 7 years ago

Why not? Are chloroplast genomes not part of the "genome" concept?

It seems that my knowledge doesn't make any sense here!

On Jun 30, 2017, at 6:49 PM, David Winter notifications@github.com wrote:

chloroplast genomes

dwinter commented 7 years ago

Hi Miao,

rentrez is a package for communicating with the NCBI's databases using the API they make available for that purpose. We will happily try to fix any bugs relating to the way that data is sent, received and presented with R. But we can't change how the databases themselves behave.

If you think this is a bug you should take it up with the NCBI.

ghost commented 7 years ago

Thanks for explaining this further for me!

Miao
