ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

Won't work for a vector greater than 300 - gnr_resolve function #169

Closed Squiercg closed 11 years ago

Squiercg commented 11 years ago

Hello, first of all, congratulations on developing such a good package.

I was trying to use the function gnr_resolve on a vector of length 735 and got the following message:

lista.resolvida.sapos <- gnr_resolve(names = lista[], resolve_once = TRUE, data_source_ids = "12")
Erro em file(con, "r") (from gnr_resolve.R#57) : não é possível abrir a conexão

In English, the error says "cannot open the connection". I don't know if this is an issue with my network; I am using the university network.

But somehow the function works perfectly with a vector of fewer than 300 names. I managed to use the function by breaking the species list into parts of 300 names or fewer.
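
The workaround described above, splitting the list into parts of at most 300 names and resolving each part separately, can be sketched roughly as follows. This is only an illustration: `chunk_names`, `resolve_in_chunks` and `fake_resolver` are hypothetical helpers that are not part of taxize, and the stand-in resolver is there only so the sketch runs without a network connection.

```r
# Split a character vector into consecutive chunks of at most `size` elements.
chunk_names <- function(x, size = 300) {
  split(x, ceiling(seq_along(x) / size))
}

# Hypothetical helper: call `resolver` on each chunk and row-bind the results.
resolve_in_chunks <- function(x, resolver, size = 300) {
  parts <- lapply(chunk_names(x, size), resolver)
  do.call(rbind, unname(parts))
}

# Stand-in resolver so the sketch runs offline (an assumption, not taxize code).
# In practice this could be something like:
#   function(n) gnr_resolve(names = n, resolve_once = TRUE, data_source_ids = "12")
fake_resolver <- function(n) {
  data.frame(submitted_name = n, matched_name = n, stringsAsFactors = FALSE)
}

# Made-up names, just to get a vector of length 735.
species <- paste("Genus", sprintf("species%03d", 1:735))

chunks <- chunk_names(species)
lengths(chunks)   # sizes of the individual chunks

res <- resolve_in_chunks(species, fake_resolver)
nrow(res)
```

Because the chunks are processed in order, the combined result keeps the order of the original vector.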

I made a copy of my list of species (a vector) on Pastebin; here is the link: http://pastebin.com/raw.php?i=83BayWXK

Here is an example of the message I receive, in case anyone can check whether this is a local problem or something to do with the function:

teste <- source("http://pastebin.com/raw.php?i=83BayWXK")
gnr_resolve(names = teste$value, resolve_once = TRUE, data_source_ids = "12")

Erro em file(con, "r") (from gnr_resolve.R#57) : não é possível abrir a conexão

Another thought is that I have many repeated names; I don't know if this is an issue too.

I hope I'm not taking too much of your time. If there is anything I can do to help, just say :)

sckott commented 11 years ago

@Squiercg Thanks for the kind words!

My guess is that the URL string created with that many names is too long. There are limits on the length of a URL string. Here is some info on SO: http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers

We should be able to get around this by using a post request instead of a get request if allowed. I'll get back to you soon.
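
To see why the number of names matters for a GET request, here is a rough sketch that builds a query URL from a vector of names and measures its length. The base URL, the `names` query parameter and the `|` separator are assumptions made for illustration, not the actual request that gnr_resolve builds, and the species names are made up.

```r
# Build a hypothetical GET URL with all names packed into the query string.
build_get_url <- function(names, base = "http://example.org/resolve") {
  # Percent-encode spaces, separators and other reserved characters.
  query <- utils::URLencode(paste(names, collapse = "|"), reserved = TRUE)
  paste0(base, "?names=", query)
}

# Made-up names, just to get vectors of different lengths.
species <- paste("Genus", sprintf("species%03d", 1:735))

len_300 <- nchar(build_get_url(species[1:300]))
len_735 <- nchar(build_get_url(species))

# The URL grows roughly linearly with the number of names.
c(names_300 = len_300, names_735 = len_735)
```

With a POST request the names travel in the request body instead, so the URL itself stays short regardless of how many names are submitted.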

sckott commented 11 years ago

@Squiercg I made some progress, but the API call is giving me trouble with 500 names or more. I will try to get this fixed by tomorrow morning.

Squiercg commented 11 years ago

If there is something I can help with, just say :)



sckott commented 11 years ago

Hi @Squiercg, try it again after reinstalling the GitHub version, and let me know if it works. For calls with more than 499 names you need to pass http = "post" in the function call. This works now:

library(taxize); library(httr)
teste <- source("http://pastebin.com/raw.php?i=83BayWXK")
out <- gnr_resolve(names = teste$value, resolve_once = TRUE, data_source_ids = 12, http = "post")
nrow(out)
[1] 745
head(out)
          submitted_name           matched_name data_source_title score
1    Allobates femoralis    Allobates femoralis               EOL 0.988
2 Allobates marchesianus Allobates marchesianus               EOL 0.988
3 Allobates marchesianus Allobates marchesianus               EOL 0.988
4       Ameerega parvula       Ameerega parvula               EOL 0.988
5         Ameerega picta         Ameerega picta               EOL 0.988
6         Ameerega picta         Ameerega picta               EOL 0.988

sckott commented 11 years ago

@Squiercg Though you can submit duplicate names to the function, it would be faster to get the unique set of names first, then match the output back to your original name list.
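
A minimal sketch of that suggestion, assuming the result has a `submitted_name` column as in the output shown earlier in this thread. `resolve_unique` and `fake_resolver` are hypothetical helpers, not part of taxize; the stand-in resolver only exists so the example runs offline.

```r
# Resolve only the unique names, then expand the result back to the
# original vector (duplicates and original order included).
resolve_unique <- function(x, resolver) {
  res <- resolver(unique(x))
  # For each original name, find its row in the de-duplicated result.
  idx <- match(x, res$submitted_name)
  out <- res[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Stand-in resolver (an assumption, not taxize code). In practice this could be:
#   function(n) gnr_resolve(names = n, resolve_once = TRUE,
#                           data_source_ids = 12, http = "post")
fake_resolver <- function(n) {
  data.frame(submitted_name = n, matched_name = n, stringsAsFactors = FALSE)
}

species <- c("Allobates femoralis", "Allobates marchesianus",
             "Allobates marchesianus", "Ameerega parvula",
             "Ameerega picta", "Ameerega picta")

out <- resolve_unique(species, fake_resolver)
nrow(out)
```

Names that the resolver does not return at all would show up as rows of NA after the `match()` step, so it may be worth checking for them before using the result.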