jenmark commented 9 years ago

Dear rOpenSci,

I'm using R and the dismo package to download GBIF occurrence records for multiple species at once, with the species names read from a CSV file. I have some modified code for this, but I consistently get the error "invalid request".

This code worked six months ago, but now it returns this error. I'm using the most up-to-date dismo version, so I don't know whether something has changed in dismo or GBIF, or whether I'm making an error. Could it be related to the GBIF API that dismo uses?

My apologies if I should have posted this elsewhere - I'm new to GitHub and rOpenSci.

Please can anyone help? Many thanks in advance, Jennifer

My code:

install.packages(c("mgcv", "kernlab", "maps", "mapdata", "partykit", "rJava", "RODBC"))

library(dismo)
library(rgdal)
library(rpart)
library(mgcv)
library(kernlab)
library(maps)
library(mapdata)
library(partykit)
library(rJava)
library(raster)
library(RODBC)

Read a CSV file of accepted genus and species names (two columns) and call it 'd':

d <- read.csv("Accepted_2.csv")

Convert the 'New.Genus' and 'New.Species' columns to character (older versions of read.csv() read text columns as factors by default), so the names can be passed to GBIF:

d$New.Genus<-as.character(d$New.Genus)
d$New.Species<-as.character(d$New.Species)
attach(d)

Try extracting the first species in the CSV file from the GBIF online database:

d1 <- try(gbif(d$New.Genus[1], d$New.Species[1], geo = TRUE, sp = FALSE))

This returns the error "invalid request":

Loading required package: XML
http://data.gbif.org/ws/rest/occurrence/count?scientificname=Alexa+imperatricis&coordinatestatus=true
Error in gbif(d$New.Genus[1], d$New.Species[1], geo = T, sp = F) : invalid request

Rest of code:

Repeat this process for all species in the CSV file, and write the extracted x/y coordinate data to a new CSV file named "Accepted_2_output.csv":

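for (i in 2:nrow(d)) {  # d1 already holds species 1; nrow(), because length() of a data frame counts its columns
  d2 <- gbif(d$New.Genus[i], d$New.Species[i], geo = TRUE, sp = FALSE)
  d1 <- rbind(d1, d2)  # accumulate; assigning rbind(d1, d2) to d3 kept only the latest species
}
write.csv(d1, "Accepted_2_output.csv")  # write once, after the loop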

sckott commented 9 years ago

hi @jenmark, dismo uses an old API that GBIF no longer supports. If you go to data.gbif.org you see:

data.gbif.org has now been decommissioned. Please use www.gbif.org for all data access

The spocc package uses rgbif to get at GBIF data, and rgbif uses the new GBIF APIs, so you can get GBIF data from either spocc or rgbif. In spocc, see the function occ(); in rgbif, you most likely want occ_search(). You may also want to use rgbif to look up each name first with name_backbone(), which gives an ID for each name, and then pass that ID to occ_search(), like:

key <- name_backbone(name='Helianthus annuus', kingdom='plants')$speciesKey
occ_search(taxonKey=key, limit=20)
#> returns data...

rgbif tutorial http://ropensci.org/tutorials/rgbif_tutorial.html
spocc tutorial http://ropensci.org/tutorials/spocc_tutorial.html
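A minimal sketch of replacing the dismo loop with rgbif, assuming the name_backbone() then occ_search() approach described above. The function `fetch_occurrences()` and its `lookup`/`search` arguments are hypothetical, introduced so the lookups can be swapped out (for example, for testing); their defaults wrap `rgbif::name_backbone()` and `rgbif::occ_search()` and need the rgbif package plus network access. The limit of 20 records per species is arbitrary.

```r
# Sketch, not rgbif's own API: fetch_occurrences() and its 'lookup'/'search'
# hooks are hypothetical. The defaults wrap rgbif::name_backbone() and
# rgbif::occ_search() and need the rgbif package plus network access.
fetch_occurrences <- function(
    species,
    lookup = function(name) rgbif::name_backbone(name = name)$speciesKey,
    search = function(key) {
      rgbif::occ_search(taxonKey = key, limit = 20, hasCoordinate = TRUE)$data
    }) {
  results <- list()
  for (sp in species) {
    key <- lookup(sp)
    # an unmatched name can come back with no species key (NULL or NA)
    if (length(key) == 0 || is.na(key[1])) {
      message("No GBIF species key for '", sp, "'; skipping")
      next
    }
    recs <- search(key[1])
    if (is.null(recs) || nrow(recs) == 0) next  # no records with coordinates
    results[[sp]] <- as.data.frame(recs)
  }
  results  # named list: one data frame of records per matched species
}
```

With the CSV from the original question, something like `fetch_occurrences(paste(d$New.Genus, d$New.Species))` would return one data frame per matched species. Returning a list rather than one combined table avoids rbind() failing when different species come back with different columns.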

does that help?

jenmark commented 9 years ago

Hi @sckott ,

Thank you very much, that explains things! I'll use rgbif instead - thanks for the tutorials link and examples.

Much appreciated, Jennifer

sckott commented 9 years ago

hi again, saw a few more questions in your email:

About the invalid integer range error: there was a similar question recently, and here was that discussion: https://github.com/ropensci/rgbif/issues/143. Basically, when you get the result from name_backbone() you may not have all taxonomic keys in that list; you may have some NULL values or other things, so check that list to make sure. You can also simply search by name in occ_search(); see the first example in https://github.com/ropensci/rgbif#search-for-occurrence-data
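When any name_backbone() lookup returns NULL, sapply() cannot simplify its result and returns a list; unlist() then silently drops the NULL entries, so the remaining keys no longer line up with the names they came from. A minimal sketch of a hypothetical helper, `clean_keys()`, that drops and reports unmatched names instead, assuming the list is named by species (for example, built with sapply(..., USE.NAMES = TRUE)).

```r
# Hypothetical helper: keep only usable species keys from a (named) list such
# as sapply(names, function(x) name_backbone(name = x)$speciesKey).
clean_keys <- function(keys) {
  # a usable key is a single, non-missing value
  ok <- vapply(keys, function(k) length(k) == 1 && !is.na(k), logical(1))
  if (any(!ok)) {
    labels <- names(keys)
    if (is.null(labels)) labels <- as.character(seq_along(keys))
    message("No GBIF species key for: ", paste(labels[!ok], collapse = ", "))
  }
  vapply(keys[ok], as.numeric, numeric(1))  # named numeric vector of keys
}
```

The cleaned vector can then be passed to occ_search(taxonKey = ...), and its names show which species each key belongs to.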

I don't know of any way to make the selection random on the GBIF side. I imagine you want a random selection within each species? If I wanted 50 random records, I might get, e.g., 1000 occurrence record keys for each species, select 50 at random from those 1000, and then retrieve those records, e.g.:

out <- occ_search(scientificName = 'Ursus americanus', fields="key", limit=1000)
keys <- sample(out$data$key, 50) # get random set of 50
occ_get(keys, fields = "all") # pass those keys to occ_get, set fields to all to get all fields

This is relatively quick.
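The sample() step above has two pitfalls: it fails when fewer keys come back than you ask for, and if exactly one key comes back, sample() treats that single number n as the range 1:n and returns made-up keys. A sketch of a hypothetical helper, `sample_keys()`, that samples positions with sample.int() to avoid both; the optional `seed` argument is an addition for reproducibility and resets R's global random number generator when used.

```r
# Hypothetical helper: pick up to n keys at random from 'keys'.
sample_keys <- function(keys, n, seed = NULL) {
  if (length(keys) == 0) return(keys)
  if (!is.null(seed)) set.seed(seed)  # optional reproducibility
  n <- min(n, length(keys))           # never ask for more keys than exist
  keys[sample.int(length(keys), n)]   # sample positions, not values
}
```

In the code above, `keys <- sample_keys(out$data$key, 50)` would replace the sample() line.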

jenmark commented 9 years ago

Hi Scott,

Thank you, that's a big help - retrieving 1000 and then randomly selecting 50 of them (for example) is a good solution! I'll try it.

A colleague has just helped me get the records without the "invalid request" error, but thanks for explaining further - it's useful to know what was going wrong.

I hope you don't mind one final quick question. I'm trying to save the output results in a CSV file, and found an example here: http://stackoverflow.com/questions/27089159/how-do-i-save-individual-species-data-downloaded-via-rgbif

But that example is for a list, not the full output shown below.

Please could you advise me how to save the output? I need the columns: name; key; decimal.latitude; decimal.longitude; locality; countryCode; country; institutionCode.

Thanks in advance, any advice is really appreciated!


What we have done is:

minitest <- read.csv("C:/Users/Jenny/Documents/R/minitest_rgbif.csv", stringsAsFactors = FALSE)
head(minitest)
minitest2 <- as.vector(minitest$Accepted.Binomial)
keys2 <- sapply(minitest2, function(x) name_backbone(name = x)$speciesKey, USE.NAMES = FALSE)
search_res <- occ_search(taxonKey = keys2, limit = 2, hasCoordinate = TRUE)

This doesn't work, so I changed the keys from a list to a vector:

keys3 <- unlist(keys2)
search_res <- occ_search(taxonKey = keys3, limit = 2, hasCoordinate = TRUE)
search_res

Then I tried saving the search_res output as a CSV file using the Stack Overflow example:

filenames <- paste(sapply(sapply(search_res, FUN = "[[", "name", simplify = FALSE), unique), ".txt", sep = "")
mapply(search_res, filenames, FUN = function(x, y) write.table(x, file = y, row.names = FALSE))

Error in data.frame(meta = list(offset = 0L, limit = 2L, endOfRecords = FALSE, : arguments imply differing number of rows: 1, 7, 2, 0

Is the error because the data output isn't a list?

str(search_res)

Yes: the output combines lists and data frames.

From: Scott Chamberlain notifications@github.com Sent: Tuesday, February 24, 2015 4:15 PM To: ropensci/spocc Cc: Jennifer Mark Subject: Re: [spocc] dismo error "invalid request" when retrieving gbif occurrence data - please help! (#113)


BU is a Disability Two Ticks Employer and has signed up to the Mindful Employer charter. Information about the accessibility of University buildings can be found on the BU DisabledGo webpages This email is intended only for the person to whom it is addressed and may contain confidential information. If you have received this email in error, please notify the sender and delete this email, which must not be copied, distributed or disclosed to any other person. Any views or opinions presented are solely those of the author and do not necessarily represent those of Bournemouth University or its subsidiary companies. Nor can any contract be formed on behalf of the University or its subsidiary companies via email.

sckott commented 9 years ago

hi, for the fields to save, you can request them in the occ_search() call, like:

occ_search(scientificName = "Ursus americanus", limit = 5, fields=c("name", "key", "decimalLatitude", "decimalLongitude", "locality", "countryCode", "country", "institutionCode"))
#> Records found [6857] 
#> Records returned [5] 
#> No. unique hierarchies [1] 
#> No. media records [5] 
#> Args [scientificName=Ursus americanus, limit=5, offset=0,
#>      fields=name,key,decimalLatitude,decimalLongitude,locality,countryCode,country,institutionCode] 
#> First 10 rows of data
#> 
#>               name        key decimalLongitude decimalLatitude countryCode       country institutionCode
#> 1 Ursus americanus  891034709       -103.29468        29.23322          US United States     iNaturalist
#> 2 Ursus americanus 1024328693       -118.14681        34.20990          US United States     iNaturalist
#> 3 Ursus americanus  891045574        -72.52534        43.73511          US United States     iNaturalist
#> 4 Ursus americanus  891041363       -103.28908        29.28284          US United States     iNaturalist
#> 5 Ursus americanus 1050834838       -107.70675        33.11070          US United States             MSB
#> Variables not shown: locality (chr)

sckott commented 9 years ago

For saving the data, I'm not sure exactly what's going wrong. I would suggest trying the for loop solution in that Stack Overflow answer; with a loop it should be easier to see where it goes wrong.
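A minimal sketch of the for-loop approach for the multi-key result. It assumes, as Jennifer found with str(), that search_res is a list with one element per taxon key, each holding a `data` data frame alongside `meta` and other parts; passing those whole elements to write.table() is what produces the "differing number of rows" error. The helper `save_occurrences()` is hypothetical, and the default field names follow the occ_search() example above (newer rgbif versions may name the `name` column differently).

```r
# Hypothetical helper: write the records from a multi-key occ_search() result
# to one CSV file. Assumes 'res' is a list with one element per taxon key,
# each a list whose 'data' entry is a data frame of records.
save_occurrences <- function(res, file,
                             fields = c("name", "key", "decimalLatitude",
                                        "decimalLongitude", "locality",
                                        "countryCode", "country",
                                        "institutionCode")) {
  labels <- names(res)
  if (is.null(labels)) labels <- as.character(seq_along(res))
  tables <- list()
  for (i in seq_along(res)) {
    el <- res[[i]]
    df <- if (is.list(el)) el$data else NULL  # keep only the records table
    if (is.null(df) || nrow(df) == 0) next     # key with no records
    df <- as.data.frame(df)
    for (f in setdiff(fields, names(df))) df[[f]] <- NA  # fill absent fields
    df <- df[fields]                           # same columns for every key
    df$taxonKey <- labels[i]                   # which key the rows came from
    tables[[length(tables) + 1]] <- df
  }
  out <- if (length(tables) > 0) do.call(rbind, tables) else data.frame()
  write.csv(out, file, row.names = FALSE)
  invisible(out)
}
```

For example, `save_occurrences(search_res, "minitest_output.csv")`. Missing fields are filled with NA so every species' table has the same columns and rbind() succeeds. For a single taxonKey, occ_search() returns one such element directly, so wrap it as `list(search_res)` first.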

jenmark commented 9 years ago

Thanks very much, Scott

I'll try these out. Thanks for all your help!

All the best,

Jennifer

From: Scott Chamberlain notifications@github.com Sent: Tuesday, February 24, 2015 5:40 PM To: ropensci/spocc Cc: Jennifer Mark Subject: Re: [spocc] dismo error "invalid request" when retrieving gbif occurrence data - please help! (#113)


ropensci / spocc

dismo error "invalid request" when retrieving gbif occurrence data - please help! #113
