ropensci / neotoma

Programmatic R interface to the Neotoma Paleoecological Database.
https://docs.ropensci.org/neotoma

get_publication doesn't retrieve all dataset id's #236

Closed schutzjordan closed 5 years ago

schutzjordan commented 5 years ago

get_publication isn't returning dataset IDs for datasets without publications. I've downloaded all of the pollen surface samples from the United States and Canada, and used this loop to try to separate sites that don't have publications from sites that do.

# Create a list for sites with no publications
sites_no_pubs <- list()

# Create a list for sites with publications
sites_w_pubs <- list()

# Sort sites into those that have publications and those that don't
for (i in seq_along(sitesdownloadpubs)) {
  currentsite <- sitesdownloadpubs[[i]]
  # An NA id on the first publication record is treated as "no publication"
  if (is.na(currentsite[[1]]$meta$id)) {
    sites_no_pubs <- c(sites_no_pubs, currentsite)
  } else {
    sites_w_pubs <- c(sites_w_pubs, currentsite)
  }
}

SimonGoring commented 5 years ago

Did you start this all with get_dataset(datasettype="pollen surface sample")?

I'm not sure where sitesdownloadpubs is coming from. Could you post the top of your code, where you generate the sitesdownloadpubs variable?

Using dplyr, neotoma and purrr I can create a data.frame with each dataset ID and the publications associated with that dataset:

library(neotoma)
library(dplyr)
library(purrr)

ssamp <- get_dataset(datasettype = "pollen surface sample",
                     gpid = c("Canada", "United States"))
sspub <- get_publication(ssamp)

assertthat::assert_that(length(ssamp) == length(sspub), msg = "There are missing publication objects.")

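# Build one data.frame per dataset (its ID plus the metadata of each
# associated publication), then stack them into a single table
dsids <- seq_along(ssamp) %>% 
  map(function(x) {
    data.frame(dsid = ssamp[[x]]$dataset.meta$dataset.id,
               map(sspub[[x]], function(y) y$meta) %>% bind_rows())
  }) %>% 
  bind_rows()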

This indicates that all pollen surface samples from Neotoma in the US & Canada have at least one publication associated with them.
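
If some datasets did lack a publication, one way to flag them without a loop would be to test the publication ids directly. The following is only a minimal sketch on mock data: `mock_pubs` and `mock_dsids` are hypothetical stand-ins, not output from the package, and it assumes (as the loop in the original post does) that a missing publication shows up as an NA in `meta$id`.

```r
# Mock data only: a hypothetical stand-in for the nested list of
# publication records, one element per dataset.
mock_pubs <- list(
  list(list(meta = data.frame(id = 101))),
  list(list(meta = data.frame(id = NA_integer_))),
  list(list(meta = data.frame(id = 102)),
       list(meta = data.frame(id = 103)))
)
mock_dsids <- c(11, 12, 13)  # hypothetical dataset IDs, in the same order

# TRUE when every publication record for a dataset has a missing id
# (assumption: "no publication" is represented as NA in meta$id)
no_pub <- vapply(mock_pubs, function(site) {
  all(vapply(site, function(pub) is.na(pub$meta$id[1]), logical(1)))
}, logical(1))

ds_without_pubs <- mock_dsids[no_pub]
ds_with_pubs <- mock_dsids[!no_pub]
```

Checking every record rather than only the first avoids misclassifying a dataset whose first publication entry happens to be incomplete.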

SimonGoring commented 5 years ago

@schutzjordan any update? Did this work for you?

schutzjordan commented 5 years ago

@SimonGoring Hi Simon, sorry for such a late reply! The code you posted above seems to have worked. Below is what I had to begin with and how I got sitesdownloadpubs.

# Use the datasettype argument to select pollen surface sample datasets only
CanadianSites <- neotoma::get_dataset(datasettype = "pollen surface sample", gpid = "Canada")
AmericanSites <- neotoma::get_dataset(datasettype = "pollen surface sample", gpid = "United States")

# Combine US and Canadian sites into US_Can_Sites
US_Can_Sites <- neotoma::bind(AmericanSites, CanadianSites)

# Download the full records for the combined sites
uscanpollen <- neotoma::get_download(US_Can_Sites)

# Get publications for the downloaded sites
sitesdownloadpubs <- neotoma::get_publication(uscanpollen)

SimonGoring commented 5 years ago

Glad it worked. I'll close this issue for now. If there are further issues, feel free to re-open.