iDigBio / ridigbio

ridigbio -- an R interface to iDigBio's API (see http://www.idigbio.org/)
http://idigbio.github.io/ridigbio/

Switch from plyr::rbind.fill to dplyr::bind_rows #35

Closed: adamdsmith closed this issue 7 years ago

adamdsmith commented 7 years ago

I consistently receive the following warning message when running idig_search_records:

Warning message:
    In column[rows] <<- what :
    number of items to replace is not a multiple of replacement length

It seems to be related to the use of plyr::rbind.fill in the idig_search function. Switching to its more current counterpart, dplyr::bind_rows, seemingly solves the problem. I'm not sure whether (or how) the switch to dplyr::bind_rows affects the geopoint massaging that happens in the fmt_search_txt_to_df function.
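For context, both functions stack data frames whose columns differ and fill any field missing from one input with NA. A minimal sketch with two made-up single-row data frames (the column names and values are illustrative only, not actual iDigBio output; it assumes plyr and dplyr are installed):

```r
# Two hypothetical "pages" of results; page2 lacks the scientificname column
page1 <- data.frame(uuid = "a", scientificname = "quercus virginiana",
                    stringsAsFactors = FALSE)
page2 <- data.frame(uuid = "b", stringsAsFactors = FALSE)

# Both calls stack the rows and fill the missing column with NA
res_plyr  <- plyr::rbind.fill(page1, page2)
res_dplyr <- dplyr::bind_rows(page1, page2)

print(res_plyr)
print(res_dplyr)
```

The two results hold the same values here; the differences reported in this issue show up with the real search output, not with simple character columns like these.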

Minimal reproducible example (MRE):

rq <- list(geopoint = list(type = "geo_bounding_box",
                           top_left = list(lat = 31.53, lon = -81.21),
                           bottom_right = list(lat = 31.45, lon = -81.19)))
idb_recs <- ridigbio::idig_search_records(rq = rq, fields = "all")

adamdsmith commented 7 years ago

May want to hold off on this... After some initial success, my R session now periodically crashes when running dplyr::bind_rows in place of plyr::rbind.fill.

Here's a query that consistently crashes R:

library(ridigbio)
lon_range <- c(-91.50, -91.45)
lat_range <- c(30.73493, 30.83056)

# Geographic query
rq <- list(geopoint = list(type = "geo_bounding_box",
                           top_left = list(lat = lat_range[2], lon = lon_range[1]),
                           bottom_right = list(lat = lat_range[1], lon = lon_range[2])))

# Specify values for required arguments and set up query
fields <- "all"
max_items <- 1e+05
limit <- 0
offset <- 0
sort <- FALSE
type <- "records"
mq <- FALSE
query <- list(offset = offset)
query[["sort"]] <- c("uuid")
query$rq <- rq
field_lists <- ridigbio:::build_field_lists(fields, type)
fields <- field_lists$fields
query <- append(query, field_lists$query)
query$limit <- max_items

# Set up empty (receiving) data.frame
m <- matrix(nrow = 0, ncol = length(fields))
dat <- data.frame(m, stringsAsFactors = FALSE)
colnames(dat) <- fields

search_results <- ridigbio:::idig_POST(paste0("search/", type), 
                                       body = query)

# Search results data.frame
foo <- ridigbio:::fmt_search_txt_to_df(search_results, fields)

# Works fine with plyr::rbind.fill (except for the warning)
dat_plyr <- plyr::rbind.fill(dat, foo)

# Crashes R with dplyr::bind_rows
dat_dplyr <- dplyr::bind_rows(dat, foo)

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
plyr: 1.8.4
dplyr: 0.5.0

fmichonneau commented 7 years ago

Thanks for the report, @adamdsmith. I think I know the origin of the problem; I'll look into it tomorrow.

mjcollin commented 7 years ago

I apologize for the very long delay in looking at this issue. Please see the commit I just pushed, which excludes the indexData field from the results; this should address the issue. You can install the master branch of this package directly from GitHub with:

library(devtools)
install_github("idigbio/ridigbio")
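The workaround amounts to keeping the indexData field out of the combined data frame. As a rough illustration only (the data is made up, `drop_fields` is a hypothetical helper, and this is not the code in the commit), dropping a nested list column by name before binding might look like:

```r
# Hypothetical helper: remove the named fields from a data frame, if present
drop_fields <- function(df, fields) {
  df[, setdiff(names(df), fields), drop = FALSE]
}

# Hypothetical page of results where 'indexData' holds nested records
page <- data.frame(uuid = c("a", "b"), stringsAsFactors = FALSE)
page$indexData <- list(list(flags = "x"), list(flags = "y"))

# Drop the nested field, then combine with another (hypothetical) page
page <- drop_fields(page, "indexData")
combined <- dplyr::bind_rows(page,
                             data.frame(uuid = "c", stringsAsFactors = FALSE))
print(combined)
```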

Please let me know how this works for you.

I think @fmichonneau's switch to dplyr::bind_rows is a more robust way of building the data frame, so I'll merge that PR (once it's working) instead of this one, unless you have objections.
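One general pattern with dplyr::bind_rows, which accepts a list of data frames, is to collect each page of results in a list and bind them once at the end, instead of repeatedly appending to a receiving data.frame. This is a sketch under that assumption with made-up pages; it is not necessarily how the PR builds the result:

```r
# Hypothetical pages returned by successive paged requests
pages <- list(
  data.frame(uuid = "a", scientificname = "x", stringsAsFactors = FALSE),
  data.frame(uuid = "b", stringsAsFactors = FALSE),
  data.frame(uuid = "c", scientificname = "z", stringsAsFactors = FALSE)
)

# A single bind over the whole list; missing columns are filled with NA
dat <- dplyr::bind_rows(pages)
print(dat)
```

Binding once also avoids the zero-row, all-logical receiving data.frame used in the MRE, whose column types must otherwise be reconciled with each page.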

adamdsmith commented 7 years ago

No worries! Thanks for returning to it and thanks @fmichonneau for the (soon to be) fix.

mjcollin commented 7 years ago

Closing due to alternate fix.