ropensci / chromer

Package for interacting with the Chromosome Counts Database (CCDB) at https://taux.evolseq.net/CCDB_web/
https://docs.ropensci.org/chromer

Error generated with chrom_counts() #24

Closed: ledelaney closed this issue 1 year ago

ledelaney commented 7 years ago

Hello!

I'm having an issue with the chrom_counts() function for some larger families (currently Leguminosae and Compositae) and for all angiosperms.

I've tried some other families that function normally (e.g., Solanaceae, Cactaceae and Passifloraceae).

These are some attempts with their (slightly different) errors...

ang.chrom <- chrom_counts(taxa = "Angiosperms", rank = "majorGroup") gives the following error:
Error in rbindlist(l) : Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'list'

leg.chrom <- chrom_counts(taxa = "Leguminosae", rank = "family") gives the following error:
Error in rbindlist(l) : Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'character'

Some genera within Leguminosae work (e.g., Acacia), but others do not:
astrag.chrom <- chrom_counts(taxa = "Astragalus", rank = "genus")
Error in rbindlist(l) : Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'character'

Finally, Compositae yields a very similar error:
comp.chrom <- chrom_counts(taxa = "Compositae", rank = "family")
Error in rbindlist(l) : Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'raw'

I tried using options(error = recover); when the error was generated, selecting the frame with rbindlist(l) caused R to crash.

Is there something I should be doing differently? Any help would be greatly appreciated! I downloaded this package directly from GitHub on 7/6/17 and I am running it on R version 3.4.0.

sckott commented 7 years ago

@mwpennell are you able to answer this, or should I?

mwpennell commented 7 years ago

Sorry @sckott, this person also emailed me directly, so I responded there.

I honestly can't figure out how to solve this problem. It appears that the API is choking (somewhat stochastically) when too much data is requested. I would certainly appreciate your expertise here.
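If the failures really are intermittent, one generic mitigation is to retry a failed request a few times, pausing longer after each attempt. A minimal base-R sketch, assuming the error is transient; `with_retry()` is a hypothetical helper (not part of chromer), and the `tries`/`wait` defaults are illustrative only:

```r
## Hypothetical helper (not part of chromer): call f(...) and, if it errors,
## retry up to `tries` times in total, doubling the pause between attempts.
## Only useful if the failure is transient.
with_retry <- function(f, ..., tries = 3, wait = 1) {
  for (attempt in seq_len(tries)) {
    result <- tryCatch(f(...), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)  # success: hand back the value
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < tries) {
      Sys.sleep(wait * 2^(attempt - 1))  # wait, 2*wait, 4*wait, ...
    }
  }
  stop("All ", tries, " attempts failed; last error: ", conditionMessage(result))
}
```

For example, `leg_dat <- with_retry(chrom_counts, taxa = "Leguminosae", rank = "family")`. If the error is deterministic for a given request, every attempt will fail the same way and the helper simply reports the last error.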

As a kludge, it is possible to work around this by making a bunch of smaller requests (for example, instead of requesting data for an entire family, one could ask for data on every genus in the family). I sent the following code to @ledelaney to do this (using taxonlookup):

devtools::install_github("ropenscilabs/datastorr")
devtools::install_github("traitecoevo/taxonlookup")
library(taxonlookup)
library(chromer)
library(dplyr)

## get a taxonomic table
tax <- plant_lookup()

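## Wrapper function that loops through all genera in the group:
## one request per genus, with the results stacked by dplyr::bind_rows()
chrom_counts_wrap <- function(taxa, rank, tax_tab){
  ## genera whose `rank` column (e.g. "family") matches `taxa`
  grp <- unique(tax_tab$genus[which(tax_tab[[rank]] == taxa)])
  ## lapply() also handles a group with a single genus,
  ## where a 2:length(grp) loop would request grp[2] (NA) and grp[1] again
  res <- lapply(grp, chrom_counts, rank = "genus")
  bind_rows(res)
}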

## As an example look for Solanaceae

## This fails (sometimes)
sol_dat <- chrom_counts("Solanaceae", rank="family")

## However, this should work
sol_dat <- chrom_counts_wrap("Solanaceae", rank="family", tax_tab=tax)

However, I recognize that this is far from ideal. It would be great if you could take a look at this. It is beyond my expertise here.

ledelaney commented 7 years ago

That workaround code mostly gets the job done, but the error is still generated for extremely large amounts of data, like all angiosperms (figured this out today). It tends to happen after several thousand entries, but with no clear pattern. We've just been writing to a .csv file and restarting where we get the error.
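The write-to-CSV-and-restart routine can be automated. A minimal base-R sketch, assuming each request returns a data frame with the same columns in the same order; `fetch_with_checkpoint()` and its `fetch`, `path` and `done_path` arguments are hypothetical names, not part of chromer. Each taxon's rows are appended to the CSV as soon as they arrive, and a log of finished taxa lets a rerun skip the work already done:

```r
## Hypothetical helper (not part of chromer): call `fetch` once per taxon,
## append each result to the CSV at `path`, and log finished taxa in
## `done_path`. If a call errors, the loop stops; rerunning the same call
## skips the taxa already logged and resumes with the next one.
## Assumes every call returns the same columns in the same order.
fetch_with_checkpoint <- function(taxa, fetch, path,
                                  done_path = paste0(path, ".done")) {
  done <- if (file.exists(done_path)) readLines(done_path) else character(0)
  for (tx in setdiff(taxa, done)) {
    res <- as.data.frame(fetch(tx))
    if (nrow(res) > 0) {
      ## write the header only when the CSV does not exist yet
      new_file <- !file.exists(path)
      write.table(res, path, sep = ",", row.names = FALSE,
                  col.names = new_file, append = !new_file)
    }
    ## mark this taxon as finished (even if it returned no rows)
    con <- file(done_path, open = "a")
    writeLines(tx, con)
    close(con)
  }
  if (file.exists(path)) read.csv(path, stringsAsFactors = FALSE) else data.frame()
}
```

For example, `ang_dat <- fetch_with_checkpoint(genera, function(g) chrom_counts(g, rank = "genus"), "chrom_counts.csv")`, where `genera` is a character vector of genus names (for instance, taken from the taxonomic table built with plant_lookup() above). Note that a CSV round trip may change some column types.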

sckott commented 7 years ago

Happy to help, though can someone share what the error or warning messages say?

ledelaney commented 7 years ago

Error in rbindlist(l) : Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'list' (sometimes "not a 'character'" or "not a 'raw'")

sckott commented 7 years ago

thanks @ledelaney will have a look

sckott commented 7 years ago

Trying to replicate the issue, but it doesn't help when it's not failing 😞

ledelaney commented 7 years ago

It usually only happens after some time has elapsed when grabbing large datasets; try Leguminosae or all angiosperms. I've gotten it every time with those.

sckott commented 7 years ago

Right, I was trying Angiosperms; will try again soon.

sckott commented 7 years ago

@ledelaney @mwpennell

I think I may have fixed it. Install with devtools::install_github("ropensci/chromer@sckott-bindrows") and try again. If this fixes it, the problem was that rbindlist and tibbles were not working together.

sckott commented 7 years ago

bump @ledelaney @mwpennell

kbroman commented 1 year ago

I believe this is now fixed.