hurlbertlab / dietdatabase

Creative Commons Zero v1.0 Universal

QA/QC problem #82

Closed pwinner1 closed 6 years ago

pwinner1 commented 6 years ago

qa_qc(dietdb, fracsum_accuracy = 0.02)

Error in strsplit(diet$Prey_Part, ";") : non-character argument

  1. strsplit(diet$Prey_Part, ";")
  2. eval(expr, envir, enclos)
  3. eval(lhs, parent, parent)
  4. strsplit(diet$Prey_Part, ";") %>% unlist() %>% trimws() %>% table() %>% data.frame() %>% filter(!tolower(.) %in% c("bark", "bud", "dung", "egg", "feces", "flower", "fruit", "gall", "oogonium", "pollen", "root", "sap", "seed", "spore", "statoblasts", ... at database_error_checking.R#192
  5. qa_qc(dietdb, fracsum_accuracy = 0.02)

ahhurlbert commented 6 years ago

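This was a problem for any field in which values might be separated by a ";" and therefore have to be passed through strsplit(). strsplit() fails with "non-character argument" when every value fed into it is NA, because an all-NA column is typically stored as logical rather than character.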

Fixed the qa_qc script so that it first checks whether the entire field being evaluated consists only of NAs. If so, the status for that field will be "only NAs". Otherwise it proceeds with the check, which returns either "OK" or a list of problematic entries.
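For anyone hitting the same error elsewhere, the guard described above could look roughly like the sketch below. This is not the actual code in database_error_checking.R: the helper name check_split_field, its arguments, and the shortened list of allowed values are all made up for illustration.

```r
# Hypothetical helper illustrating the "only NAs" guard; not the real qa_qc code.
check_split_field <- function(values, allowed, sep = ";") {
  # An all-NA column is typically stored as logical, and strsplit()
  # raises "non-character argument" on it, so bail out early.
  if (all(is.na(values))) {
    return("only NAs")
  }
  # Split multi-valued entries, trim whitespace, and drop remaining NAs.
  parts <- trimws(unlist(strsplit(as.character(values), sep)))
  parts <- unique(parts[!is.na(parts)])
  # Anything not in the allowed vocabulary is reported as problematic.
  bad <- parts[!tolower(parts) %in% allowed]
  if (length(bad) == 0) "OK" else bad
}

# Abbreviated, illustrative vocabulary (assumption, not the full list).
allowed_parts <- c("bark", "bud", "egg", "fruit", "seed")

res_all_na <- check_split_field(c(NA, NA), allowed_parts)
res_ok     <- check_split_field(c("seed; fruit", NA), allowed_parts)
res_bad    <- check_split_field(c("seed; leef", "egg"), allowed_parts)
```

Here res_all_na is the "only NAs" status, res_ok is "OK", and res_bad holds the unrecognized entry. The as.character() call is only a safety net for mixed columns; the early return is what avoids the error.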

For the Cooper paper, it looks like @Dryu0003 needs to modify how he enters Family names (just use the name prior to the parentheses), and there may be an issue with Fraction diet values adding to something other than 1.
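A rough way to spot both problems before running qa_qc is sketched below. The column names (Study, Family, Fraction_Diet), the toy data, and the grouping by study are assumptions for illustration, and it assumes fracsum_accuracy means the allowed absolute deviation of a summed fraction from 1, which may not match how qa_qc actually uses it.

```r
# Toy data with hypothetical column names; not taken from the Cooper paper.
diet <- data.frame(
  Study = c("A", "A", "B", "B"),
  Family = c("Formicidae (ants)", "Carabidae", "Apidae", "Apidae (bees)"),
  Fraction_Diet = c(0.6, 0.4, 0.5, 0.3),
  stringsAsFactors = FALSE
)

# Keep only the name before the first opening parenthesis.
diet$Family <- trimws(sub("\\(.*$", "", diet$Family))

# Sum the diet fractions per study and flag sums that are not within
# the tolerance of 1 (assumed meaning of fracsum_accuracy).
fracsum_accuracy <- 0.02
sums <- tapply(diet$Fraction_Diet, diet$Study, sum)
bad_sums <- sums[abs(sums - 1) > fracsum_accuracy]
```

In this toy example, study "B" is flagged because its fractions sum to 0.8, while the Family column ends up with the bare names only.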

So the paper's data still needs some checking, but I'm closing the issue for now.