mczyzj / pestr

Tools to use EPPO Data Services from R

eppo_tabletools_distri subscript out of bounds error #26

Open katier239 opened 3 weeks ago

katier239 commented 3 weeks ago

Hello,

Thanks for the cool package :)

I've installed the most recent version (0.8.2) and am encountering an error when trying to retrieve pest distributions. I was able to retrieve other metadata (e.g. taxonomy and hosts) without problems, but when I try to get distributions, the following error arises:

> pests <- c("Anastrepha ludens", "Drosophila suzukii")
> pests_names_tables <- eppo_names_tables(pests, eppo_SQLite)
> pest_distri <- eppo_tabletools_distri(pests_names_tables, eppo_token)
New names:                                                                                                                 
• `` -> `...6`
• `` -> `...7`
• `` -> `...8`
• `` -> `...9`
• `` -> `...10`
• `` -> `...11`
• `` -> `...12`
• `` -> `...13`
• `` -> `...14`
• `` -> `...15`
• `` -> `...16`
• `` -> `...17`
• `` -> `...18`
• `` -> `...19`
• `` -> `...20`
• `` -> `...21`
• `` -> `...22`
• `` -> `...23`
• `` -> `...24`
• `` -> `...25`
New names:                                                                                                                 
• `` -> `...6`
• `` -> `...7`
• `` -> `...8`
• `` -> `...9`
• `` -> `...10`
• `` -> `...11`
• `` -> `...12`
• `` -> `...13`
• `` -> `...14`
• `` -> `...15`
• `` -> `...16`
• `` -> `...17`
• `` -> `...18`
• `` -> `...19`
• `` -> `...20`
• `` -> `...21`
• `` -> `...22`
• `` -> `...23`
• `` -> `...24`
• `` -> `...25`
The distribution file for EPPO code ANSTLU was not found.
The distribution file for EPPO code DROSSU was not found.
Error in distri_lists[[i]] : subscript out of bounds

It appears that the eppocodes and URLs are being generated as expected:

Browse[1]> eppocodes
[1] "ANSTLU" "DROSSU"
Browse[1]> distri_urls
[1] "https://gd.eppo.int/taxon/ANSTLU/download/distribution_csv" "https://gd.eppo.int/taxon/DROSSU/download/distribution_csv"
Browse[1]> names_tables$exist_in_DB
  codeid           fullname
1   4669  Anastrepha ludens
2   9518 Drosophila suzukii

When I do a manual search on the EPPO website, the data is there. When I paste the above URLs into a browser window, the files download and contain data. However, the column layout of the EPPO file appears to differ from what pestr expects: in addition to the columns "continent", "country", "state", "country code", "state code" and "Status", there are now also several unnamed columns.

I think this is what makes eppo_csv_download fail at this check:

if (!all(names(distri_lists[[i]]) %in%
             c("continent", "country", "state",
               "country code", "state code", "Status"))) {
      message(msg_helper("no_distri", i))
      distri_lists[[i]] <- NULL
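
A possible explanation for why the failure surfaces as "subscript out of bounds" rather than as a column-name problem: in R, assigning NULL to a list element with `[[<-` removes that element, so the list shrinks while a loop index keeps advancing. The sketch below uses a made-up two-element list as a stand-in for `distri_lists`; it is not pestr's actual code path, only an illustration of the base R behaviour.

```r
# Hypothetical stand-in for distri_lists: two downloaded tables.
distri_lists <- list(ANSTLU = data.frame(x = 1), DROSSU = data.frame(x = 2))

# Assigning NULL with [[<- deletes the element instead of storing NULL.
distri_lists[[1]] <- NULL
remaining <- length(distri_lists)   # the list now has one element

# A loop that moves on to index 2 now points past the end of the list.
result <- tryCatch(
  distri_lists[[2]],
  error = function(e) conditionMessage(e)
)
print(remaining)
print(result)
```

If a placeholder is wanted instead of a deletion, `distri_lists[i] <- list(NULL)` keeps the list at its original length.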

When I run the preceding code in debug mode it seems to work ok:

Browse[1]> for (i in 1:length(distri_lists)) {
+     distri_lists[[i]] <- eppo_try_urls(distri_urls[i]) %>%
+         httr::content(type = "text/csv",
+                       encoding = "UTF-8",
+                       col_types = readr::cols()) %>%
+         as.data.frame()
+ }
New names:                                                                                                                 
• `` -> `...6`
• `` -> `...7`
• `` -> `...8`
• `` -> `...9`
• `` -> `...10`
• `` -> `...11`
• `` -> `...12`
• `` -> `...13`
• `` -> `...14`
• `` -> `...15`
• `` -> `...16`
• `` -> `...17`
• `` -> `...18`
• `` -> `...19`
• `` -> `...20`
• `` -> `...21`
• `` -> `...22`
• `` -> `...23`
• `` -> `...24`
• `` -> `...25`
New names:                                                                                                                 
• `` -> `...6`
• `` -> `...7`
• `` -> `...8`
• `` -> `...9`
• `` -> `...10`
• `` -> `...11`
• `` -> `...12`
• `` -> `...13`
• `` -> `...14`
• `` -> `...15`
• `` -> `...16`
• `` -> `...17`
• `` -> `...18`
• `` -> `...19`
• `` -> `...20`
• `` -> `...21`
• `` -> `...22`
• `` -> `...23`
• `` -> `...24`
• `` -> `...25`
Browse[1]> distri_lists[["DROSSU"]][1,]
  country   state country code state code Status                             ...6 ...7 ...8 ...9 ...10 ...11 ...12 ...13
1  Africa Algeria         <NA>         DZ   <NA> Present, restricted distribution   NA   NA   NA    NA    NA    NA    NA
  ...14 ...15 ...16 ...17 ...18 ...19 ...20 ...21 ...22 ...23 ...24 ...25 continent
1    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA        NA

Lastly, I noticed that EPPO seems to have scrambled the column order - 'continent' is the last column name, but the continent data is in the first column.
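
As a stopgap, the shifted header could be worked around by relabelling the first six columns by position. This is only a sketch under the assumption that the data columns still arrive in the order pestr expects (as the printed row above suggests); the `raw` table is a hypothetical one-row imitation of the download, and this is not the fix used in the package itself.

```r
# Column names pestr checks for, in the order the data appears to arrive.
expected <- c("continent", "country", "state",
              "country code", "state code", "Status")

# Hypothetical one-row table mimicking the shifted header seen above.
raw <- data.frame(
  a = "Africa", b = "Algeria", c = NA_character_, d = "DZ",
  e = NA_character_, f = "Present, restricted distribution",
  g = NA, h = NA,
  stringsAsFactors = FALSE
)
names(raw) <- c("country", "state", "country code", "state code",
                "Status", "...6", "...7", "continent")

# Keep the first six columns by position and give them the expected names.
fixed <- raw[, seq_along(expected), drop = FALSE]
names(fixed) <- expected
print(fixed)
```

Relabelling by position silently breaks if EPPO changes the column order again, so it is worth checking a few rows by eye after applying it.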

Thanks!

mczyzj commented 3 weeks ago

Hey @katier239,

Thanks for your interest in the package :) Some time ago EPPO changed the structure of the CSV files. I have already adjusted the code but haven't submitted the update to CRAN yet. So for the time being, I would be grateful if you could install the GitHub version. I will probably submit the CRAN update within the next two weeks, so the official version will most likely be available before October.

If there is anything else, please do not hesitate to report.

All the best, Michal