tidyverse / googledrive

Google Drive R API
https://googledrive.tidyverse.org/

`drive_download()` error with KML mimeType #441

Open caldwellst opened 1 year ago

caldwellst commented 1 year ago

There is a bug in drive_download() that causes it to fail on KML files depending on the mime type. KML mimeTypes are often text/xml or another raw text input. However, sometimes, depending on how they are loaded onto Drive, they are stored as application/vnd.google-earth.kml+xml.

Since drive_download() checks the mimeType with grepl("google", mime_type), KML files are mistakenly assumed to have a mimeType directly supported by Google Drive, and get_export_mime_type() then raises an error because the type isn't one it recognises.

#> Error in `get_export_mime_type()`:
#> ! Not a recognized Google MIME type:
#> ✖ application/vnd.google-earth.kml+xml
#> Run `rlang::last_trace()` to see where the error occurred.
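
To make the mismatch concrete, here is a minimal base R sketch of the check described above. It is not the googledrive source. The `startsWith()` alternative is only an assumption about what a narrower test could look like, based on native Drive types using the `application/vnd.google-apps.` prefix.

```r
# Illustration only; not taken from the googledrive source.
types <- c(
  "application/vnd.google-earth.kml+xml",    # the problematic KML type
  "application/vnd.google-apps.spreadsheet", # a native Drive type
  "text/xml"                                 # the usual KML type
)

# The loose check: any mimeType containing "google" matches.
loose <- grepl("google", types)

# An assumed stricter alternative: match only the native Drive prefix.
strict <- startsWith(types, "application/vnd.google-apps.")

data.frame(mime_type = types, loose = loose, strict = strict)
```

With the loose check the KML type is flagged as a Google type, whereas the prefix check leaves it alone and it would be treated like any other file.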

I believe this behavior is not ideal: downloading a file should work even if its mimeType isn't explicitly supported by the Google Drive API, since get_export_mime_type() is only consulted when the type appears to be a Google type. However, I may be wrong there.

Note that this code to check mime_type is also present in drive_read().

jennybc commented 1 year ago

Can you provide a link to a file that is problematic? I.e. a way to actually experience the problem.

caldwellst commented 1 year ago

Apologies, yes, here's a simple reprex of the issue (ignoring authorisation for googledrive). It uses this publicly available KML file stored on Google Drive. I also manually downloaded the file and uploaded it to my Drive to show that the default mimeType usually works without problems; the issue only arises when the mimeType is set to application/vnd.google-earth.kml+xml. I'm not an expert in why that might be, but I believe it might occur when the KML is programmatically added to a drive directly from Google Earth Engine, for instance.

library(googledrive)
library(tidyverse)
library(sf)

drive_download(
    as_id("12bPHu0w8gyEmoeblBAg25c8ch842THw7")
)
#> Error in `get_export_mime_type()`:
#> ! Not a recognized Google MIME type:
#> ✖ application/vnd.google-earth.kml+xml

# pedantic, but we can check the mimeType is what's specified in the error:

original_dribble <- drive_get( as_id("12bPHu0w8gyEmoeblBAg25c8ch842THw7"))

original_dribble %>%
    pull(drive_resource) %>%
    pluck(1) %>%
    pluck("mimeType")
#> [1] "application/vnd.google-earth.kml+xml"

# and we can successfully download the file if we manually adjust the type

original_dribble$drive_resource[[1]]$mimeType <- "text/xml"
drive_download(
    file = original_dribble,
    path = f <- tempfile(fileext = ".kml")
)
#> File downloaded:
#> • 13 Colonies Template.kml <id: 12bPHu0w8gyEmoeblBAg25c8ch842THw7>
#> Saved locally as:
#> • /var/folders/b7/_6hwb39d43l71kpy59b_clhr0000gn/T//RtmpxkJtgm/file36a721e09e40.kml

# and we can successfully read the file, no issues

bypassed_sf <- read_sf(f)
bypassed_sf
#> Simple feature collection with 14 features and 2 fields
#> Geometry type: POINT
#> Dimension:     XYZ
#> Bounding box:  xmin: -82.90712 ymin: 32.15744 xmax: -69.60236 ymax: 44.68772
#> z_range:       zmin: 0 zmax: 0
#> Geodetic CRS:  WGS 84
#> # A tibble: 14 × 3
#>    Name           Description                 geometry
#>    <chr>          <chr>                    <POINT [°]>
#>  1 Massachusetts  ""          Z (-69.60236 44.68772 0)
#>  2 Massachusetts  ""          Z (-71.38244 42.40721 0)
#>  3 Rhode Island   ""          Z (-71.47743 41.58009 0)
#>  4 Connecticut    ""          Z (-73.08775 41.60322 0)
#>  5 New Hampshire  ""           Z (-71.5724 43.19385 0)
#>  6 New York       ""          Z (-74.00597 40.71435 0)
#>  7 New Jersey     ""          Z (-74.42139 40.04444 0)
#>  8 Pennsylvania   ""          Z (-77.19452 41.20332 0)
#>  9 Delaware       ""          Z (-75.54199 39.18118 0)
#> 10 Maryland       ""          Z (-76.64127 39.04575 0)
#> 11 Virginia       ""          Z (-78.65689 37.43157 0)
#> 12 North Carolina ""           Z (-79.0193 35.75957 0)
#> 13 South Carolina ""          Z (-81.16372 33.83608 0)
#> 14 Georgia        ""          Z (-82.90712 32.15743 0)
#> Warning message:
#> In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
#>   automatically selected the first layer in a data source containing more than one.

# check with the same file that I manually downloaded and then uploaded to my drive

drive_download(
    file = as_id("1VNPS4fILxODi7wrlzurJtNgF1V8FrBva"),
    path = g <- tempfile(fileext = ".kml")
)
#> File downloaded:
#> • 13 Colonies Template.kml <id: 1VNPS4fILxODi7wrlzurJtNgF1V8FrBva>
#> Saved locally as:
#> • /var/folders/b7/_6hwb39d43l71kpy59b_clhr0000gn/T//RtmpxkJtgm/file36a72adf180c.kml

# here we see this has a mimeType of text/xml

drive_get( as_id("1VNPS4fILxODi7wrlzurJtNgF1V8FrBva")) %>%
    pull(drive_resource) %>%
    pluck(1) %>%
    pluck("mimeType")
#> [1] "text/xml"

# just confirming they are the same file

copied_sf <- read_sf(g)
all.equal(copied_sf, bypassed_sf)
#> TRUE

caldwellst commented 1 year ago

I had a look through the table of mime types at googledrive:::.drive$mime_tbl and noticed that some mimeTypes have what seems to be a "base type" appended with +, much like the problematic KML mimeType above.

types <- googledrive:::.drive$mime_tbl$mime_type
types[grepl("\\+", types)]
#> [1] "application/epub+zip"                         
#> [2] "application/vnd.google-apps.script+json"      
#> [3] "application/vnd.google-apps.script+text/plain"
#> [4] "image/svg+xml"                                
#> [5] "image/svg+xml"   

Could a simple solution be to exempt mimeTypes that specify a base type from error generation? Again, this is stretching my knowledge of mimeTypes, but it seems that could be a robust and potentially future-proof approach.
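
As a rough sketch of that idea (the helper name `looks_like_google_native()` is hypothetical and not part of googledrive), the check could skip any mimeType that carries a `+` suffix:

```r
# Hypothetical helper, for illustration only; not the package's code.
# Treat a mimeType as a Google type only if it contains "google"
# and has no "+<base type>" suffix.
looks_like_google_native <- function(mime_type) {
  grepl("google", mime_type, fixed = TRUE) &
    !grepl("+", mime_type, fixed = TRUE)
}

candidates <- c(
  "application/vnd.google-earth.kml+xml",
  "application/vnd.google-apps.document",
  "application/vnd.google-apps.script+json",
  "image/svg+xml"
)
result <- looks_like_google_native(candidates)
setNames(result, candidates)
```

One caveat: this rule also excludes application/vnd.google-apps.script+json, which appears in the table above, so whether that side effect is acceptable would need checking.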

jon23cooper commented 1 year ago

I am having the same issue. I used drive_download() to download a KML file and then uploaded it with drive_upload(). When I look at the file information in my Google folder, the file type has changed from the original XML to Unknown File, and drive_download() will no longer download it.