r-lib / archive

R bindings to libarchive, supporting a large variety of archive formats
https://archive.r-lib.org/

archive_write_files fails on compressing geopackage files into 7zip #97

Open jldupouey opened 6 months ago

jldupouey commented 6 months ago

The following code shows that the archive_write_files function in the archive package incorrectly compresses geopackage files into 7zip.

I've tried this with several different geopackage files, and the error is the same. The problem is with compression, not decompression.

Is there an option to set in the archive_write_files function call for this type of file? Or is it a bug in archive?

    # R version 4.3.2 (2023-10-31 ucrt)
    # Platform: x86_64-w64-mingw32/x64 (64-bit)
    # Running under: Windows 10 x64 (build 19045)

    # archive_1.1.7    
    library(archive)

    # sf_1.0-15 
    library(sf)

    nc <- st_read(system.file("shape/nc.shp", package="sf"))

    # Reading layer `nc' from data source `C:\Users\jldupouey\AppData\Local\R\win-library\4.3\sf\shape\nc.shp' using driver `ESRI Shapefile'
    # Simple feature collection with 100 features and 14 fields
    # Geometry type: MULTIPOLYGON
    # Dimension:     XY
    # Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
    # Geodetic CRS:  NAD27

    # writing the geopackage file:

    st_write(nc,"nc.gpkg",append=FALSE)

    # Writing layer `nc' to data source `nc.gpkg' using driver `GPKG'
    # Writing 100 features with 14 fields and geometry type Multi Polygon.

    # the file has been correctly written:

    st_read("nc.gpkg")

    # Reading layer `nc' from data source `D:\a_jeter\nc.gpkg' using driver `GPKG'
    # Simple feature collection with 100 features and 14 fields
    # Geometry type: MULTIPOLYGON
    # Dimension:     XY
    # Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
    # Geodetic CRS:  NAD27

    # creating a 7z archive:

    archive_write_files(archive="nc.7z",files="nc.gpkg",format="7zip")

    # extracting the geopackage file from the 7z archive:

    archive_extract(archive="nc.7z",files="nc.gpkg")

    # the extracted file is not correct:

    st_read("nc.gpkg")

    # Error: Cannot open "D:\a_jeter\nc.gpkg"; The source could be corrupt or not supported. See `st_drivers()` for a list of supported formats.
    # In addition: Warning messages:
    # 1: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
    #   GDAL Error 1: database disk image is malformed
    # 2: In CPL_read_ogr(dsn, layer, query, as.character(options), quiet,  :
    #   GDAL Error 1: sqlite3_prepare_v2(SELECT COUNT(*) FROM sqlite_master WHERE name IN ('gpkg_metadata', 'gpkg_metadata_reference') AND type IN ('table', 'view')) failed: database disk image is malformed
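
A byte-level comparison can confirm the corruption independently of sf and GDAL. The sketch below is only an illustration: `same_bytes()` and `round_trip_ok()` are hypothetical helper names, not part of archive. It writes a file into a 7z archive, extracts it into a separate directory so the original is not overwritten, and compares MD5 checksums.

```r
# Hypothetical helper: TRUE when two files have identical MD5 checksums.
same_bytes <- function(a, b) {
  isTRUE(unname(tools::md5sum(a)) == unname(tools::md5sum(b)))
}

# Hypothetical helper: round-trip `path` through a 7z archive and report
# whether the extracted copy matches the original byte for byte.
round_trip_ok <- function(path) {
  stopifnot(file.exists(path))
  work <- tempfile("roundtrip_")
  out_dir <- file.path(work, "out")
  dir.create(out_dir, recursive = TRUE)
  archive_path <- file.path(work, "test.7z")

  # Work from the file's own directory so the archive stores a bare file name.
  old <- setwd(dirname(normalizePath(path)))
  on.exit(setwd(old), add = TRUE)

  archive::archive_write_files(archive_path, basename(path), format = "7zip")
  archive::archive_extract(archive_path, dir = out_dir)

  same_bytes(basename(path), file.path(out_dir, basename(path)))
}

# Only runs when archive is installed and the file from the report exists.
if (requireNamespace("archive", quietly = TRUE) && file.exists("nc.gpkg")) {
  print(round_trip_ok("nc.gpkg"))
}
```

A `FALSE` result only shows that the round trip changed the bytes. Telling the writing step apart from the extraction step would additionally require opening the same archive with an independent 7z tool.
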
cielavenir commented 3 months ago

Hi @jldupouey could you check if https://github.com/r-lib/archive/pull/80 (or #99) fixes the issue?

cielavenir commented 3 months ago

alternatively you can use non-Windows platforms including WSL

rogiersbart commented 5 hours ago

Hi @jldupouey and @cielavenir,

I can easily reproduce this with the reprex below. For me, it seems that both #80 and #99 fix this. If I understand correctly, it is "R-CMD-check / ubuntu-latest (release) (pull_request)" in GHA CI for #99 that is preventing this from being merged at the moment?

    library(archive)
    saveRDS(cars, "cars.rds")
    archive_write_files("cars.7z", "cars.rds")
    archive_extract("cars.7z")
    readRDS("cars.rds") |> head()
    #> Error in readRDS("cars.rds"): ReadItem: unknown type 0, perhaps written by later version of R

Created on 2024-07-03 with reprex v2.1.0

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  Dutch_Belgium.utf8
#>  ctype    Dutch_Belgium.utf8
#>  tz       Europe/Brussels
#>  date     2024-07-03
#>  pandoc   3.1.11 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  ! package     * version date (UTC) lib source
#>  D archive     * 1.1.8   2024-04-28 [1] RSPM
#>    cli           3.6.3   2024-06-21 [1] RSPM
#>    digest        0.6.36  2024-06-23 [1] RSPM
#>    evaluate      0.24.0  2024-06-10 [1] RSPM
#>    fansi         1.0.6   2023-12-08 [1] RSPM (R 4.4.0)
#>    fastmap       1.2.0   2024-05-15 [1] RSPM
#>    fs            1.6.4   2024-04-25 [1] RSPM (R 4.4.0)
#>    glue          1.7.0   2024-01-09 [1] RSPM
#>    htmltools     0.5.8.1 2024-04-04 [1] RSPM (R 4.4.0)
#>    knitr         1.47    2024-05-29 [1] RSPM
#>    lifecycle     1.0.4   2023-11-07 [1] RSPM (R 4.4.0)
#>    magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.4.0)
#>    pillar        1.9.0   2023-03-22 [1] RSPM (R 4.4.0)
#>    pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.4.0)
#>    purrr         1.0.2   2023-08-10 [1] RSPM (R 4.4.0)
#>    R.cache       0.16.0  2022-07-21 [1] RSPM
#>    R.methodsS3   1.8.2   2022-06-13 [1] RSPM
#>    R.oo          1.26.0  2024-01-24 [1] RSPM
#>    R.utils       2.12.3  2023-11-18 [1] RSPM
#>    reprex        2.1.0   2024-01-11 [1] RSPM (R 4.4.0)
#>    rlang         1.1.4   2024-06-04 [1] RSPM
#>    rmarkdown     2.27    2024-05-17 [1] RSPM
#>    rstudioapi    0.16.0  2024-03-24 [1] RSPM (R 4.4.0)
#>    sessioninfo   1.2.2   2021-12-06 [1] RSPM
#>    styler        1.10.3  2024-04-07 [1] RSPM
#>    tibble        3.2.1   2023-03-20 [1] RSPM (R 4.4.0)
#>    utf8          1.2.4   2023-10-22 [1] RSPM (R 4.4.0)
#>    vctrs         0.6.5   2023-12-01 [1] RSPM (R 4.4.0)
#>    withr         3.0.0   2024-01-16 [1] RSPM (R 4.4.0)
#>    xfun          0.45    2024-06-16 [1] RSPM
#>    yaml          2.3.8   2023-12-11 [1] RSPM (R 4.4.0)
#>
#>  [1] C:/Users/brogiers/AppData/Local/R/win-library/4.4
#>  [2] C:/Program Files/R/R-4.4.1/library
#>
#>  D ── DLL MD5 mismatch, broken installation.
#>
#> ──────────────────────────────────────────────────────────────────────────────
```
cielavenir commented 3 hours ago

@rogiersbart Sorry, I have not touched the O_BINARY issue since I posted https://github.com/r-lib/archive/pull/73#issuecomment-1875688520, as they don't accept my attempt to unify coding styles across multiple files.
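
For readers unfamiliar with the term: on Windows, a file opened without the C `O_BINARY` flag is in text mode, where every LF byte (0x0A) written is expanded to CR LF (0x0D 0x0A). The sketch below simulates that translation on a raw vector to show why it damages binary payloads such as .rds or .gpkg data. It assumes this translation is the mechanism behind the corruption, which the thread suggests but does not demonstrate; `simulate_text_mode()` is a hypothetical name and is not code from archive or libarchive.

```r
# Hypothetical illustration: mimic a Windows text-mode write by expanding
# every LF (0x0A) byte into CR LF (0x0D 0x0A).
simulate_text_mode <- function(bytes) {
  stopifnot(is.raw(bytes))
  is_lf <- bytes == as.raw(0x0a)
  out <- raw(length(bytes) + sum(is_lf))
  # Each original byte shifts right by the number of LFs seen so far,
  # including itself when it is an LF.
  pos <- seq_along(bytes) + cumsum(is_lf)
  out[pos] <- bytes
  # The slot just before each shifted LF receives the inserted CR.
  out[pos[is_lf] - 1L] <- as.raw(0x0d)
  out
}

# Binary serialization of a data frame, similar to an .rds payload.
payload <- serialize(cars, connection = NULL)
damaged <- simulate_text_mode(payload)

# Number of extra bytes introduced by the simulated translation.
length(damaged) - length(payload)
```

Any inserted byte shifts everything after it, so formats with fixed offsets or checksums no longer parse once the translation has been applied.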

rogiersbart commented 3 hours ago

Ok, I see, thanks for the feedback.