r-lib / archive

R bindings to libarchive, supporting a large variety of archive formats
https://archive.r-lib.org/

Suggest adding new `archive_extract_filter` which uses `fs::path_filter` #89

Open orgadish opened 1 year ago

orgadish commented 1 year ago

I want to loop through many zip files, extracting just files of a specific type (e.g. csv). I have written a helper that combines archive::archive_extract with fs::path_filter which I think could be useful to others:

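archive_extract_filter <- function(archive, dir = ".", glob = NULL, regexp = NULL, invert = FALSE, ...) {
  # List the archive's entries; the listing has one row per entry with a `path` column
  archive_contents <- archive::archive(archive)
  # Keep only the entry paths matching `glob` or `regexp` (or the non-matching ones if invert = TRUE)
  filtered_contents <- fs::path_filter(archive_contents$path, glob = glob, regexp = regexp, invert = invert)

  # Extract just the selected entries; `...` is passed on to archive_extract()
  archive::archive_extract(
    archive = archive,
    dir = dir,
    files = filtered_contents,
    ...
  )
}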
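
For the use case described above (looping over many zip files and pulling out only the csv files), the same idea could be sketched roughly as follows. The wrapper name `extract_matching_from_zips`, its arguments, and the one-subdirectory-per-archive layout are assumptions made for illustration; none of them are part of the archive package.

```r
library(archive)
library(fs)

# Hypothetical wrapper (not part of the archive package): extract only the
# entries matching `glob` from every zip file found in `zip_dir`.
extract_matching_from_zips <- function(zip_dir, out_dir, glob = "*.csv") {
  zip_files <- fs::dir_ls(zip_dir, glob = "*.zip")
  for (zip_file in zip_files) {
    # Entry paths inside this archive that match the pattern.
    entries <- fs::path_filter(archive::archive(zip_file)$path, glob = glob)
    # Skip archives without matches rather than relying on how
    # archive_extract() treats an empty `files` argument.
    if (length(entries) == 0) next
    # One output subdirectory per archive, named after the zip file
    # (an assumed layout, chosen so that equally named entries do not collide).
    target <- fs::dir_create(fs::path(out_dir, fs::path_ext_remove(fs::path_file(zip_file))))
    archive::archive_extract(zip_file, dir = target, files = entries)
  }
  invisible(out_dir)
}
```

A call such as `extract_matching_from_zips("downloads", "extracted")` would then leave the csv entries of each zip under `extracted/<zip name>/`; the directory names here are placeholders.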

orgadish commented 1 year ago

Or perhaps even better, create an archive_filter function that does the first part. Ideally, it would even return an archive object that could be passed directly to archive_extract, but even if not, it would simplify current usage, e.g.:

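archive_filter <- function(archive, glob = NULL, regexp = NULL, invert = FALSE, ...) {
  # List the archive's entries; `...` is passed on to archive::archive()
  archive_contents <- archive::archive(archive, ...)
  # Return only the entry paths selected by `glob`, `regexp` and `invert`
  return(fs::path_filter(archive_contents$path, glob = glob, regexp = regexp, invert = invert))
}

archive_extract(archive, files = archive_filter(archive))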

# Ideally, archive_filter would create an archive object that could be passed directly:
# archive |> archive_filter(...) |> archive_extract()
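
To make the filtering step concrete, here is a small sketch of how `fs::path_filter` treats `glob`, `regexp` and `invert` on a plain character vector. The entry paths are made up for illustration and simply stand in for the `path` column of an archive listing.

```r
library(fs)

# Hypothetical entry paths, standing in for archive::archive(x)$path.
entries <- c("data/a.csv", "data/b.csv", "README.txt", "img/plot.png")

# glob: keep entries whose full path matches the wildcard pattern.
csv_only <- fs::path_filter(entries, glob = "*.csv")

# regexp: keep entries matching a regular expression instead.
in_data <- fs::path_filter(entries, regexp = "^data/")

# invert = TRUE: keep the entries that do NOT match.
not_csv <- fs::path_filter(entries, glob = "*.csv", invert = TRUE)

print(csv_only)
print(in_data)
print(not_csv)
```

Any of these vectors could be supplied as the `files` argument of `archive_extract`, which is all the helper above does.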

gaborcsardi commented 1 year ago

Thanks for the suggestion! I decided that I am not going to add this now, but it would be nice to mention it in the manual, and/or have an example along these lines in the manual. Would you like to submit a PR for that? (No pressure at all.)

orgadish commented 1 year ago

@gaborcsardi I wasn't sure if you meant to add it just to the archive_extract manual page or to the README, so I added it to both. If you intended something else, please let me know!