wcjochem / sfarrow

R package for reading/writing `sf` objects from/to parquet files with `arrow`.
https://wcjochem.github.io/sfarrow/

Clean up datasets written in examples? #11

Closed jonkeane closed 2 years ago

jonkeane commented 2 years ago

When running the reverse dependency checks for {arrow} I noticed that it looks like the temporary datasets written aren't cleaned up, which leads to a spurious failure when running the checks twice in a row (like the way that {revdepcheck} runs them).

The error I'm seeing is:

```
Running examples in ‘sfarrow-Ex.R’ failed
The error most likely occurred in:

> ### Name: write_sf_dataset
> ### Title: Write 'sf' object to an Arrow multi-file dataset
> ### Aliases: write_sf_dataset
> 
> ### ** Examples
> 
> # read spatial object
> nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
> 
> # create random grouping
> nc$group <- sample(1:3, nrow(nc), replace = TRUE)
> 
> # use dplyr to group the dataset. %>% also allowed
> nc_g <- dplyr::group_by(nc, group)
> 
> # write out to parquet datasets
> # partitioning determined by dplyr 'group_vars'
> write_sf_dataset(nc_g, path = file.path(tempdir(), "ds"))
Warning: This is an initial implementation of Parquet/Feather file support and
geo metadata. This is tracking version 0.1.0 of the metadata
(https://github.com/geopandas/geo-arrow-spec). This metadata
specification may change and does not yet make stability promises.  We
do not yet recommend using this in a production setting unless you are
able to rewrite your Parquet/Feather files.
Error: Invalid: Could not write to /tmp/RtmpBqMSFd/ds as the directory is not empty and existing_data_behavior is to error
Execution halted
```

It looks like the dataset from the first check run is still present in the directory returned by tempdir() when the second check run encounters it. I know this is not a common use case (and one that end users would never encounter).
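For context, `tempdir()` returns the same per-session directory on every call, so anything an example leaves there is still present on a rerun within that session. A minimal illustration (the `"ds"` subdirectory name mirrors the example above):

```r
# tempdir() is fixed for the lifetime of an R session, so two example runs
# in the same session resolve to the same dataset directory.
d1 <- file.path(tempdir(), "ds")
d2 <- file.path(tempdir(), "ds")
identical(d1, d2)  # TRUE: a second run sees whatever the first run left behind
```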

Would you be open to a PR that either creates a unique temp directory for these examples using tempfile(), similar to how we do it in {arrow}, or adds a cleanup step at the end of these examples so that the files created in tempdir() are removed before subsequent runs? I'm happy to submit one if you're interested!
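For illustration, a minimal sketch of the two options. The `write_sf_dataset()` calls are commented out because they depend on the objects built earlier in the example, and the `"sfarrow_ds_"` prefix is just an assumed name:

```r
# Option 1: tempfile() returns a fresh, not-yet-existing path on each call,
# so every example run writes to its own directory.
path <- tempfile("sfarrow_ds_")
# write_sf_dataset(nc_g, path = path)

# Option 2: keep the tempdir() path but clean up at the end of the example,
# so a rerun in the same session starts from an empty directory.
path <- file.path(tempdir(), "ds")
# write_sf_dataset(nc_g, path = path)
unlink(path, recursive = TRUE)
```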

wcjochem commented 2 years ago

Hi @jonkeane, thanks for spotting this! I didn't realise this would be an issue for checks, so this is helpful for me to know. I'll start a PR; since I'm a bit new at maintaining a package, perhaps you could take a look at it, just so I'm sure I've understood?

jonkeane commented 2 years ago

I would be happy to take a look when you have a PR ready. And to make sure you're not worried: this oddity/error only happens when someone runs check twice in a row in the same R session (which is a very unusual thing for basically anyone to do! It just so happens to be how {revdepcheck} does it, though). So I don't think this will impact anyone except someone running {revdepcheck}.

wcjochem commented 2 years ago

Hi @jonkeane, I've made the changes in PR #12, following how you do it in {arrow}. Example: https://github.com/wcjochem/sfarrow/blob/dev/R/st_arrow.R#L265-L267

If this fixes the errors you're seeing in your checks, I'll go ahead and merge it and re-submit to CRAN with the minor update.

jonkeane commented 2 years ago

That PR looks great and should avoid this issue entirely. I didn't think this would be triggered during CRAN checks (since it looked like the examples needed to run twice in the same session to hit it), but it turns out that it was. We are preparing a small change to arrow that should also avoid this, but it would still be good for this PR to be incorporated.

wcjochem commented 2 years ago

The update is on its way to CRAN now. And please let me know if you find anything else. Thanks for your help!