r-transit / gtfsio

Read and Write General Transit Feed Specification (GTFS)
https://r-transit.github.io/gtfsio/
Other
13 stars 3 forks source link

write to directory instead of renaming temp dir #15

Closed polettif closed 3 years ago

polettif commented 3 years ago

Currently export_gtfs(as_dir = TRUE) renames the created temporary folder. However, I ran into issues with this approach:

library(gtfsio)
path1 <- system.file("extdata", "sample-feed-fixed.zip", package = "tidytransit")
path2 <- tempdir()
g1 <- import_gtfs(path1)
export_gtfs(g1, path2, as_dir = TRUE)
#> Warning message:
#> In file.rename(tmpd, path) :
#>   cannot rename file '/var/folders/p8/ck58x_354mj8vgdcqyyz_r540000gn/T//RtmpR8xTkj/gtfsio5bc7fc71ddf' 
#>   to '/var/folders/p8/ck58x_354mj8vgdcqyyz_r540000gn/T//RtmpR8xTkj', reason 'No such file or directory'

With this PR export_gtfs now writes all files directly to path if as_dir is set to TRUE instead of writing it to a temporary directory and then renaming that directory to path. Is there any drawback to this approach that I'm missing?

dhersz commented 3 years ago

Hi @polettif, thanks for the PR.

So, playing around with this issue I bit I realized that the current approach always works if path is not tempdir(). For example, the bit above works fine:

library(gtfsio)
path1 <- system.file("extdata", "sample-feed-fixed.zip", package = "tidytransit")
path2 <- tempfile()
g1 <- import_gtfs(path1)
export_gtfs(g1, path2, as_dir = TRUE)
list.files(path2)
#>  [1] "agency.txt"          "calendar_dates.txt"  "calendar.txt"       
#>  [4] "fare_attributes.txt" "fare_rules.txt"      "frequencies.txt"    
#>  [7] "routes.txt"          "shapes.txt"          "stop_times.txt"     
#> [10] "stops.txt"           "trips.txt"

That's because tempfile() creates a file/directory inside R temporary directory, whereas tempdir() returns the path to this temporary directory (and apparently creates it, if it had been removed for any reason - I'll call this directory Rtmp from now on). So when you set path = tempdir(), the function tries to move the directory where files were extracted to (which is inside Rtmp) to Rtmp itself. But as overwrite = TRUE (by default), it removes Rtmp and all files within it before attempting to move the files, and that's why we see this 'No such file or directory' error.

The approach you're following in this PR fixes this, but I'm not sure if we should even use tempdir() as a path. That's because all packages may store temporary files inside Rtmp, so if we remove/overwrite Rtmp we may be interfering with other packages' behaviour. For example, {geobr}, {r5r} and {aopdata} all use some temporary files inside Rtmp to cache some operations and run faster when used repetitively inside the same R session. If we use tempdir() as a path to export_gtfs() we'll be overwriting Rtmp and any files that other packages may be using.

oi <- geobr::read_country()
#> Loading required namespace: sf
#> Using year 2010

list.files(tempdir())
#> [1] "country_2010_simplified.gpkg" "file3f9d49265d57.so"         
#> [3] "gtfsio"                       "metadata_geobr.csv"

But the current approach also removes the Rtmp if it path = tmpdir(), and, even less usefully, throws an error instead of creating a gtfs-like directory.

Which leads me to think that perhaps we should check if path points to tmpdir(), and raise an error if so. Something along the lines of:

if (path == tempdir())
  stop("Please use 'path = tempfile()' instead of tempdir() to designate temporary directories")

What do you think?

polettif commented 3 years ago

Ah, I see. Thanks for looking into it! I wasn't aware of how differently tempdir() is handled.

Which leads me to think that perhaps we should check if path points to tmpdir(), and raise an error if so. Something along the lines of:

if (path == tempdir())
  stop("Please use 'path = tempfile()' instead of tempdir() to designate temporary directories")

Prevent using tempdir in general sounds good to me.

Just a general design thought: I'd still prefer writing files to a directory (as in this PR) instead of "renaming" it, seems cleaner to me and it fails faster if the output directory cannot be written to. There's probably no difference in practice, so there's no need to change it, just wanted to point it out.

dhersz commented 3 years ago

Thank you for flagging the issue!

I agree with you that writing directly to the desired path is better than renaming a previously created temporary directory, and I'll work on it as soon as I have some more time.

Alternatively, if you feel like adapting this PR to accommodate the tempdir() check and the proposed changes I'll more than gladly accept it :)

Cheers!

polettif commented 3 years ago

Alternatively, if you feel like adapting this PR to accommodate the tempdir() check and the proposed changes I'll more than gladly accept it :)

I'll do that 👍 cheers

polettif commented 3 years ago

I'd say this is good to merge if it's ok for you @dhersz

dhersz commented 3 years ago

Sorry about it, I've been quite busy lately and forgot to merge it. Thanks @polettif!