Closed polettif closed 3 years ago
Hi @polettif, thanks for the PR.
So, playing around with this issue I bit I realized that the current approach always works if path
is not tempdir()
. For example, the bit above works fine:
library(gtfsio)
path1 <- system.file("extdata", "sample-feed-fixed.zip", package = "tidytransit")
path2 <- tempfile()
g1 <- import_gtfs(path1)
export_gtfs(g1, path2, as_dir = TRUE)
list.files(path2)
#> [1] "agency.txt" "calendar_dates.txt" "calendar.txt"
#> [4] "fare_attributes.txt" "fare_rules.txt" "frequencies.txt"
#> [7] "routes.txt" "shapes.txt" "stop_times.txt"
#> [10] "stops.txt" "trips.txt"
That's because tempfile()
creates a file/directory inside R temporary directory, whereas tempdir()
returns the path to this temporary directory (and apparently creates it, if it had been removed for any reason - I'll call this directory Rtmp from now on). So when you set path = tempdir()
, the function tries to move the directory where files were extracted to (which is inside Rtmp) to Rtmp itself. But as overwrite = TRUE
(by default), it removes Rtmp and all files within it before attempting to move the files, and that's why we see this 'No such file or directory'
error.
The approach you're following in this PR fixes this, but I'm not sure if we should even use tempdir()
as a path. That's because all packages may store temporary files inside Rtmp, so if we remove/overwrite Rtmp we may be interfering with other packages' behaviour. For example, {geobr}
, {r5r}
and {aopdata}
all use some temporary files inside Rtmp to cache some operations and run faster when used repetitively inside the same R session. If we use tempdir()
as a path to export_gtfs()
we'll be overwriting Rtmp and any files that other packages may be using.
oi <- geobr::read_country()
#> Loading required namespace: sf
#> Using year 2010
list.files(tempdir())
#> [1] "country_2010_simplified.gpkg" "file3f9d49265d57.so"
#> [3] "gtfsio" "metadata_geobr.csv"
But the current approach also removes the Rtmp if it path = tmpdir()
, and, even less usefully, throws an error instead of creating a gtfs-like directory.
Which leads me to think that perhaps we should check if path
points to tmpdir()
, and raise an error if so. Something along the lines of:
if (path == tempdir())
stop("Please use 'path = tempfile()' instead of tempdir() to designate temporary directories")
What do you think?
Ah, I see. Thanks for looking into it! I wasn't aware of how differently tempdir()
is handled.
Which leads me to think that perhaps we should check if path points to tmpdir(), and raise an error if so. Something along the lines of:
if (path == tempdir()) stop("Please use 'path = tempfile()' instead of tempdir() to designate temporary directories")
Prevent using tempdir
in general sounds good to me.
Just a general design thought: I'd still prefer writing files to a directory (as in this PR) instead of "renaming" it, seems cleaner to me and it fails faster if the output directory cannot be written to. There's probably no difference in practice, so there's no need to change it, just wanted to point it out.
Thank you for flagging the issue!
I agree with you that writing directly to the desired path is better than renaming a previously created temporary directory, and I'll work on it as soon as I have some more time.
Alternatively, if you feel like adapting this PR to accommodate the tempdir()
check and the proposed changes I'll more than gladly accept it :)
Cheers!
Alternatively, if you feel like adapting this PR to accommodate the
tempdir()
check and the proposed changes I'll more than gladly accept it :)
I'll do that 👍 cheers
I'd say this is good to merge if it's ok for you @dhersz
Sorry about it, I've been quite busy lately and forgot to merge it. Thanks @polettif!
Currently
export_gtfs(as_dir = TRUE)
renames the created temporary folder. However, I ran into issues with this approach:With this PR
export_gtfs
now writes all files directly topath
ifas_dir
is set toTRUE
instead of writing it to a temporary directory and then renaming that directory topath
. Is there any drawback to this approach that I'm missing?