r-transit / gtfsio

Read and Write General Transit Feed Specification (GTFS)
https://r-transit.github.io/gtfsio/
Other
13 stars 3 forks source link

zip::zip_list can return absolute file paths #17

Closed mpadge closed 3 years ago

mpadge commented 3 years ago

The zip::zip_list() does not necessarily return file names without paths - see ?zip::zip for details. The following code demonstrates (in a possibly OS-dependent manner, so don't worry if you can't reproduce):

f <- file.path (tempdir (), "mtcars.csv")
write.table (mtcars, f, sep = ",")
z <- file.path (tempdir (), "my.zip")
utils::zip (z, files = f, flags = "-q")
zip::zip_list (z)
#>                    filename compressed_size uncompressed_size
#> 1 tmp/Rtmp8OkYhX/mtcars.csv             861              1780
#>             timestamp permissions    crc32 offset
#> 1 2021-06-21 08:40:22         666 16e2dfc6      0

Created on 2021-06-21 by the reprex package (v2.0.0.9000)

In such cases, the names of the items read in import_gtfs will be the full paths, and the returned object retains those names. Everything works, but any other packages expecting a gtfs feed to have standard names will then fail. This is what happens:

library (gtfsrouter)
library (gtfsio)
f <- berlin_gtfs_to_zip ()
g <- gtfsio::import_gtfs (f)
names (g)
#> [1] "tmp/Rtmp2EVj5z/calendar"   "tmp/Rtmp2EVj5z/routes"    
#> [3] "tmp/Rtmp2EVj5z/trips"      "tmp/Rtmp2EVj5z/stop_times"
#> [5] "tmp/Rtmp2EVj5z/stops"      "tmp/Rtmp2EVj5z/transfers"

Created on 2021-06-21 by the reprex package (v2.0.0.9000)

I'll submit a PR to fix the names once #16 has been merged. The package will then need a final check_gtfs_format() or gtfsio_is_valid() or similar function that confirms that everything has the expected format - that all required tables are present, and that all required columns of required tables are also present. We can worry about that once #16 and this issue have been addressed. :+1:

dhersz commented 3 years ago

Great catch, as always, @mpadge. I can fully reproduce this behaviour in my computer as well.

Regarding the check_gtfs_format() function you propose, I suggest using assert_gtfs(), which I created as a validator for the gtfs class (the name is not great, I know, but both {gtfstools} and {tidytransit} had their own validate_gtfs() already, so I opted for the current one). \ Currently it checks if all elements inside the GTFS are named and if they inherit from data.frame, but it doesn't check the content of their names. What do you think?

mpadge commented 3 years ago

The names are official, and so checking them should be part of assert_gtfs(). I would suggest that it should assert that all required names exist, and that any other names match one of the optional names and nothing else. I'll leave you to do that, and i'll PR with a fix for this. Thanks!!