r-transit / tidytransit

R package for working with GTFS data
https://r-transit.github.io/tidytransit/

Reduce package file size #218

Closed polettif closed 1 month ago

polettif commented 1 month ago

This PR reduces the package size below the 5 MB CRAN policy threshold. To achieve this, I removed some routes from the NYC sample feed and simplified the shapes. The removed routes are not used explicitly in the vignettes or examples. Since the feed is already six years old, I don't see an issue with turning it into a pure "example feed".

Notable changes:

The code to create the reduced feed is pasted below, for posterity.

```r
library(dplyr)
library(tidytransit) # not in the original snippet; needed for read_gtfs(), shapes_as_sf(), write_gtfs()

gtfs_nyc = read_gtfs("google_transit_nyc_subway.zip")
gtfs = gtfs_nyc
routes_to_remove = c("A", "SI", "H", "FS", "J", "Z", "F")

# Remove constant columns ####
# (all tables except ".", agency, transfers and routes)
gtfs[!names(gtfs) %in% c(".", "agency", "transfers", "routes")] <-
  gtfs[!names(gtfs) %in% c(".", "agency", "transfers", "routes")] |>
  lapply(janitor::remove_constant, quiet = FALSE)

# Minimize shapes ####
# keep = 0.1 retains roughly 10% of the shape vertices
shapes_sf = gtfs$shapes |>
  shapes_as_sf() |>
  rmapshaper::ms_simplify(keep = 0.1)
gtfs$shapes <- dplyr::as_tibble(sf_lines_to_df(shapes_sf))

# Remove select routes and trips ####
gtfs$routes |>
  select(route_id, route_desc) |>
  print(n = 30)
trips_to_keep = gtfs$trips |>
  filter(!route_id %in% routes_to_remove) |>
  distinct(trip_id)

# Remove unused service_ids ####
gtfs$trips <- gtfs$trips |>
  semi_join(trips_to_keep, "trip_id")
gtfs$stop_times <- gtfs$stop_times |>
  semi_join(gtfs$trips, "trip_id")
gtfs$calendar <- gtfs$calendar |>
  filter(service_id %in% gtfs$trips$service_id)
gtfs$calendar_dates <- gtfs$calendar_dates |>
  filter(service_id %in% gtfs$trips$service_id)

# Remove stops ####
# Keep stops that are still served, their parent stations,
# and any stops connected to them via transfers
stops_to_keep = gtfs$stops |>
  semi_join(gtfs$stop_times, "stop_id")
stops_to_keep <- c(stops_to_keep$stop_id, stops_to_keep$parent_station)
stops_to_keep <- c(
  stops_to_keep,
  gtfs$transfers |>
    filter(from_stop_id %in% stops_to_keep | to_stop_id %in% stops_to_keep) |>
    select(from_stop_id, to_stop_id) |>
    unlist(use.names = FALSE) |>
    unique()
)
gtfs$stops <- gtfs$stops |>
  filter(stop_id %in% stops_to_keep)
gtfs$transfers <- gtfs$transfers |>
  filter(from_stop_id %in% stops_to_keep | to_stop_id %in% stops_to_keep)
gtfs$routes <- gtfs$routes |>
  semi_join(gtfs$trips, "route_id")

write_gtfs(gtfs, "google_transit_nyc_subway.zip")
```
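After a reduction like this, it may be worth checking that the remaining tables still reference each other consistently and that the written file is as small as intended. The following is a minimal base R sketch, not part of the PR: `orphaned_ids()` and `file_size_mb()` are hypothetical helpers, and `toy` is a made-up stand-in for the reduced feed so the example runs without the NYC zip.

```r
# Hypothetical helper (not part of tidytransit): ids in `child[[key]]`
# that have no matching row in `parent[[key]]`.
orphaned_ids <- function(child, parent, key) {
  setdiff(unique(child[[key]]), parent[[key]])
}

# Hypothetical helper: size of a file in megabytes (1 MB taken as 1024^2 bytes).
file_size_mb <- function(path) {
  file.size(path) / 1024^2
}

# Made-up toy feed standing in for the reduced GTFS tables.
toy <- list(
  routes = data.frame(route_id = c("1", "2")),
  trips = data.frame(
    trip_id = c("t1", "t2", "t3"),
    route_id = c("1", "1", "2")
  ),
  stop_times = data.frame(
    trip_id = c("t1", "t2", "t3", "t3"),
    stop_id = c("s1", "s2", "s1", "s3")
  ),
  stops = data.frame(stop_id = c("s1", "s2", "s3"))
)

# Each entry lists ids that point at a row which no longer exists.
orphans <- list(
  trips_without_route = orphaned_ids(toy$trips, toy$routes, "route_id"),
  stop_times_without_trip = orphaned_ids(toy$stop_times, toy$trips, "trip_id"),
  stop_times_without_stop = orphaned_ids(toy$stop_times, toy$stops, "stop_id")
)

# Number of orphaned ids per check; all zeros means the toy feed is consistent.
print(lengths(orphans))
```

On the real feed, the same calls would presumably take `gtfs$trips`, `gtfs$routes`, `gtfs$stop_times` and `gtfs$stops`, and `file_size_mb("google_transit_nyc_subway.zip")` would give the size of the written feed. Note that this measures a single file only, not the size of the built package.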