ipeaGIT / gtfs2gps

Convert GTFS data into a data.table with GPS-like records in R
https://ipeagit.github.io/gtfs2gps/

filter by route ids and by route type #121

Closed stmarcin closed 4 years ago

stmarcin commented 4 years ago

Hello and many thanks for the package!

I suggest a (really) small extension: two additional functions to filter GTFS feeds, by route ids and by transport mode, both stored in routes.txt (route_id and route_type, respectively). filter_by_route_id() extracts data for a single line (or a set of lines), while filter_by_route_type() is meant for a GTFS feed that contains data for multiple transport modes (e.g. the GTFS of Warsaw, which contains buses, trams, suburban rail and metro). I believe it can be useful if someone would like to visualize only selected transport mode(s).

Both functions are attached below. They are based on the similar filter_by_shape_id(); I’ve used your code as much as possible. If you think it might be useful, I also have the functions’ documentation and tests (again, an adapted copy-paste from filter_by_shape_id()).

filter_by_route_type <- function(gtfs_data, route_types) {
  gtfs_data$routes <- subset(gtfs_data$routes, route_type %in% route_types)
  route_ids <- unique(gtfs_data$routes$route_id)
  gtfs_data$trips <- subset(gtfs_data$trips, route_id %in% route_ids) 

  trip_ids <- unique(gtfs_data$trips$trip_id)
  gtfs_data$stop_times <- subset(gtfs_data$stop_times, trip_id %in% trip_ids)

  shape_ids <- unique(gtfs_data$trips$shape_id)
  gtfs_data$shapes <- subset(gtfs_data$shapes, shape_id %in% shape_ids)

  if(!is.null(gtfs_data$frequencies))
    gtfs_data$frequencies <- subset(gtfs_data$frequencies, trip_id %in% trip_ids)

  stop_ids <- unique(gtfs_data$stop_times$stop_id)
  gtfs_data$stops <- subset(gtfs_data$stops, stop_id %in% stop_ids)

  return(gtfs_data)
}

filter_by_route_id <- function(gtfs_data, route_ids) {
  gtfs_data$routes <- subset(gtfs_data$routes, route_id %in% route_ids)
  gtfs_data$trips <- subset(gtfs_data$trips, route_id %in% route_ids) 

  trip_ids <- unique(gtfs_data$trips$trip_id)
  gtfs_data$stop_times <- subset(gtfs_data$stop_times, trip_id %in% trip_ids)

  shape_ids <- unique(gtfs_data$trips$shape_id)
  gtfs_data$shapes <- subset(gtfs_data$shapes, shape_id %in% shape_ids)

  if(!is.null(gtfs_data$frequencies))
    gtfs_data$frequencies <- subset(gtfs_data$frequencies, trip_id %in% trip_ids)

  stop_ids <- unique(gtfs_data$stop_times$stop_id)
  gtfs_data$stops <- subset(gtfs_data$stops, stop_id %in% stop_ids)

  return(gtfs_data)
}

I hope you find them useful.
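
Since the two functions above differ only in how the route ids are chosen, one possible refactoring is to let filter_by_route_type() look up the matching route ids and delegate to filter_by_route_id(). The sketch below is an illustration under stated assumptions, not the code that ended up in the package: it uses plain data.frames and base indexing instead of data.tables, the `keep()` helper is hypothetical, and the toy feed (`toy_gtfs`) with all its ids is invented for the example.

```r
# Sketch only: base-R data.frames stand in for the data.tables
# returned by read_gtfs(); every id below is made up.
filter_by_route_id <- function(gtfs_data, route_ids) {
  # Hypothetical helper: keep the rows of df whose column `col` is in `ids`.
  keep <- function(df, col, ids) df[df[[col]] %in% ids, , drop = FALSE]

  gtfs_data$routes <- keep(gtfs_data$routes, "route_id", route_ids)
  gtfs_data$trips  <- keep(gtfs_data$trips, "route_id", route_ids)

  # Cascade the selection to the tables that depend on trips.
  trip_ids  <- unique(gtfs_data$trips$trip_id)
  shape_ids <- unique(gtfs_data$trips$shape_id)
  gtfs_data$stop_times <- keep(gtfs_data$stop_times, "trip_id", trip_ids)
  gtfs_data$shapes     <- keep(gtfs_data$shapes, "shape_id", shape_ids)

  if (!is.null(gtfs_data$frequencies))
    gtfs_data$frequencies <- keep(gtfs_data$frequencies, "trip_id", trip_ids)

  # Stops are kept only if some remaining stop_time refers to them.
  stop_ids <- unique(gtfs_data$stop_times$stop_id)
  gtfs_data$stops <- keep(gtfs_data$stops, "stop_id", stop_ids)

  gtfs_data
}

filter_by_route_type <- function(gtfs_data, route_types) {
  routes <- gtfs_data$routes
  route_ids <- unique(routes$route_id[routes$route_type %in% route_types])
  filter_by_route_id(gtfs_data, route_ids)
}

# Toy feed with one bus route (route_type 3) and one tram route (route_type 0).
toy_gtfs <- list(
  routes = data.frame(route_id = c("B1", "T1"), route_type = c(3, 0),
                      stringsAsFactors = FALSE),
  trips = data.frame(route_id = c("B1", "B1", "T1"),
                     trip_id  = c("b1a", "b1b", "t1a"),
                     shape_id = c("sB", "sB", "sT"),
                     stringsAsFactors = FALSE),
  stop_times = data.frame(trip_id = c("b1a", "b1a", "b1b", "t1a", "t1a"),
                          stop_id = c("s1", "s2", "s1", "s3", "s4"),
                          stringsAsFactors = FALSE),
  shapes = data.frame(shape_id = c("sB", "sB", "sT"),
                      shape_pt_sequence = c(1, 2, 1),
                      stringsAsFactors = FALSE),
  stops = data.frame(stop_id = c("s1", "s2", "s3", "s4"),
                     stringsAsFactors = FALSE)
)

trams <- filter_by_route_type(toy_gtfs, 0)
trams$stops$stop_id
```

With this toy feed, only the tram route, its single trip, and the two stops that trip serves should remain. Whether delegation is preferable to two independent functions is a design choice; it mainly avoids keeping two copies of the cascade in sync.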

pedro-andrade-inpe commented 4 years ago

@stmarcin, thanks for the feedback and the code. These functions will be available in the forthcoming 1.2 version of the package.

stmarcin commented 4 years ago

Great! Below are the descriptions of both functions and the tests I prepared for them. Feel free to use or modify them as you want. I hope it makes your life easier. And thanks again for the package!

Description for filter_by_route_type():

#' @title Filter GTFS data by transport mode (route type)
#' 
#' @description Filter GTFS data by transport mode (coded in the column route_type 
#' in routes.txt). It also removes the unnecessary trips, stop_times, shapes, 
#' frequencies (if they exist in the feed) and stops accordingly.
#' @param gtfs_data A list of data.tables read using gtfs2gps::read_gtfs().
#' @param route_types A vector of route types belonging to the routes of the
#' gtfs_data data. Note that route_type might be loaded by gtfs2gps::read_gtfs()
#' as a string or a number, depending on the available values.
#' @param remove_invalid Remove all the invalid objects after subsetting the data?
#' The default value is TRUE.
#' @return A filtered GTFS data. 
#' @export
#' @examples
#' poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps"))
#' 
#' subset <- filter_by_route_type(poa, "3")

Description for filter_by_route_id():

#' @title Filter GTFS data by route ids
#' 
#' @description Filter GTFS data by its route ids. It also removes the
#' unnecessary trips, stop_times, shapes, frequencies (if they exist in the feed) 
#' and stops accordingly.
#' @param gtfs_data A list of data.tables read using gtfs2gps::read_gtfs().
#' @param route_ids A vector of route ids belonging to the routes of the
#' gtfs_data data. Note that route_id might be loaded by gtfs2gps::read_gtfs()
#' as a string or a number, depending on the available values.
#' @param remove_invalid Remove all the invalid objects after subsetting the data?
#' The default value is TRUE.
#' @return A filtered GTFS data. 
#' @export
#' @examples
#' poa <- read_gtfs(system.file("extdata/poa.zip", package = "gtfs2gps"))
#' 
#' subset <- filter_by_route_id(poa, "T2")
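
On the note in both descriptions that route_type and route_id may be loaded as strings or numbers: R's `%in%` coerces both sides to a common type before matching, so for simple integer-like codes a filter value such as "3" still matches a numeric column. A minimal base-R illustration follows; the vectors are made-up values, not taken from any feed.

```r
# Made-up route_type columns, as they might look after parsing.
route_type_num <- c(0, 3, 3, 2)          # parsed as numbers
route_type_chr <- c("0", "3", "3", "2")  # parsed as strings

"3" %in% route_type_num        # TRUE: the numbers are compared as "0", "3", ...
3 %in% route_type_chr          # TRUE: 3 is compared as "3"
sum(route_type_num %in% "3")   # 2 matching rows

# The match is on the text form, so padded codes do not match.
"03" %in% route_type_num       # FALSE
```

The last line shows the limit of this convenience: values are compared by their character representation, so ids with leading zeros or other formatting differences should be passed exactly as they appear in the loaded table.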

Tests: it would probably be better to modify the example dataset (for tests) or to use another one, as poa.zip contains only one route type. If you want, I can prepare one for you.

test_that("filter_by_route_id", {
  poa <- read_gtfs(system.file("extdata/poa.zip", package="gtfs2gps"))

  result <- filter_by_route_id(poa, "T2")
  expect_equal(dim(result$trips)[1], 196)
  expect_equal(dim(result$shapes)[1], 239)
})

test_that("filter_by_route_type", {
  poa <- read_gtfs(system.file("extdata/poa.zip", package="gtfs2gps"))

  result <- filter_by_route_type(poa, "3")
  expect_equal(dim(result$trips)[1], 387)
  expect_equal(dim(result$shapes)[1], 1265)
})

Best, MS

pedro-andrade-inpe commented 4 years ago

Thanks. It would be nice if you could prepare a dataset with more than one route_type. I checked the datasets we have in the package and all of them have only one route_type. Ideally the dataset would be smaller than 1MB (around 500kB would be great!).

stmarcin commented 4 years ago

Here it comes: a (very) minimal example GTFS. It's based on Warsaw's GTFS, but I extracted only 3 routes (bus, tram and suburban rail). It covers only one day, and additionally I kept only trips that start between 8 and 10am, so the zip is ~50kB :-)

example_gtfs.zip
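
The time-window step described above (keeping only trips that start between 8 and 10am) could be sketched as follows. This is not the script used to build example_gtfs.zip: `filter_by_start_time()` is a hypothetical helper, the toy feed is invented, and the sketch assumes that stop_times$departure_time holds zero-padded "HH:MM:SS" strings with no missing values, so that plain string comparison orders times correctly.

```r
# Hypothetical helper, not part of gtfs2gps. Assumes departure_time is a
# zero-padded "HH:MM:SS" string without NAs.
filter_by_start_time <- function(gtfs_data, from = "08:00:00", to = "10:00:00") {
  st <- gtfs_data$stop_times

  # Earliest departure of each trip, as a character vector named by trip_id.
  first_dep <- vapply(split(st$departure_time, st$trip_id), min, character(1))

  # Keep trips whose first departure falls in the half-open window [from, to).
  trip_ids <- names(first_dep)[first_dep >= from & first_dep < to]

  gtfs_data$trips <- gtfs_data$trips[gtfs_data$trips$trip_id %in% trip_ids, , drop = FALSE]
  gtfs_data$stop_times <- st[st$trip_id %in% trip_ids, , drop = FALSE]
  gtfs_data
}

# Invented toy feed: four trips with first departures around the window edges.
toy_gtfs <- list(
  trips = data.frame(trip_id = c("t1", "t2", "t3", "t4"),
                     stringsAsFactors = FALSE),
  stop_times = data.frame(
    trip_id = c("t1", "t1", "t2", "t2", "t3", "t4"),
    departure_time = c("07:50:00", "08:05:00", "08:15:00",
                       "08:30:00", "09:59:59", "10:00:00"),
    stringsAsFactors = FALSE
  )
)

morning <- filter_by_start_time(toy_gtfs)
morning$trips$trip_id   # "t2" "t3"
```

Only trips and stop_times are filtered here; in a full feed the remaining tables (shapes, frequencies, stops) would need the same cascade as in the route filters.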

pedro-andrade-inpe commented 4 years ago

Great, thanks for the contribution. We will add you as ctb of gtfs2gps, ok?

rafapereirabr commented 4 years ago

Hi @stmarcin , thanks for the contribution!

pedro-andrade-inpe commented 3 years ago

Hello @stmarcin, I would like to add documentation about where the Warsaw data set was downloaded from. Do you have a link to a webpage where this data is available?