eamcvey opened 8 years ago
The ability to compare two versions of a gtfs feed from an agency and be shown the differences could be useful -- i.e. to see what changes a transit agency made.
I've been working at the intersection of R and GTFS for a bit. I'm glad I came across this! I would be happy to try and push to any part of this project, or lead it myself if no one has time.
In Anchorage, AK we use a script that we run continuously during the working hours of our bus line, People Mover, to generate the protocol buffer that is needed for GTFS-realtime.
https://github.com/codeforanchorage/api-realtime-bus
We run build_protobuf.R in the R directory continuously. I'm sure there are many inelegant things in how I wrote it, but I wanted to include it as an example of how we generate the GTFS-RT feed using Dirk E's RProtoBuf package. We are using a feed from the existing vendor service that calculates the delays by stop for us, which takes out some of the brainy part of our project.
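For anyone curious what that looks like in general, here is a minimal, hand-wavy sketch (not the actual build_protobuf.R) of assembling a GTFS-realtime FeedMessage with RProtoBuf; the .proto path, ids, and delay values are all placeholders.

## minimal sketch only -- not build_protobuf.R; paths, ids, and delays are made up
library(RProtoBuf)
readProtoFiles("gtfs-realtime.proto")   # the GTFS-realtime message definitions

msg <- new(P("transit_realtime.FeedMessage"))
msg$header <- new(P("transit_realtime.FeedHeader"),
                  gtfs_realtime_version = "2.0",
                  timestamp = as.integer(Sys.time()))

## one TripUpdate entity carrying a single stop-level arrival delay (in seconds)
stu <- new(P("transit_realtime.TripUpdate.StopTimeUpdate"),
           stop_id = "1234",
           arrival = new(P("transit_realtime.TripUpdate.StopTimeEvent"), delay = 120L))
tu  <- new(P("transit_realtime.TripUpdate"),
           trip = new(P("transit_realtime.TripDescriptor"), trip_id = "trip_1"))
tu$add("stop_time_update", stu)
msg$add("entity", new(P("transit_realtime.FeedEntity"), id = "1", trip_update = tu))

## write the binary protobuf that a GTFS-realtime consumer would fetch
msg$serialize("feed.pb")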
Thanks for pointing me to transitfeeds.com. Before, I was looking at the GTFS Data Exchange, which is another resource.
I've made a little script to process stops.txt and shapes.txt into sp objects before pushing them to PostGIS, which I think is the best platform for this, imo. Open to anything though. https://gist.github.com/hansthompson/a3d2c710ac8e3584d58. The bits inside this gist that convert them to shapefiles using writeOGR could be useful too.
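Roughly, the conversion goes something like the sketch below (a from-memory approximation, not the gist itself; the column names follow the GTFS spec):

## rough sketch of stops.txt / shapes.txt -> sp objects (not the gist verbatim)
library(sp)

stops  <- read.csv("stops.txt",  stringsAsFactors = FALSE)
shapes <- read.csv("shapes.txt", stringsAsFactors = FALSE)

## stops become a SpatialPointsDataFrame in WGS84
stops_sp <- SpatialPointsDataFrame(
  coords      = stops[, c("stop_lon", "stop_lat")],
  data        = stops,
  proj4string = CRS("+proj=longlat +datum=WGS84")
)

## each shape_id becomes one Lines object, ordered by shape_pt_sequence
shape_lines <- lapply(split(shapes, shapes$shape_id), function(s) {
  s <- s[order(s$shape_pt_sequence), ]
  Lines(list(Line(as.matrix(s[, c("shape_pt_lon", "shape_pt_lat")]))),
        ID = as.character(s$shape_id[1]))
})
shapes_sp <- SpatialLines(unname(shape_lines),
                          proj4string = CRS("+proj=longlat +datum=WGS84"))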
If PostGIS seems like a good way forward, I would need some help addressing the three concerns I list at the top of the gist about this kind of conversion for GTFS.
On the note of checking errors within the GTFS feed, the Google dashboard is pretty excellent if you want to throw a GTFS feed at it in testing mode. It would be nice to be able to do this outside the Google platform though. I would like a mapping function using leaflet for the testing that could show the routes and the expected positions of buses at a specified time during the day.
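As a starting point, something like the leaflet sketch below would at least put the stops and shapes on a map; it assumes the stops_sp and shapes_sp objects from the earlier sketch, and showing expected bus positions at a given time would additionally need stop_times.txt joined in.

## quick look at a feed on a map (assumes stops_sp / shapes_sp from above)
library(leaflet)

leaflet() %>%
  addTiles() %>%
  addPolylines(data = shapes_sp, weight = 2) %>%
  addCircleMarkers(data = stops_sp, radius = 3, stroke = FALSE)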
I'm also interested in getting network analysis involved, both to show the network (maybe in a given time window?) and to show the network analysis parameters spatially once it's done. Here's a pretty rough idea: http://akdata.org/misc/gtfs_network.html. I'm taking a course on network analysis currently and would love to make this an end-of-semester project that could be generalized to any GTFS.
Finally, and perhaps outside the borders of this project, is creating a delay-analysis package that could take the GTFS feed and real-time GPS data in some standard format and build a protobuf server to scale real-time updates for Google anywhere there is (a) GTFS and (b) GPS on board.
@rustyb has a package called GTFSr that might be a good resource to build off of as well.
@hansthompson Thanks, I'm checking out GTFSr! And thanks for all the information, I am digesting it. It would be great to be able to build on existing stuff.
@hansthompson The link to the gist you provided appears to be broken (or I don't have access?)
Sorry. I'm new to Gists. Try this one.
I can't get the GTFSr vignette to compile. If you get it working would you mind sharing a copy?
Howdy Folks - Thanks for the interest in GTFSr and my apologies for not getting back to you sooner. GTFSr was a wee project for an R course in college.
I've a funny feeling I might not have the actual working version on GitHub. Will dig it out on my machine and get it working again tomorrow.
Just wanted to make a plug for a package I started for network analysis of GTFS this weekend.
https://github.com/hansthompson/gtfsnetwork
It will convert the GTFS files into an edge list and do some filtering by time and service id.
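For anyone who hasn't looked at the repo yet, the core idea is roughly the sketch below (a simplified approximation, not the gtfsnetwork code itself; the time and service_id filtering is left out): consecutive stops within each trip in stop_times.txt become directed edges.

## simplified sketch: stop_times.txt -> edge list -> igraph (not gtfsnetwork itself)
library(igraph)

stop_times <- read.csv("stop_times.txt", stringsAsFactors = FALSE)
stop_times <- stop_times[order(stop_times$trip_id, stop_times$stop_sequence), ]

edges <- do.call(rbind, lapply(split(stop_times, stop_times$trip_id), function(tr) {
  n <- nrow(tr)
  if (n < 2) return(NULL)
  data.frame(from = tr$stop_id[-n], to = tr$stop_id[-1],
             trip_id = tr$trip_id[1], stringsAsFactors = FALSE)
}))

g <- graph_from_data_frame(edges, directed = TRUE)
head(sort(degree(g), decreasing = TRUE))   # e.g. the most connected stops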
I'm not sure how to write packages around S4 objects though, so I just read in the files as separate data.frames. What are your thoughts on this, @eamcvey and @rustyb?
@hansthompson Things like this network analysis are exactly what I hope would be built into/on top of the package I was envisioning. At minimum, the package should make it easy to get GTFS feeds, assess the quality of the data, save it in a useful gtfs object, and make it convenient to do the types of joins that would be most common. I have a start on some of these features that I'll put into a public repo by the end of the week. Then, ideally, getting the data to the starting point for network analysis is very easy, and you can focus on the network part.
Cool, I'll look forward to it! What are your thoughts on an R Markdown-like output of the feed validation, with charts that show when service ids run, maps of the stops, etc.?
@eamcvey & @hansthompson Great discussion thus far. I would like to jump in too. I was wondering if the public repo that @eamcvey planned to create is ready. Could you outline some specific tasks that we can start working on?
Better late than never: the code I've started on is finally in a public repo here: https://github.com/ropenscilabs/gtfsr. I've got the basic functionality to pull feeds from the transitfeeds.com API, put all the feed data into a list of dataframes (not yet a class, because I'm not sure what level of validation there should be), and create a validation dataframe as part of that list to start characterizing the data quality of the feed. There is more to be done on data validation (checking that the ids in different data frames match up where they should, for example), thinking to do about what the gtfs object should look like (maybe adapting existing code referenced in this discussion), and lots that could be built on top of this. I have a driver file in the repo that I used to test things out, and there are some functions in there I wrote on the fly that should get formalized.
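For reference, hitting the transitfeeds.com API directly is just an httr call along the lines of the sketch below (this is not a gtfsr function; the endpoint and parameters are as I recall from the transitfeeds.com API docs, and the key is a placeholder):

## rough sketch of a raw transitfeeds.com API call (not a gtfsr function)
library(httr)

resp  <- GET("https://api.transitfeeds.com/v1/getFeeds",
             query = list(key = "YOUR_API_KEY", page = 1))
feeds <- content(resp, as = "parsed")
str(feeds, max.level = 2)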
I started to put together my own package to handle GTFS-realtime feeds: https://github.com/SymbolixAU/gtfsway
It uses the RProtoBuf package to load the .proto file in .onLoad(). Then the gtfs_realtime() function reads the binary result of a GTFS-realtime response (although at the time of writing this it doesn't do anything with the data; I'm still working on it). For example, the realtime feed for South East Queensland can be downloaded by:
## south east Queensland
url <- "https://gtfsrt.api.translink.com.au/Feed/SEQ"
response <- httr::GET(url)
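In case it's useful in the meantime, the raw body of that response can already be decoded with RProtoBuf alone, roughly as sketched below (independent of gtfsway's gtfs_realtime(); the .proto path is a placeholder):

## decode the raw protobuf with RProtoBuf (independent of gtfsway)
library(RProtoBuf)
readProtoFiles("gtfs-realtime.proto")

body <- httr::content(response, as = "raw")
feed <- read(P("transit_realtime.FeedMessage"), body)
length(feed$entity)   # number of entities (trip updates, vehicle positions, ...)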
If you want I can make this into a 'formal' function and issue a PR to incorporate it into gtfsr?
I'm really glad to read this thread and see more people interested in using R to do network analysis of GTFS datasets. I hope to contribute more to the project in the future. For now, I'll share a similar initiative using Java, which may bring some useful insights. It was created by Tyler Green.
http://www.tyleragreen.com/blog/2017/03/graphing-transit-systems-part-ii-centrality/
@eamcvey Nice work! Do you know if there are plans to get that package onto CRAN?
GTFS is a standard format for transit data (routes, stops, schedules, etc.): https://developers.google.com/transit/gtfs/ [There is also a real-time version of GTFS; I'm considering it out of scope for now.] If it's easy to work with GTFS data in R, it will facilitate the creation of more sophisticated analysis tools for transit systems built on top of this package.
4 Main Purposes of the Package:
Get feed data into R.
Validate the feed and assess data quality, which is often poor (a sketch of one such check follows below).
Provide convenience functions for common tasks.
Facilitate creation of a GTFS feed from within R.
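As an illustration of the kind of validation meant above, here is a tiny, assumed sketch (not gtfsr code) of one cross-file check: every stop_id referenced in stop_times.txt should exist in stops.txt.

## tiny sketch of one data-quality check (not gtfsr code)
stops      <- read.csv("stops.txt",      stringsAsFactors = FALSE)
stop_times <- read.csv("stop_times.txt", stringsAsFactors = FALSE)

missing_stops <- setdiff(stop_times$stop_id, stops$stop_id)
if (length(missing_stops) > 0) {
  warning(length(missing_stops), " stop_ids in stop_times.txt are not in stops.txt")
}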