Closed tbuckl closed 4 years ago
@polettif i created this issue and tagged all the other open issues with frequencies
in order to clarify the project.
i've said this before, but i think the issue is that GTFS wasn't designed to calculate frequency of service. so when feeds show up for which it doesn't work, i'm not that surprised. its not that they are "non-standard." quite the contrary. the standard is just not built for frequencies. in the package, its calculation is a hack.
Thanks for clearing up the issues.
i've said this before, but i think the issue is that GTFS wasn't designed to calculate frequency of service.
Yeah, I'd say so too. Personally, I haven't come across many feeds using frequencies.txt to begin with, probably because they offer less flexibility. And calculating general frequencies from stop_times is, as you said, pretty difficult...
What's the status on this @tbuckl? Because I started working on generalising sf data and I ran into several problem with it because route frequency, stop frequency and their generating/plotting functions are intertwined quite a bit. This wouldn't be a problem as such but since not all feeds have shapes and frequencies it's hard to make it work for basic feeds.
I'd like to make the following changes which might lead to reduced functionality for some use cases (at least initially):
read_gtfs
since there's, as you said, not a standard way yet to calculated these. It would still be possible via get_route_frequency
stops_sf
and shapes_sf
, as they represent the basic geospatial data in a feed. Currently routes_sf
uses (calculated) frequencies and is created from shapes via trips. That functionality should be moved to a standalone function.@polettif this is a pretty big issue, so i haven't had time to tackle it. i'm glad you've thought a bit about how to break it up below. i can imagine a number of issues coming out of this discussion.
thoughts on your proposed changes:
1) this is a good point. two thoughts. first, i think we'd need to develop a strong reason to deprecate functionality that is optional and currently used. second, it sounds like the reason is: there is no standard. but i think there are plenty of examples of successful standards that aren't correct but that are frequently used (atomic precision in GeoJSON and EPSG 4326 come to mind). most of geojson coordinates in 4326 are overly precise. and so, strictly speaking, not correct. however, that doesn't keep people from using geojson. on the contrary, its a popular standard because it "just works." in my experience, thats also true of the frequency functions here. i think it might make sense to open an issue in which we more clearly document for users that frequencies are approximations (through warnings in the console). this is similar to how st_transform works on spherical coordinates (4326), where the user is alerted that the function is basically "off" by a lot for the data type they are using. if that sounds ok, then i can open an issue for that.
2) makes sense.
3) this makes sense in general too. but again, as with (1), i'm unclear on how it looks for users and what the compelling reason to remove functionality for users is. many users new to GTFS want to know "whats the geometry of X route." i've actually forgotten: does routes_sf necessitate frequencies in the package? my intent was to have routes_sf just be routes and not require frequencies.
anyway, should have some time to provide more thoughtful comments next weekend.
Thanks for your reply. I didn't consider backwards compatibility enough, good point.
second, it sounds like the reason is: there is no standard.
Sorry, I didn't phrase it right, I wasn't referring to the GTFS standard in any way. Currently get_route_frequency and get_stop_frequency use default parameters to calculate frequencies which are not documented enough, IMO. While those defaults might be reasonable, it's not trivial to extract the service period you want to do calculations on (as discussed in #72, #106 and as you pointed out in #119).
I guess my point is: Any analysis should be done with purposely defined parameters after the feed has been read. As it stands now, using frequency=TRUE
is a bit of a black box.
this makes sense in general too. but again, as with (1), i'm unclear on how it looks for users and what the compelling reason to remove functionality for users is. many users new to GTFS want to know "whats the geometry of X route."
The basic geometry elements (besides stops) are shapes which are connected to one or more trips. If we do have a shapes_sf (#115), the shapes for a route can be obtained like this
gtfs_obj$trips %>%
filter(route_id == "route_id" & service_id = "service_id") %>%
right_join(gtsf_obj$.$shapes_sf, by = "shape_id")
which could very well be part of its own function (e.g. get_route_shape). I don't think we should precalculate join-and-filter operations upon reading the feed. This might take a very long time for large feeds and means we calculate the same things twice, which are not all needed in the end.
does routes_sf necessitate frequencies in the package? my intent was to have routes_sf just be routes and not require frequencies.
No I'm sorry, I mixed up functions. This was about plot.gtfs.
@polettif ah, i see. you're right! the process for many (if not all) feeds should be to let the user 1) read the feed 2) determine for themselves the right service id and then 3) calculate frequency or route geometry as they need. i can imagine re-working the introductory and frequencies vignette in such a way. then we can deprecate the frequency=TRUE/FALSE and geometry=TRUE/FALSE flag over time. i'll rework the issues to reflect this plan.
see https://github.com/r-transit/tidytransit/issues/106 for more