jamespfennell / transiter

Web service for transit data
https://demo.transiter.dev
MIT License
55 stars 6 forks source link

Optionally include trip previews in `/routes` requests #104

Closed cedarbaum closed 1 year ago

cedarbaum commented 1 year ago

Add a include_trips option to both /routes and /routes/{route_id} requests, which allows a list of limited trip references to be returned as part of the response.

The motivation for this change is to allow clients to quickly determine if route(s) have active trips. Though this is easy for a single route (i.e. by calling /routes/{route_id}/trips), doing so for every route in a system requires |routes| separate calls.

jamespfennell commented 1 year ago

Would it be possible to determine this using the realtime service maps? If the map is empty, there are no trips, otherwise there are trips running. (There is an edge case when the service map fails to build, as we've discussed before, but in the long run it would be nice to fix those cases.)

cedarbaum commented 1 year ago

Great suggestion, I didn't think to use service maps (even after staring at them for so long while working on the bus system haha). I think that will work - my main concern would be the topological sort bug you mentioned, which I do still see fairly often in the logs for the NYC subway.

After briefly looking at the code again, I think what would happen in such cases is Transiter would retain the the old service map until it received a valid topology - is that correct? If so, I think there are 2 interesting scenarios that could happen:

Scenario 1:

  1. Valid topology for route X, persist service map with running trips
  2. Invalid topology for route X, do nothing (service map from 1 returned)
  3. Valid topology again, no running trips, persist empty service map

In this case, we incorrectly say that route X is running between times 2 and 3, but I am not too concerned about this, since false positives are preferable to false negatives (at least for my use case).

Scenario 2:

  1. Valid topology, no running trips, persist empty service map
  2. Invalid topology for route X, do nothing (service map from 1 returned)
  3. Valid topology again with running trips, persist service map

In this case, we may fail to notice a route has running trips between 2 and 3, which could be misleading.

I think it's difficult to say how often this could happen, but based on observations from the NYC subway, valid maps are persisted often enough that I am not too concerned.

jamespfennell commented 1 year ago

Yeah, that's exactly right - if there is an error generating the new service map, the old one is preserved. This does lead to issues when the topological sorting doesn't work, as you describe.

I think scenario 1 is not a problem from the perspective of using the realtime service maps to check for active trips. If the service map can't be built, it necessarily means there are trips running. If there are no trips, we build the empty service map which always succeeds (no cycles in an empty graph!). So in scenario 1 step 2, the service map is potentially stale but there are definitely trips running, because we failed to sort some non-empty graph. Checking for active trips by looking at the map returns the right answer that there are active trips.

Scenario 2 is definitely a problem.

This also seems to happen a lot - I'm looking at the http://demo.transiter.dev logs right now and for the last day at least this error has been occurring every few minutes for the 1234567 NYC subway feed. Is wonder should we try to get the bottom of the problem and see if there is a solution? My sense is that it's buggy data but there could also be a bug in Transiter.


By the way, to return to the actual PR. I am a little nervous about the PR because (a) it makes the /systems/<system_id>/routes potentially return a lot of data, so maybe that endpoint would need to be paginated and (b) it slightly violates separation of concerns in the API in the sense that one endpoint (get route) is returning data that logically belongs in a different endpoint (list trips in route). For this reason I would prefer if the service maps feature solved the problem!

Alternatively, if we really do just want to know if there are active trips we could add a boolean or "trips count" field to the Route message. This would be less heavy and faster to calculate. There is some prior art in the API for this: the System message already contains the number of routes, stops, feed, etc. in the system: https://github.com/jamespfennell/transiter/blob/1c4145407384b611b6724d7f275e25f321001275/api/public.proto#L629-L633 Maybe we could add a ChildResources trips field to the route message?

cedarbaum commented 1 year ago

The above makes sense - I think I'll go with the service maps solution for now and see how that goes. The "trips count" solution is also a good suggestion to consider for the future if needed. Thanks as always for your time and feedback!