jamespfennell / transiter

Web service for transit data
https://demo.transiter.dev
MIT License
55 stars 6 forks source link

Commit `72ee363` appears to cause trips to disappear/reappear frequently #106

Closed cedarbaum closed 1 year ago

cedarbaum commented 1 year ago

I recently deployed Transiter with commit 72ee363c1d2de752f004f4d540322612cb2a45ee included and noticed trips are now popping in and out of existence fairly often. For example, the NYC subway W route was frequently oscillating between having 10+ trips to between 0-4 trips in the early afternoon today. I reverted the commit and redeployed and noticed the trip counts became stable again.

I haven't had a chance to look into this much further, but below are results from the us-ny-subway/routes/W/trips endpoint taken at approximately the same time from the 2 versions of Transiter:

Without commit 72ee363c1d2de752f004f4d540322612cb2a45ee:

CleanShot 2023-04-25 at 11 59 16@2x

With commit 72ee363c1d2de752f004f4d540322612cb2a45ee:

CleanShot 2023-04-25 at 11 59 21@2x

If there's any other information I can provide, please let me know.

jamespfennell commented 1 year ago

This is a really bad bug! Thank you for noticing and reporting!

I hit this bug yesterday when working on part 2 of that commit/project, but didn’t realize the bug was already in mainline. The problem is that the update code has logic to detect if the GTFS data for a trip has not changed, and if so updates for that trip are skipped. However the code currently fails to mark such trips as “not stale”, so they get deleted. There is a unit test for this case, but it turns out the unit test also has a bug so it doesn’t actually exercise this code path.

I already know the fix and can submit it tonight.

Sorry you hit this! I’m glad reverting worked. Selfishly, this is a great example of why having users who use the code in mainline is so good because bugs get found….hope this doesn’t make you too nervous to use the latest versions…

On Tue, Apr 25, 2023, at 12:14 PM, Sam Cedarbaum wrote:

I recently deployed Transiter with commit72ee363 included and noticed trips are now popping in and out of existence fairly often. For example, the NYC subway W route was frequently oscillating between having 10+ trips to between 0-4 trips in the early afternoon today. I reverted the commit and redeployed and noticed the trip counts became stable again.

I haven't had a chance to look into this much further, but below are results from the us-ny-subway/routes/W/trips endpoint taken at approximately the same time from the 2 versions of Transiter:

Without commit72ee363:

CleanShot 2023-04-25 at 11 59 @.*** https://user-images.githubusercontent.com/1669622/234337286-0b52a969-71af-48e4-81ff-013ba344973c.png

With commit72ee363:

CleanShot 2023-04-25 at 11 59 @.*** https://user-images.githubusercontent.com/1669622/234337288-ed7ab760-30b6-4cab-b495-c2d0a0a0182f.png

If there's any other information I can provide, please let me know.

— Reply to this email directly, view it on GitHub https://github.com/jamespfennell/transiter/issues/106, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC4T7SUPFWZIMCDN6TX2WXDXC72ABANCNFSM6AAAAAAXLHL2PQ. You are receiving this because you are subscribed to this thread.Message ID: @.***>

cedarbaum commented 1 year ago

Sounds good, thanks!

Sorry you hit this! I’m glad reverting worked. Selfishly, this is a great example of why having users who use the code in mainline is so good because bugs get found….hope this doesn’t make you too nervous to use the latest versions…

No worries at all! I maintain my own Docker image of Transiter (which I do try to keep in sync with mainline +/- a few commits when I am working on something new) so I am able to quickly experiment/rollback when needed.

cedarbaum commented 1 year ago

One more thought sort of related to this - even before this commit, I did notice that alerts sometimes seem to disappear and reappear between updates. I don't see it happen that often, and my initial thought is that it is probably just buggy GTFS updates.

It is something I've been meaning to look into more and maybe there is some way to "smooth" these anomalies out to avoid alerts popping in/out (e.g., wait for an alert to be removed between n > 1 consecutive updates).

Figured I'd mention it on the off chance you think this could be a bug somewhere in Transiter similar to this one.

jamespfennell commented 1 year ago

Fixed in https://github.com/jamespfennell/transiter/commit/46c0cf993f738fd601115a06ac2eda107c7b7732! I also updated the unit tests and end to end tests to catch this kind of bug, so hopefully we don't see it again.

Regarding alerts: I feel I've seen this before too. I'd like to say it's because of the data feed but given the bug you just reported I probably shouldn't be so confident :) I think a reasonable idea to figure this would be to add some more monitoring to the realtime update code. We could add a metric that for each realtime entity (alert, trip or vehicle) reports both the number of entities in the GTFS realtime data, and the number of entities ultimately saved to the database. Any discrepancy between these would indicate something weird in the updater in which the entities are lost (like this bug, or other bugs in the updater, or just GTFS data the updater hasn't been written to handle yet).

jamespfennell commented 1 year ago

I forgot to respond to your suggestion of not deleting alerts right away, but having a grace period where they stick around. This sounds great to me! The big refactor that I'm about to submit would actually make this easier. We could add an updated_at field to the alerts table, and when deleting stale alerts we can keep alerts that were updated within the last N minutes where N is configurable.

cedarbaum commented 1 year ago

That sounds great, thanks for the info! I'll close this issue since it looks like the trip issue is fixed.