Closed cedarbaum closed 1 year ago
This is a really bad bug! Thank you for noticing and reporting!
I hit this bug yesterday when working on part 2 of that commit/project, but didn’t realize the bug was already in mainline. The problem is that the update code has logic to detect if the GTFS data for a trip has not changed, and if so updates for that trip are skipped. However the code currently fails to mark such trips as “not stale”, so they get deleted. There is a unit test for this case, but it turns out the unit test also has a bug so it doesn’t actually exercise this code path.
I already know the fix and can submit it tonight.
Sorry you hit this! I’m glad reverting worked. Selfishly, this is a great example of why having users who use the code in mainline is so good because bugs get found….hope this doesn’t make you too nervous to use the latest versions…
On Tue, Apr 25, 2023, at 12:14 PM, Sam Cedarbaum wrote:
I recently deployed Transiter with commit72ee363 included and noticed trips are now popping in and out of existence fairly often. For example, the NYC subway
W
route was frequently oscillating between having 10+ trips to between 0-4 trips in the early afternoon today. I reverted the commit and redeployed and noticed the trip counts became stable again.I haven't had a chance to look into this much further, but below are results from the
us-ny-subway/routes/W/trips
endpoint taken at approximately the same time from the 2 versions of Transiter:Without commit72ee363:
CleanShot 2023-04-25 at 11 59 @.*** https://user-images.githubusercontent.com/1669622/234337286-0b52a969-71af-48e4-81ff-013ba344973c.png
With commit72ee363:
CleanShot 2023-04-25 at 11 59 @.*** https://user-images.githubusercontent.com/1669622/234337288-ed7ab760-30b6-4cab-b495-c2d0a0a0182f.png
If there's any other information I can provide, please let me know.
— Reply to this email directly, view it on GitHub https://github.com/jamespfennell/transiter/issues/106, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC4T7SUPFWZIMCDN6TX2WXDXC72ABANCNFSM6AAAAAAXLHL2PQ. You are receiving this because you are subscribed to this thread.Message ID: @.***>
Sounds good, thanks!
Sorry you hit this! I’m glad reverting worked. Selfishly, this is a great example of why having users who use the code in mainline is so good because bugs get found….hope this doesn’t make you too nervous to use the latest versions…
No worries at all! I maintain my own Docker image of Transiter (which I do try to keep in sync with mainline +/- a few commits when I am working on something new) so I am able to quickly experiment/rollback when needed.
One more thought sort of related to this - even before this commit, I did notice that alerts sometimes seem to disappear and reappear between updates. I don't see it happen that often, and my initial thought is that it is probably just buggy GTFS updates.
It is something I've been meaning to look into more and maybe there is some way to "smooth" these anomalies out to avoid alerts popping in/out (e.g., wait for an alert to be removed between n > 1
consecutive updates).
Figured I'd mention it on the off chance you think this could be a bug somewhere in Transiter similar to this one.
Fixed in https://github.com/jamespfennell/transiter/commit/46c0cf993f738fd601115a06ac2eda107c7b7732! I also updated the unit tests and end to end tests to catch this kind of bug, so hopefully we don't see it again.
Regarding alerts: I feel I've seen this before too. I'd like to say it's because of the data feed but given the bug you just reported I probably shouldn't be so confident :) I think a reasonable idea to figure this would be to add some more monitoring to the realtime update code. We could add a metric that for each realtime entity (alert, trip or vehicle) reports both the number of entities in the GTFS realtime data, and the number of entities ultimately saved to the database. Any discrepancy between these would indicate something weird in the updater in which the entities are lost (like this bug, or other bugs in the updater, or just GTFS data the updater hasn't been written to handle yet).
I forgot to respond to your suggestion of not deleting alerts right away, but having a grace period where they stick around. This sounds great to me! The big refactor that I'm about to submit would actually make this easier. We could add an updated_at
field to the alerts table, and when deleting stale alerts we can keep alerts that were updated within the last N minutes where N is configurable.
That sounds great, thanks for the info! I'll close this issue since it looks like the trip issue is fixed.
I recently deployed Transiter with commit 72ee363c1d2de752f004f4d540322612cb2a45ee included and noticed trips are now popping in and out of existence fairly often. For example, the NYC subway
W
route was frequently oscillating between having 10+ trips to between 0-4 trips in the early afternoon today. I reverted the commit and redeployed and noticed the trip counts became stable again.I haven't had a chance to look into this much further, but below are results from the
us-ny-subway/routes/W/trips
endpoint taken at approximately the same time from the 2 versions of Transiter:Without commit 72ee363c1d2de752f004f4d540322612cb2a45ee:
With commit 72ee363c1d2de752f004f4d540322612cb2a45ee:
If there's any other information I can provide, please let me know.