transitmatters / gobble

🦃 Process MBTA events into a format that can be consumed by the Data Dashboard
MIT License

Track long outages #61

Open hamima-halim opened 10 months ago

hamima-halim commented 10 months ago

ISSUE

part 1 of a 2-parter to address stop_id/direction outages in our vehicle pings: https://github.com/transitmatters/gobble/issues/51

in part 1, build out a cache/data structure that keeps track of degenerate GPS ping sequences that don't have stop_id or directional info. it should probably be purged daily.

in part 2, we'll single out ping sequences that have been dark for sufficiently long (~1 minute) and start trying to intuit their route progress with a shape interpolation algorithm using shapes.txt.
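for a rough idea of what the part 2 interpolation step could look like, here's a minimal sketch that projects a ping onto a shape polyline and reports how far along the shape it lands. this is not the algorithm this repo uses; `project_onto_shape` is a hypothetical name, and the sketch assumes the shapes.txt points have already been ordered by `shape_pt_sequence` and converted to planar coordinates (e.g. meters), which real lat/lon data would need first.

```python
import math


def project_onto_shape(point, shape):
    """Return (distance_along_shape, offset) for the closest point on a polyline.

    Assumptions of this sketch: `point` is an (x, y) pair and `shape` is an
    ordered list of (x, y) pairs, all in the same planar units.
    """
    px, py = point
    best_along, best_offset = 0.0, math.inf
    travelled = 0.0  # length of the segments already walked

    for (ax, ay), (bx, by) in zip(shape, shape[1:]):
        dx, dy = bx - ax, by - ay
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue  # skip duplicate consecutive shape points

        # Fraction along the segment of the perpendicular foot, clamped to [0, 1].
        t = ((px - ax) * dx + (py - ay) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))

        cx, cy = ax + t * dx, ay + t * dy
        offset = math.hypot(px - cx, py - cy)
        if offset < best_offset:
            best_along, best_offset = travelled + t * seg_len, offset

        travelled += seg_len

    return best_along, best_offset
```

the offset is worth keeping around: a large value suggests the ping doesn't belong to that shape at all, so a caller could try other candidate shapes before trusting the progress estimate.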

BACKGROUND

as far as i can tell, there's 3 kinds of outages (c+p from the original issue)

THIS PR

i've set the cutoff for suspected extended outages to be 1 minute, after which we start trying to do shape interpolation things. there's some logic here to figure out whether a bad ping is a continuation of a sequence we're already tracking, or whether it's a new instance of an outage.
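to make the continuation-vs-new-outage idea concrete, here's a minimal sketch of one way to structure it. it's an illustration, not the code in this PR: `OutageTracker`, `DarkSequence`, and the 30 second `CONTINUATION_GAP` are all hypothetical, and only the 1 minute cutoff comes from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

OUTAGE_CUTOFF = timedelta(minutes=1)      # cutoff from the PR description
CONTINUATION_GAP = timedelta(seconds=30)  # assumption: max gap between pings of one outage


@dataclass
class DarkSequence:
    started: datetime
    last_seen: datetime
    pings: list = field(default_factory=list)

    def is_extended(self):
        # True once the sequence has been dark for at least the cutoff.
        return self.last_seen - self.started >= OUTAGE_CUTOFF


class OutageTracker:
    def __init__(self):
        self._sequences = {}  # vehicle id -> DarkSequence

    def record_bad_ping(self, vehicle_id, timestamp, position):
        seq = self._sequences.get(vehicle_id)
        # Start a new sequence if none is tracked or the last bad ping is too old.
        if seq is None or timestamp - seq.last_seen > CONTINUATION_GAP:
            seq = DarkSequence(started=timestamp, last_seen=timestamp)
            self._sequences[vehicle_id] = seq
        seq.last_seen = timestamp
        seq.pings.append((timestamp, position))
        return seq

    def record_good_ping(self, vehicle_id):
        # A ping with stop_id/direction info ends the outage for that vehicle.
        self._sequences.pop(vehicle_id, None)

    def purge(self):
        # Intended to be called once a day.
        self._sequences.clear()
```

wrapping the dict in a small class at least keeps the continuation rule and the daily purge in one place, even if the storage underneath is still a plain dict.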

i really do not like this dict-as-a-cache setup here at the moment. yall got cleaner suggestions?

rudiejd commented 10 months ago

if you're looking for a builtin solution for caching on the file-system to avoid ballooning memory usage, I think shelve could work for your needs here: https://docs.python.org/3/library/shelve.html
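A minimal sketch of what that might look like, assuming bad pings are stored per vehicle under a string key. The helper names `append_bad_ping` and `purge` and the file name `outages` are made up for illustration and don't come from gobble.

```python
import os
import shelve
import tempfile


def append_bad_ping(db_path, vehicle_id, ping):
    """Append a ping to the on-disk list for a vehicle and return the new length."""
    with shelve.open(db_path) as db:
        pings = db.get(vehicle_id, [])  # shelve keys must be strings
        pings.append(ping)
        # Reassign the value: without writeback=True, mutating the list
        # returned by the shelf does not persist on its own.
        db[vehicle_id] = pings
        return len(pings)


def purge(db_path):
    """Drop everything, e.g. once a day. flag='n' always creates a new, empty db."""
    with shelve.open(db_path, flag="n"):
        pass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "outages")
        append_bad_ping(path, "y1234", (42.35, -71.06))
        append_bad_ping(path, "y1234", (42.36, -71.05))
        purge(path)
```

One tradeoff to keep in mind: values are pickled on every write, so rewriting a long ping list on each append gets more expensive as a sequence grows.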