in part 1, build out a cache/data structure that keeps track of degenerate GPS ping sequences that don't have stop_id or directional info. it should probably be purged daily
in part 2, we'll single out ping sequences that have been dark for sufficiently long (~1 minute) and start trying to intuit their route progress with a shape interpolation algorithm using shapes.txt.
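a rough sketch of what the part 2 interpolation could look like, not what this PR implements: project a dark ping onto the shape polyline and report how far along it the vehicle is. `to_xy` and `progress_along_shape` are hypothetical names, the input is assumed to be the `shape_pt_lat`/`shape_pt_lon` pairs from shapes.txt already sorted by `shape_pt_sequence`, and the flat-earth projection is an assumption that should only hold over route-sized distances.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def to_xy(lat, lon, ref_lat):
    """Equirectangular projection of a lat/lon pair to local meters."""
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def progress_along_shape(shape_pts, lat, lon):
    """Return (meters_along_shape, total_shape_meters) for the point on the
    polyline closest to the ping.

    shape_pts: list of (lat, lon) tuples ordered by shape_pt_sequence.
    """
    ref_lat = shape_pts[0][0]
    pts = [to_xy(p_lat, p_lon, ref_lat) for p_lat, p_lon in shape_pts]
    px, py = to_xy(lat, lon, ref_lat)

    best_dist_sq = math.inf
    best_along = 0.0
    walked = 0.0  # length of all segments before the current one
    for (ax, ay), (bx, by) in zip(pts, pts[1:]):
        dx, dy = bx - ax, by - ay
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            continue  # repeated shape point, nothing to project onto
        seg_len = math.sqrt(seg_len_sq)
        # Fraction along this segment of the perpendicular foot, clamped to the segment.
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
        cx, cy = ax + t * dx, ay + t * dy
        dist_sq = (px - cx) ** 2 + (py - cy) ** 2
        if dist_sq < best_dist_sq:
            best_dist_sq = dist_sq
            best_along = walked + t * seg_len
        walked += seg_len
    return best_along, walked
```

comparing the returned distance against the distances of the route's stops along the same shape would be one way to guess which stops the vehicle has already passed. a shape that loops back on itself would need extra care, since the nearest point is not always the right one.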
BACKGROUND
as far as i can tell, there's 3 kinds of outages (c+p from the original issue)
short stretch outages, which occur for less than a minute and tend to happen at the beginning/end of the stop. these are short enough that we could probably ignore them and have reasonable calculations, even if they happen in the middle of a trip.
medium stretch outages, which occur for maybe 2-10 minutes at a time. we see these a lot on the 39 (potentially caused by a glitchy AVL) and they can cause us to lose information for a couple of stops.
long stretch outages, which might be because the AVL for a vehicle wasn't turned on but GPS was still reporting info.
THIS PR
i've set the cutoff for suspected extended outages to be 1 minute, after which we start trying to do shape interpolation. there's some logic here to figure out whether a bad ping is a continuation of a sequence we're already tracking or the start of a new outage.
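for illustration, a minimal sketch of the continuation-vs-new-outage decision, not the code in this PR. `OutageSequence`, `OutageTracker`, and the 30 second `CONTINUATION_GAP` are all assumptions made up for the example; only the 1 minute cutoff comes from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

EXTENDED_OUTAGE_CUTOFF = timedelta(minutes=1)  # cutoff named in the PR description
CONTINUATION_GAP = timedelta(seconds=30)  # assumption: max silence inside one sequence


@dataclass
class OutageSequence:
    trip_id: str
    started_at: datetime
    last_seen_at: datetime
    pings: list = field(default_factory=list)  # (timestamp, lat, lon) tuples

    @property
    def is_extended(self):
        """True once the sequence has been dark for at least the cutoff."""
        return self.last_seen_at - self.started_at >= EXTENDED_OUTAGE_CUTOFF


class OutageTracker:
    def __init__(self):
        self._by_vehicle = {}  # vehicle_id -> OutageSequence

    def record_bad_ping(self, vehicle_id, trip_id, ts, lat, lon):
        """Record a ping with no stop_id/direction and return its sequence."""
        seq = self._by_vehicle.get(vehicle_id)
        is_continuation = (
            seq is not None
            and seq.trip_id == trip_id
            and ts - seq.last_seen_at <= CONTINUATION_GAP
        )
        if not is_continuation:
            # Different trip, or too long since the last bad ping: new outage.
            seq = OutageSequence(trip_id, ts, ts)
            self._by_vehicle[vehicle_id] = seq
        seq.last_seen_at = ts
        seq.pings.append((ts, lat, lon))
        return seq

    def record_good_ping(self, vehicle_id):
        """A ping with stop_id/direction ends any outage for the vehicle."""
        self._by_vehicle.pop(vehicle_id, None)

    def purge_older_than(self, cutoff):
        """Drop sequences last seen before cutoff; returns how many were dropped."""
        stale = [v for v, s in self._by_vehicle.items() if s.last_seen_at < cutoff]
        for vehicle_id in stale:
            del self._by_vehicle[vehicle_id]
        return len(stale)
```

calling `purge_older_than` once a day with the start of the current service day would cover the daily purge. keying on vehicle id alone is a simplification, and the real continuation check may need more signals than trip id and elapsed time.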
i really do not like this dict-as-a-cache setup here at the moment. y'all got cleaner suggestions?
if you're looking for a builtin solution for caching on the file-system to avoid ballooning memory usage, I think shelve could work for your needs here: https://docs.python.org/3/library/shelve.html
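a small sketch of how that might look, assuming pings are stored as plain tuples keyed by vehicle id. `append_ping`, `load_pings`, and `purge` are hypothetical helper names, not functions in gobble.

```python
import shelve


def append_ping(cache_path, vehicle_id, ping):
    """Append one ping to the on-disk list for a vehicle."""
    with shelve.open(cache_path) as db:
        pings = db.get(vehicle_id, [])
        pings.append(ping)
        # Reassign the key: in-place mutation of a stored value is not
        # written back unless the shelf is opened with writeback=True.
        db[vehicle_id] = pings


def load_pings(cache_path, vehicle_id):
    """Return the stored pings for a vehicle, or an empty list."""
    with shelve.open(cache_path) as db:
        return db.get(vehicle_id, [])


def purge(cache_path):
    """Empty the cache, e.g. as part of the daily purge."""
    with shelve.open(cache_path) as db:
        db.clear()
```

two things to keep in mind: shelf keys have to be strings, and values are pickled, so the file should only ever be read from a trusted location. opening the shelf on every call keeps the example simple but would likely be too slow on a hot path, where holding one shelf open may be better.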
ISSUE
part 1 of a 2 parter to address stop_id/direction outages in our vehicle pings https://github.com/transitmatters/gobble/issues/51