metro-ontime / performance_tracker

Automated performance report for the LA Metro rail system.
GNU General Public License v3.0

Support Processing Arbitrary Times #18

Open patchneranartkomol opened 5 years ago

patchneranartkomol commented 5 years ago

Some of the scripts generate data files at the exact time the script runs by specifying:

    now = pendulum.now("UTC")

https://github.com/metro-ontime/performance_tracker/blob/master/performance_tracker/process_vehicles.py

https://github.com/metro-ontime/performance_tracker/blob/master/performance_tracker/query_vehicles.py

As an admin, I would like to process an arbitrary range of times. I should be able to pass a window of time over which to generate the data files. When no window is provided, we can default to UTC now.
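A minimal sketch of how such an optional window could be resolved. The names `parse_utc` and `resolve_window` are hypothetical and not part of the repo, and the standard-library `datetime` module is used here in place of `pendulum` so the example stays self-contained. It assumes timestamps arrive as ISO 8601 strings and that a timestamp without an offset is meant as UTC.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 string and normalize it to UTC.

    Assumption: a value without a UTC offset is already in UTC.
    """
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def resolve_window(
    start: Optional[str] = None,
    end: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return (start, end) in UTC, falling back to the current time.

    With no arguments both ends collapse to "now", which matches the
    existing behavior of processing at the moment the script runs.
    """
    current = now or datetime.now(timezone.utc)
    window_end = parse_utc(end) if end else current
    window_start = parse_utc(start) if start else window_end
    if window_start > window_end:
        raise ValueError("window start must not be after window end")
    return window_start, window_end
```

Calling `resolve_window()` with no arguments keeps today's behavior, while `resolve_window("2019-06-01T00:00:00", "2019-06-02T00:00:00")` would select a single past day. The `now` parameter exists only so the fallback can be pinned in tests.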

ctsexton commented 4 years ago

Partially enabled by a03c87c6266b084bfe5bba7fac6d3c145e82cd9c. A bit more work is required to get this working reliably!

ctsexton commented 4 years ago

We probably need a smarter method of grabbing schedule data. At the moment we just download the latest schedule, which is updated daily. In order to download past schedules, we need to access previous commits in this repo: https://gitlab.com/LACMTA/gtfs_rail

ctsexton commented 4 years ago

However, wanting to process arbitrary dates in the first place assumes that we already have the data readily available. And at this point we do have over 12 months of schedule data already processed and uploaded to S3, as well as (unprocessed) tracked vehicle data.

So, for now we can just focus on processing raw vehicle data for previous dates. This might just require a one-off script to access and process the data we already have.
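One possible shape for such a one-off script is sketched below. The names `each_date` and `backfill` are hypothetical, and `process_day` stands in for whatever function ends up loading and processing one day of raw vehicle data; nothing here reflects the repo's actual API. Failures are collected rather than raised, on the assumption that one bad day should not stop the whole backfill.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, List


def each_date(first: date, last: date) -> Iterator[date]:
    """Yield every calendar date from first to last, inclusive."""
    day = first
    while day <= last:
        yield day
        day += timedelta(days=1)


def backfill(
    first: date,
    last: date,
    process_day: Callable[[date], None],
) -> List[date]:
    """Run process_day for each date and return the dates that failed.

    process_day is a placeholder for the real per-day processing step.
    """
    failed: List[date] = []
    for day in each_date(first, last):
        try:
            process_day(day)
        except Exception as exc:  # keep going so one bad day does not abort the run
            print(f"{day.isoformat()}: failed ({exc})")
            failed.append(day)
    return failed
```

The returned list makes it straightforward to re-run only the dates that failed, for example after fixing a missing or malformed input file.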