mrgeorge / BusTripper

Identify bus trips from GPS locations

generate training data from GTFS schedule #11

Open mrgeorge opened 10 years ago

mrgeorge commented 10 years ago

Interpolate the sequence of stops in each trip at 5-second intervals along the trip shape to produce fake raw locations. This could be a useful training set, with the advantage that it's high-fidelity and small (to speed up testing). Another advantage is that it will work for test queries when no real training data exists (e.g. after a schedule update).
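
A rough sketch of the interpolation step, not tied to any existing BusTripper code. It assumes each stop has already been reduced to `(time_s, dist_along)` and each shape point to `(dist_along, lat, lon)`, e.g. from the optional `shape_dist_traveled` column in `stop_times.txt` and `shapes.txt`. The function names and tuple layouts are made up for illustration, and lat/lon are interpolated linearly, which should be fine over short shape segments.

```python
from bisect import bisect_right


def point_at_distance(shape, dist):
    """Linearly interpolate a (lat, lon) at `dist` along the shape.

    shape: list of (dist_along, lat, lon), sorted by dist_along (assumed layout).
    """
    dists = [d for d, _, _ in shape]
    # Index of the shape segment containing `dist`, clamped to a valid segment.
    i = bisect_right(dists, dist) - 1
    i = max(0, min(i, len(shape) - 2))
    d0, lat0, lon0 = shape[i]
    d1, lat1, lon1 = shape[i + 1]
    f = 0.0 if d1 == d0 else (dist - d0) / (d1 - d0)
    f = max(0.0, min(1.0, f))
    return (lat0 + f * (lat1 - lat0), lon0 + f * (lon1 - lon0))


def interpolate_trip(stops, shape, step=5):
    """Produce fake raw locations every `step` seconds along one trip.

    stops: list of (time_s, dist_along), sorted by time (assumed layout).
    Returns a list of (time_s, lat, lon).
    """
    points = []
    seg = 0
    t = stops[0][0]
    end = stops[-1][0]
    while t <= end:
        # Advance to the pair of stops that brackets time t.
        while seg < len(stops) - 2 and t > stops[seg + 1][0]:
            seg += 1
        t0, s0 = stops[seg]
        t1, s1 = stops[seg + 1]
        # Assume constant speed between consecutive stops.
        f = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
        lat, lon = point_at_distance(shape, s0 + f * (s1 - s0))
        points.append((t, lat, lon))
        t += step
    return points
```

Constant speed between stops is the simplest assumption; dwell time at stops could be modelled if `arrival_time` and `departure_time` differ.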

The basic training set can be augmented with a few transformations: shift the start time a bit to model delays, stretch the timeline to model different speeds, and add some noise.
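
Something like the following could cover those three transformations. It's only a sketch: `augment` and its parameters are hypothetical, the input is assumed to be a list of `(time_s, lat, lon)` tuples, and the noise is assumed to be Gaussian jitter on the coordinates in degrees (about 1e-4 degrees of latitude is roughly 11 m).

```python
import random


def augment(points, delay_s=0.0, stretch=1.0, noise_deg=0.0, rng=None):
    """Return a transformed copy of one generated trip.

    points:    list of (time_s, lat, lon) (assumed layout).
    delay_s:   shift applied to the whole trip, to model a late start.
    stretch:   factor on elapsed time; > 1 models a slower bus.
    noise_deg: std. dev. of Gaussian jitter added to lat and lon, in degrees.
    rng:       optional random.Random, so results can be reproduced.
    """
    rng = rng or random.Random()
    t_start = points[0][0]
    out = []
    for t, lat, lon in points:
        # Scale elapsed time since the trip start, then apply the delay.
        new_t = t_start + delay_s + (t - t_start) * stretch
        out.append((
            new_t,
            lat + rng.gauss(0.0, noise_deg),
            lon + rng.gauss(0.0, noise_deg),
        ))
    return out
```

Note that stretching changes the spacing between samples, so the output would need resampling if a fixed 5-second interval matters downstream.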

The generated set can't replace the real data because the GTFS schedule doesn't include events like a bus driving around the depot all day to park. But some combination of schedule-generated data and real data is probably best.