Interpolate the sequence of stops in each trip at 5-second intervals along the trip shape to produce fake raw locations. This could serve as a useful training set: it's high-fidelity and small (which speeds up testing). Another advantage is that it works for test queries when no real training data exists (e.g. after a schedule update).
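A minimal sketch of the interpolation step in Python. It assumes each trip's stops carry GTFS `shape_dist_traveled` values (an optional field; feeds that omit it would first need stops projected onto the shape to get distances) and that arrival times are already converted to seconds after midnight. `interpolate_trip` and `_interp` are hypothetical names. Time is mapped linearly to distance between consecutive stops, and distance is mapped linearly to lat/lon between shape points.

```python
from bisect import bisect_right
from typing import List, Tuple

# Assumed (hypothetical) input layouts:
#   stops: [(arrival_time_s, shape_dist_traveled), ...] sorted by time
#   shape: [(lat, lon, shape_dist_traveled), ...] sorted by distance
Stop = Tuple[float, float]
ShapePt = Tuple[float, float, float]
Point = Tuple[float, float, float]  # (t_seconds, lat, lon)


def _interp(x: float, xs: List[float], ys: List[float]) -> float:
    """Piecewise-linear interpolation of y at x, clamped to the ends of xs."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # guarantees xs[i-1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def interpolate_trip(stops: List[Stop], shape: List[ShapePt],
                     step_s: float = 5.0) -> List[Point]:
    """Return fake raw locations [(t, lat, lon), ...] every step_s seconds."""
    stop_t = [t for t, _ in stops]
    stop_d = [d for _, d in stops]
    shape_d = [d for _, _, d in shape]
    shape_lat = [lat for lat, _, _ in shape]
    shape_lon = [lon for _, lon, _ in shape]

    out: List[Point] = []
    t = stop_t[0]
    while t <= stop_t[-1]:
        d = _interp(t, stop_t, stop_d)  # distance along the shape at time t
        lat = _interp(d, shape_d, shape_lat)
        lon = _interp(d, shape_d, shape_lon)
        out.append((t, lat, lon))
        t += step_s
    return out
```

For example, `interpolate_trip([(0, 0.0), (20, 100.0)], [(0.0, 0.0, 0.0), (0.0, 0.001, 100.0)])` yields five points at t = 0, 5, 10, 15, 20 moving evenly along the shape. Linear lat/lon interpolation between shape points is an approximation, but it is usually adequate for the short segments found in GTFS shapes.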
The basic training set can be augmented with a few transformations: shift the start time slightly to model delays, stretch the timeline to model different speeds, and add some noise.
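One way the three transformations could be applied, sketched in Python under the assumption that a synthetic trip is a list of `(t_seconds, lat, lon)` tuples like the interpolation output. `augment` is a hypothetical helper. The meters-to-degrees conversion uses the common approximation of about 111,320 m per degree of latitude, and the noise is Gaussian; both are choices made for this sketch, not something the note specifies.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]  # (t_seconds, lat, lon)

METERS_PER_DEG_LAT = 111_320.0  # rough value, fine for small offsets


def augment(points: List[Point], shift_s: float, stretch: float,
            noise_m: float, rng: random.Random) -> List[Point]:
    """Return a transformed copy of one synthetic trip.

    shift_s: seconds added to every timestamp (models a late or early start)
    stretch: factor applied to elapsed time since the first point
             (> 1 means slower, < 1 means faster)
    noise_m: standard deviation of Gaussian position noise, in meters
    """
    t0 = points[0][0]
    out: List[Point] = []
    for t, lat, lon in points:
        new_t = t0 + shift_s + (t - t0) * stretch
        # Convert metric noise to degrees; longitude degrees shrink with cos(lat).
        dlat = rng.gauss(0.0, noise_m) / METERS_PER_DEG_LAT
        dlon = rng.gauss(0.0, noise_m) / (
            METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
        out.append((new_t, lat + dlat, lon + dlon))
    return out
```

Drawing `shift_s` and `stretch` from small illustrative ranges (for instance a couple of minutes and ±10%) with a seeded `random.Random` yields several reproducible variants per scheduled trip. Because stretching scales timestamps rather than resampling, point spacing becomes `5 * stretch` seconds; re-running the interpolation on the stretched timeline would restore exact 5-second spacing if that matters.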
The generated set can't replace real data, because the GTFS schedule doesn't include events like a bus driving around the depot all day to park. But some combination of schedule-generated and real data is probably best.
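A simple way to combine the two sources is a per-trip fallback: keep real observations wherever they exist and fill in schedule-generated points for trips that have none (such as new trips after a schedule update). This is only one possible strategy, sketched under the assumption that both sets are dicts keyed by a trip identifier; `combine` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (t_seconds, lat, lon)


def combine(real: Dict[str, List[Point]],
            synthetic: Dict[str, List[Point]]) -> Dict[str, List[Point]]:
    """Prefer real observations per key; fall back to schedule-generated points."""
    merged = dict(synthetic)
    # Real data wins whenever it is non-empty. Keys that exist only in the
    # real data are kept too, so off-schedule movement is not discarded.
    merged.update({key: pts for key, pts in real.items() if pts})
    return merged
```

Keeping keys that appear only in the real data matters here, since that is where movement the schedule can't represent (like depot driving) would show up.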