As stated in the architecture README:
The Traffic Engine (TE) translates vehicle location to OSM-linked speed estimates. By design the TE can be run inside a fleet operator allowing internal conversion from GPS location data to traffic statistics. This ensures that the only data to leave the data provider’s network are fully anonymized traffic statistics.
That is to say, series of GPS positions tied to individual vehicles are fed into the traffic engine, but the only information it pushes out to the shared database is tied to map features. It is of course possible to reconstruct paths in places where the number of observations is very low (one or two taxis moving through a sparse residential area), but the traffic engine has a threshold number of observations below which it will not report any data. We may even assume that a place with so few observations has negligible congestion or is rarely traveled through.
This reporting threshold is configurable by the organization running a particular instance. In sum, each contributor runs their own traffic engine, that traffic engine never shares vehicle identifiers with the outside world, and it only exports speed/congestion data under conditions set freely by the contributor.
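To make the discussion concrete, here is a minimal sketch of the kind of aggregation and thresholding described above. It is not the traffic engine's actual code: the `Observation` type, the `export_segment_stats` function, the `min_observations` parameter, and its default value are all hypothetical names and numbers chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Observation:
    """One map-matched speed sample (hypothetical structure)."""
    vehicle_id: str    # stays inside the contributor's network
    segment_id: int    # assumed identifier of an OSM-linked map feature
    speed_kph: float


def export_segment_stats(observations, min_observations=5):
    """Aggregate samples per segment and suppress sparse segments.

    The output is keyed by map feature only; vehicle identifiers are
    read from the input but never copied into the result.
    `min_observations` stands in for the contributor-configured
    reporting threshold (the default of 5 is an arbitrary assumption).
    """
    speeds_by_segment = defaultdict(list)
    for obs in observations:
        speeds_by_segment[obs.segment_id].append(obs.speed_kph)

    return {
        segment_id: {"count": len(speeds), "mean_speed_kph": mean(speeds)}
        for segment_id, speeds in speeds_by_segment.items()
        if len(speeds) >= min_observations  # below threshold: report nothing
    }


# Example: segment 2 has a single observation and is withheld.
samples = [
    Observation("taxi-a", 1, 30.0),
    Observation("taxi-b", 1, 50.0),
    Observation("taxi-c", 1, 40.0),
    Observation("taxi-a", 2, 20.0),
]
stats = export_segment_stats(samples, min_observations=3)
```

In this sketch the threshold counts observations, as in the description above, so several samples from the same vehicle on one segment would each count toward it.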
This basic architecture should go a long way toward eliminating the risk of tracking any one probe vehicle, but some questions still remain: how high should the observation threshold be set, and might there be other subtle details that would allow a sophisticated consumer of this data to reconstruct trajectories?
The French Open Taxi Data project, which is interested in contributing to our congestion/speed database, is undergoing review by the CNIL and has statisticians available for counsel on user privacy issues. Like all of us, they are interested in anonymization, but they have some specific, strict guidelines to adhere to. I would welcome any comments here from @l-vincent-l, @odtvince, or their anonymization advisors.