noi-techpark / bdp-commons

GNU Affero General Public License v3.0

As a traffic analyst I would like to align the anonymization mechanisms of different Bluetooth sensor data providers #561

Open rcavaliere opened 1 year ago

rcavaliere commented 1 year ago

We have nearly completed the integration of real-time traffic data and Bluetooth detections from the traffic monitoring stations of the Province of Bolzano, supplied by the local company Famas System (see #554). To use the data collected by these Bluetooth sensors together with the data provided by the sensors developed by CISMA, we need to align the anonymization mechanisms.

Since Famas uses an MD5 hash, we should check and, if necessary, change the way the CISMA Bluetooth sensor data are anonymized.

dulvui commented 1 year ago

@rcavaliere I checked again, and using MD5 is actually not a problem, because the hash cannot be reverted to the original value. The online MD5 converter we found works with large databases in which inputs and their corresponding hashes are stored, but such a database does not contain all possible values, so Bluetooth MAC addresses are safe. CISMA is using SHA256 with an additional secret key.

Current setup: CISMA SHA256 || FAMAS MD5
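To make the mismatch concrete, here is a minimal Python sketch of the current setup. It assumes that Famas computes a plain MD5 over the MAC address string and that CISMA's "SHA256 with an additional key" is an HMAC-SHA256; the exact CISMA construction, the key value `CISMA_KEY`, the sample MAC address, and the MAC string format (upper-case, colon-separated) are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; the real CISMA key is not part of this issue.
CISMA_KEY = b"example-secret-key"


def famas_hash(mac: str) -> str:
    """Famas side: plain MD5 of the MAC address string (computed on the sensor)."""
    return hashlib.md5(mac.encode("ascii")).hexdigest()


def cisma_hash_current(mac: str, key: bytes = CISMA_KEY) -> str:
    """CISMA side today, assumed here to be HMAC-SHA256 with a secret key."""
    return hmac.new(key, mac.encode("ascii"), hashlib.sha256).hexdigest()


mac = "AA:BB:CC:DD:EE:FF"  # hypothetical MAC address
print(famas_hash(mac))          # 32 hex characters (128-bit MD5)
print(cisma_hash_current(mac))  # 64 hex characters (256-bit SHA256)
# Different algorithms and lengths: the same device never yields the same value,
# so detections from the two providers cannot be joined.
```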

There are two options to align the data:
1) Change CISMA to MD5 in the data collector (easy to implement): CISMA MD5 || FAMAS MD5
2) Add SHA256 on top of the FAMAS MD5 hash, and compute an MD5 hash before the SHA256 on the CISMA side (more secure): CISMA MD5 -> SHA256 || FAMAS MD5 -> SHA256
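A minimal Python sketch of the two options follows. It assumes that the Famas sensors send a plain MD5 hex digest of the MAC address, and that the "SHA256 on top" in option 2 is an HMAC-SHA256 applied in the data collector with a secret key (`COLLECTOR_KEY` is hypothetical). It also assumes a canonical MAC string format (upper-case, colon-separated) via a hypothetical `normalize_mac` helper: either option only produces matching values if CISMA feeds MD5 exactly the same string the Famas sensors hash, since `aa:bb:...` and `AA:BB:...` give different digests.

```python
import hashlib
import hmac

# Hypothetical key held by the data collector (not given in the issue).
COLLECTOR_KEY = b"example-secret-key"


def normalize_mac(mac: str) -> str:
    """Assumed canonical form: upper-case, colon-separated (AA:BB:CC:DD:EE:FF)."""
    digits = mac.replace(":", "").replace("-", "").upper()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def md5_hex(mac: str) -> str:
    """MD5 of the normalized MAC, mirroring what a Famas sensor is assumed to send."""
    return hashlib.md5(normalize_mac(mac).encode("ascii")).hexdigest()


# Option 1: CISMA switches to MD5; Famas values are stored as received.
def option1_cisma(mac: str) -> str:
    return md5_hex(mac)


def option1_famas(famas_md5: str) -> str:
    # Lower-case the hex digest in case the sensor sends upper-case hex.
    return famas_md5.lower()


# Option 2: keyed SHA256 on top of the MD5 value, for both providers.
def _sha256_on_top(md5_value: str, key: bytes = COLLECTOR_KEY) -> str:
    return hmac.new(key, md5_value.encode("ascii"), hashlib.sha256).hexdigest()


def option2_cisma(mac: str) -> str:
    return _sha256_on_top(md5_hex(mac))


def option2_famas(famas_md5: str) -> str:
    return _sha256_on_top(famas_md5.lower())


if __name__ == "__main__":
    famas_value = md5_hex("AA:BB:CC:DD:EE:FF")  # simulated Famas payload
    print(option1_cisma("aa-bb-cc-dd-ee-ff") == option1_famas(famas_value))  # True
    print(option2_cisma("aa-bb-cc-dd-ee-ff") == option2_famas(famas_value))  # True
```

In this sketch, option 1 needs no key at all, which is what makes it the easier one to implement; option 2 requires the data collector to keep `COLLECTOR_KEY` secret, but then a lookup table of plain MD5 values alone is not enough to recompute the stored identifiers.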

Both options make historical data incompatible with new data.

Do you think one of these options could work?

rcavaliere commented 1 year ago

@dulvui I would suggest going for option 1, since you say that the MAC addresses cannot be obtained from the MD5-hashed data. Of course, nothing can be done about the old data; this is something we can activate from a certain point onward for newly incoming data. @ohnewein are you fine with this proposal? FYI: Famas cannot change anything, since the hashing is done at the sensor level, and changing the software inside each sensor is impracticable for them.

ohnewein commented 1 year ago

Seems to be a reasonable proposal.

dulvui commented 1 year ago

@rcavaliere Okay, then I'll change CISMA to MD5 on the testing environment today. Next week, let's see if there are values that match on the CISMA and FAMAS sides. I'll prepare a query for that.

rcavaliere commented 1 year ago

@dulvui For testing, try to find some detections in the stations "P_Campiglio" (CISMA data provider) and "4" (FAMAS data provider).

rcavaliere commented 1 year ago

Let's wait for #554 to be released to production.

rcavaliere commented 1 year ago

Issue #554 is unblocked now. Once it is completed, you can put this into production as well, @dulvui.