schollz / howmanypeoplearearound

Count the number of people around you :family_man_man_boy: by monitoring wifi signals :satellite:
MIT License

What is legality of monitoring traffic for mac addresses #4

Closed kootenpv closed 7 years ago

kootenpv commented 7 years ago

Hey, nice job on this :) I just wanted to mention that in case you are not aware, it is against the law to sniff packets. The only exception is to sniff on your own network, and only to protect it.

It is unfortunate, otherwise it would be really nice to come up with ways to use it!

At the least, you should put up a "big fat" warning that the use of sniffing is most likely illegal.

schollz commented 7 years ago

Hi @kootenpv! Thanks for that. Do you know of any legal precedent that this type of sniffing violates 18 U.S. Code § 2511?

I don't need legal advice. I'm just curious whether "to intercept...electronic communication" applies to this since it is only looking at mac addresses and signal strengths and not actually investigating any of the packet contents (meta-data vs data?).

I see that Google got in trouble for something like this in 2013 but Google went so far as to collect fragments of data (sometimes email/passwords) which is more the definition of "electronic communication" than mac addresses and signal strengths.

kootenpv commented 7 years ago

Hi @schollz. The fact that you are not concerned does not mean you shouldn't put out a warning for people using your software if there is risk involved :)

Some people might live under other laws and they might be put in "danger".

I understand what you're saying w.r.t. meta-data vs data, and I do believe it is a valid point.

Very interesting to read your sources!

schollz commented 7 years ago

Added: https://github.com/schollz/howmanypeoplearearound/commit/28cac50816d4141511c0fa8a3796c2bd22ead3be

> Some people might live under other laws and they might be put in "danger".

Absolutely, I totally agree. I never thought about this but I'm also going to add a notice to https://github.com/schollz/find-lf.

I also put this question to the wireshark community.

diogocp commented 7 years ago

IANAL, but if you walk down the street with your smartphone in your pocket with Wi-Fi turned on, you will be "intercepting" thousands of packets from networks that you have no right to connect to.

ansell commented 7 years ago

@kootenpv just interested to know what the distinction is between MAC addresses and SSIDs, such as are required for https://github.com/kootenpv/whereami, where you do not currently warn your users that sniffing this traffic may be illegal?

cornishExile commented 7 years ago

UK readers can find a brilliant legal summary, and advice on making MACs anonymous, from the UK Information Commissioner's Office (ICO): https://ico.org.uk/media/for-organisations/documents/1560691/wi-fi-location-analytics-guidance.pdf

The 'howmanypeoplearearound' situation is specifically dealt with.

Thanks to https://sites-dacb.vuturevx.com/110/3347/landing-pages/ico-produce-wi-fi-analytics-guidance.asp for a way into the subject.

kootenpv commented 7 years ago

Maybe I was too paranoid, though I think it is an interesting discussion.

@ansell I had not thought about it yet, although the situation is a bit different. Receiving/counting probes from people's devices (this repo) is different from probing access points. As long as your phone is not functioning as an access point (hotspot/tethering), it will not be picked up by whereami/FIND.

@cornishExile That's a fantastic link!

I think that the line is drawn where the WiFi data would be used for singling out individuals, as opposed to being collected as statistics: the latter would most likely be okay?

Another nice piece of advice in the paper is to convert MAC addresses to something non-identifiable.

cornishExile commented 7 years ago

@kootenpv yes, it's interesting, isn't it?

Personal data acquisition is OK so long as it has a purpose or benefit to the individual (or group). But I wouldn't worry too much - given that web page adverts can install trackers on devices and lookup Facebook demographics, the bar seems pretty high to me (subjectively speaking).

Some basic privacy questions should be asked of a specific use: "what is the purpose of the data collection?", "what is the benefit (both for individuals and groups)?", "is the data secure (e.g. 'hashing' protects in case of theft or loss)?", and "how is data destroyed after use?"

I don't think MAC addresses collected by this means have any inherent privacy issues. The famous case in the UK of going too far is here http://www.cbsnews.com/news/uk-bars-trash-cans-from-tracking-people-with-wi-fi/ but even then the project was cancelled because of political and social pressure, not compliance issues per se.

Everyone keep developing please!

kootenpv commented 7 years ago

Sounds like a case closed to me :)

cynddl commented 5 years ago

> Personal data acquisition is OK so long as it has a purpose or benefit to the individual (or group). […] I don't think MAC addresses collected by this means have any inherent privacy issues.

Actually, collecting MAC addresses in the EU now falls under the GDPR, as personal data. Indeed, a MAC address is directly associated with a device, and its owner can easily be identified (uniquely).

Therefore, collecting and storing MAC addresses requires explicit consent from users on the network. Pseudonymizing MAC addresses is also not enough.

john-patterson commented 5 years ago

@cynddl Like most people in this thread, I am not a lawyer. There's a clause for being engaged in personal versus commercial pursuits in GDPR.

According to https://www.itgovernance.eu/blog/en/does-the-gdpr-apply-to-me:

> The one caveat to that is that the GDPR does not apply to people processing personal data in the course of exclusively personal or household activity. This means you wouldn’t be subject to the Regulation if you keep personal contacts’ information on your computer or you have CCTV cameras on your house to deter intruders.
>
> To fall within the remit of the GDPR, the processing has to be part of an “enterprise”. Article 4(18) of the Regulation defines this as any legal entity that’s engaged in economic activity. You must be careful not to mistake business conducted from home for household activity.

chrisnicola commented 5 years ago

> Pseudonymizing MAC addresses is also not enough

@cynddl would hashing them be sufficient? I'm assuming the problem here is to just count the number of unique ones.

cynddl commented 5 years ago

Hashing is generally not a good solution because:

  1. hashing does pseudonymize the MAC addresses, but does not anonymize them. If an adversary knows the algorithm you use to hash MAC addresses, they can iterate through all possible MAC addresses until they find the one that matches a hash,
  2. even if the hash cannot be reversed, an attacker who knows you were the only person in the office at 7am will learn what your hashed MAC address is, and will be able to track when you go in and out, or move through the office (if multiple endpoints are used). This is typically why pseudonymized location traces are often not anonymous data.
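
To make point 1 concrete, here is a minimal sketch (not from the thread) of how an unsalted hash of a MAC address can be reversed by enumeration. Everything in it is an assumption for illustration: the choice of SHA-256, the lowercase colon-separated format, the helper names `hash_mac` and `recover_mac`, and the example address. Knowing the leading bytes (for instance the vendor prefix) shrinks the search; with three known bytes there are about 16.7 million candidates, and the demo assumes four known bytes so that it finishes quickly.

```python
import hashlib
from itertools import product


def hash_mac(mac):
    """Unsalted SHA-256 of a MAC address in lowercase colon-separated form."""
    return hashlib.sha256(mac.lower().encode("ascii")).hexdigest()


def recover_mac(target_hash, known_prefix):
    """Enumerate every MAC starting with known_prefix until one hashes to target_hash.

    known_prefix is one to six colon-separated bytes, e.g. "02:42:ac".
    Returns the matching MAC, or None if no candidate matches.
    """
    prefix_bytes = known_prefix.lower().split(":")
    missing = 6 - len(prefix_bytes)
    for tail in product(range(256), repeat=missing):
        candidate = ":".join(prefix_bytes + [f"{b:02x}" for b in tail])
        if hash_mac(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    secret = "02:42:ac:11:5e:07"   # arbitrary example address
    stored = hash_mac(secret)      # what a "pseudonymized" log would contain
    # Only 65,536 candidates remain once four bytes are assumed known.
    print(recover_mac(stored, "02:42:ac:11"))
```

A secret salt forces the attacker to learn the salt first, but anyone who holds it can run the same loop.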

mwargan commented 5 years ago

@cynddl does this mean that we cannot use this tool to understand traffic in, say, a restaurant (with the restaurant/network owner's consent)? Do you think there is any way that this tool could be implemented in such a use case?

It was my understanding that GDPR encourages pseudonymization of data; a MAC address is in that regard already pseudonymous.

cynddl commented 5 years ago

@mwargan If you use such a tool inside the EU, e.g., to monitor traffic in a restaurant, you must either:

  1. obtain consent, prior to data collection, for collecting MAC addresses
  2. aggregate/anonymize the counts immediately when collecting MAC addresses.

If you want to count the exact number of devices connected to a network, you don't need to store MAC addresses. If you want to count the number of devices over, let's say, one hour or one day, the naive solution (not GDPR compliant) would be to store the MAC addresses. Privacy-preserving tools for counting distinct elements do exist. See for instance how Tor Metrics estimates the number of IP addresses connected to Tor relays.
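
As a rough illustration of counting distinct devices over a period without keeping a list of addresses, here is a minimal sketch of linear counting, a standard probabilistic technique. It is not the Tor Metrics method and is not claimed to make anything GDPR compliant. The class name `DistinctCounter`, the bitmap size, and the use of keyed BLAKE2b with a random in-memory key are all assumptions for the example.

```python
import hashlib
import math
import os


class DistinctCounter:
    """Linear-counting sketch: estimates distinct items from a fixed-size bitmap."""

    def __init__(self, num_bits=4096):
        if num_bits <= 0 or num_bits % 8 != 0:
            raise ValueError("num_bits must be a positive multiple of 8")
        self.num_bits = num_bits
        self.bits = bytearray(num_bits // 8)
        # Random key held only in memory; it is lost when the counter is discarded.
        self._key = os.urandom(16)

    def add(self, mac):
        """Set the single bit selected by the keyed hash of the address."""
        digest = hashlib.blake2b(
            mac.lower().encode("ascii"), key=self._key, digest_size=8
        ).digest()
        index = int.from_bytes(digest, "big") % self.num_bits
        self.bits[index // 8] |= 1 << (index % 8)

    def estimate(self):
        """Estimate n as -m * ln(fraction of bits still zero)."""
        set_bits = sum(bin(byte).count("1") for byte in self.bits)
        zero_bits = self.num_bits - set_bits
        if zero_bits == 0:
            return math.inf  # bitmap saturated; a larger num_bits is needed
        return -self.num_bits * math.log(zero_bits / self.num_bits)


if __name__ == "__main__":
    counter = DistinctCounter()
    for i in range(300):
        counter.add(f"02:00:00:00:{i // 256:02x}:{i % 256:02x}")
    print(round(counter.estimate()))  # close to 300, not exact
```

Repeated sightings of the same device set the same bit, so they do not inflate the estimate. Accuracy drops as the bitmap fills up, so `num_bits` should be chosen well above the expected number of devices.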

Finally, GDPR does encourage pseudonymization, but also clearly considers pseudonymized data to be personal data. Pseudonymization helps reduce the risk of data being stolen or misused, but does not grant GDPR compliance per se.

mwargan commented 5 years ago

> If you want to count the exact number of devices connected to a network, you don't need to store MAC addresses. If you want to count the number of devices over, let's say one hour or one day, the naive solution (not GDPR compliant) would be to store the MAC addresses.

@cynddl Thanks for the links! Maybe this is more of an analytics problem, but I don't see the difference between the two. Could you explain what you mean? If I collect the number of devices at a given timestamp, or collect the individual MACs and then calculate the device count at a given time, I end up with the same result. I don't see how storing a MAC address could be beneficial, as in either case I can get the devices over one hour or one day.

cynddl commented 5 years ago

First case: every second, you collect the list of MAC addresses, then compute the length of that list and store the pair (timestamp, number of unique addresses) to disk. There's no personal data stored.

Second case: every second, you collect the list of MAC addresses and store it to disk with a timestamp. Every night, you load all the lists from the past 24h, compute the list of unique addresses and store the length of that list to disk. You then delete the timestamped lists of MAC addresses. This is still not GDPR compliant because, in the meantime, you have MAC addresses on disk.
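
The first case could be sketched roughly as follows. This is an illustration only, not code from howmanypeoplearearound: the function name `record_count`, the CSV layout, and the example addresses are assumptions. The point is that the addresses exist only in memory for the duration of the call, and only the timestamp and the count reach the output.

```python
import csv
import io
import time


def record_count(observed_macs, out, timestamp=None):
    """Write one (timestamp, unique-device-count) row and return the count.

    observed_macs: iterable of MAC strings seen in the current scan.
    out: any writable text file object; no address is ever written to it.
    """
    unique = {mac.lower() for mac in observed_macs}  # dedupe case-insensitively
    ts = int(time.time()) if timestamp is None else timestamp
    csv.writer(out).writerow([ts, len(unique)])
    return len(unique)


if __name__ == "__main__":
    scan = ["02:00:00:00:00:01", "02:00:00:00:00:01", "02:00:00:00:00:02"]
    buffer = io.StringIO()
    record_count(scan, buffer, timestamp=1000)
    print(buffer.getvalue().strip())
```

Counting over a longer window this way requires holding the set in memory for the whole window, which is the trade-off against the second case.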

mwargan commented 5 years ago

@cynddl oh sure, I understand that! Just curious as to why it could be beneficial to store the MAC address in the first place - from a marketing/analytical point of view, it provides no added benefit as I see it.

I'm thinking of opening a pull request with maybe a flag option --gdpr to automatically run the script in a GDPR compliant way. Would you be up for helping out?

Edit: not a flag, but I did create a fork and comment out the MAC for GDPR line 250: https://github.com/mwargan/howmanypeoplearearound/commit/9c716bd2807cf21aa50483063ff17d9a1d01cc64 - would this be compliant?

stryngs commented 4 years ago

So, I saw this thread and I had to run my mouth about it. @kootenpv it is 100% legal in the United States to sniff in Monitor Mode under 18 USC 2511. The caveat is that you may not attempt to decrypt anything encrypted.

The GDPR can blow itself. Worst piece of European legislation ever written and I'm proud to say that I willingly violate the hell out of the GDPR's wifi rules. They hath no power in the USA.

mrquincle commented 3 years ago

Old issue, but perhaps still nice to know.

A municipality (Enschede, Netherlands) got fined €600,000 for using MAC addresses to do crowd monitoring. The MAC addresses were not only hashed, but also salted and stored for a limited time. According to the Dutch Data Protection Authority that doesn't matter, because the company might know the hash function and salt and could brute-force the hashes.

I don't know how to do it "right". It's not that homomorphic encryption or functional encryption would solve this. I think it must go in the direction of the following:

In general it wouldn't be so weird to think that there will be a transition towards privacy-aware protocols just as we saw with the rise of more security protocols (even if they incur overhead).

mwargan commented 3 years ago

@mrquincle very interesting... This article said "It does let us know that the method has now been adjusted, so that data, among other things, is not stored for longer than 24 hours." - was their data being stored for more than 24 hours before?

Is this also a function of time - i.e. maybe the 60 second scanning time wouldn't be subject to this issue?

mrquincle commented 3 years ago

If you can read Dutch, here the Dutch Data Protection Authority describes some of the conditions (such as that a hash and a salt are insufficient).

The company behind this had this statement. Here they stated that they have been doing this since 2017, when there was no regulation for this. For the past year they have not been storing data for longer than 24 hours.

They state that the fine was based on the behavior before that time. They have a privacy policy online targeted to exactly this application. This mentions:

Hence, I think you're right. If you're scanning for short enough times it seems to be okay.

Opt-out

Note that they have a very interesting opt-out function that requires you to give them your MAC address: https://www.citytraffic.nl/opt-out-register/

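# Generate a random hex string with apg and insert colons to form a MAC address
mac_wifi=$(apg -a 1 -M nl -m12 -n 1 -E ghijklmnopqrstuvwxyz | sed 's/../&:/g;s/:$//')
echo "$mac_wifi"
# Submit the generated address to the opt-out form
curl -X POST -F "mac-wifi=$mac_wifi" https://www.citytraffic.nl/wp-json/contact-form-7/v1/contact-forms/1030/feedback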

You'll get a response like

{"into":"#","status":"mail_sent","message":"Bedankt voor de afmelding; het is verzonden.","posted_data_hash":"578195650c08e4fc146b6d6c165fa637"}