pushshift / api

Pushshift API

PushShift & GDPR #70

Open throwaway34241 opened 3 years ago

throwaway34241 commented 3 years ago

Hey! How does PushShift comply with the GDPR? If I want data removed from PushShift's database, who should I contact?

bdavisonhelo commented 3 years ago

There is no requirement to comply with GDPR if you don't market to the EU.

throwaway34241 commented 3 years ago

International companies must comply with the GDPR as long as they have users in the EU. I can access the PushShift API from the EU. IANAL, though.

ghnp5 commented 3 years ago

Not only that: you also hold data on EU users/individuals. If you are not removing content that EU users have deleted from Reddit, that is a clear breach of the GDPR.

ghnp5 commented 3 years ago

@Juan-Castelli I believe that you, as an EU citizen, have the right to lodge a GDPR complaint, whether you are currently in the EU or not.

Raising a GDPR complaint is simple -- https://www.gov.uk/data-protection/make-a-complaint (would be nice if everyone could raise one)

The main thing to do before raising a GDPR complaint is to talk to the data processor. Since they are neither listening to us nor doing anything about the issue, the next step is definitely to raise a GDPR complaint.

I'll also raise this with Reddit themselves (eurepresentative@reddit.com, legal@reddit.com, redditdatarequests@reddit.com), and see what we can do from here.

MicahRCM commented 3 years ago

@ghnp5 is right. You have the right to lodge a complaint. However, there is very little chance that it will have any impact:

  1. Pushshift is not an international company. Having access to a company's website/API from Europe does not make complying with the GDPR requisite of said company.
  2. It is unclear and unlikely that Pushshift data falls under the Personally Identifiable Information (PII) definition, which is data that can be used to identify specific individuals. Full definition here. Some data may fall under the PII definition should a user mention their name, email, phone number, etc., but only that specific comment/post would fall under the PII definition.
  3. Pushshift user data and Pushshift data are two completely different sets of data. Pushshift user data would be, for example only, if Pushshift had you make an account to generate an API key, and then tracked what you were querying and when. Pushshift data, however, is the data with which we are all familiar, sourced from the API and Pushshift files.
  4. The only way to make a legitimate data deletion request is through Reddit itself. Because emails or other PII are not attached to the Pushshift data, the only way to authenticate the person requesting the data deletion is by submitting a request through Reddit.

Less relevant, but keep this in mind:

  1. Pushshift data files are constantly being seeded. I am seeding some files right now, in fact. Even if user data was deleted from the Pushshift API, it would still be widely available for download.
  2. Pushshift is a small operation and the owner has stated they are working on data deletion requests.

What I would like to learn more about from you (@throwaway34241 and @ghnp5) is what the EU could actually do about this, should they choose to do something. Do they enforce fines against non-EU SMEs (small and medium-sized enterprises)?

Thank you!

ghnp5 commented 3 years ago

> The only way to make a legitimate data deletion request is through Reddit itself.

Which is what we do, and Reddit deletes the data, but it is still in Pushshift.

> Pushshift data, however, is the data with which we are all familiar, sourced from the API and Pushshift files.

As soon as the data Pushshift "steals" from Reddit is on Pushshift's servers, then Pushshift is responsible for that data. If the data is removed from Reddit, and Pushshift keeps it, then there's a problem here.

I never consented to Pushshift holding the data I generated on Reddit.

If that were acceptable, Google could simply keep all the data on the internet without going through the headaches of the "right to be forgotten".

throwaway-9992 commented 1 year ago

@ghnp5 Hi, any news on this? Did you receive any answer?

Ruakij commented 1 year ago

@MicahRCM

> Pushshift is not an international company. Having access to a company's website/API from Europe does not make complying with the GDPR requisite of said company.

You are right, but GDPR Art. 3 specifies where the GDPR applies.
Paragraph 2 covers controllers outside the Union that either (a) offer goods or services to data subjects in the Union or (b) monitor their behaviour, as far as that behaviour takes place within the Union. PushShift will fall under (b).

> It is unclear and unlikely that Pushshift data falls under the Personally Identifiable Information (PII) definition

GDPR Art. 4(1) defines 'personal data': "any information relating to an identified or identifiable natural person [...] one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person"

A Reddit username is an online identifier, so it is 'personal data'.

> the only way to authenticate the person requesting the data deletion is by submitting a request through Reddit

That's what we want, and PushShift still doesn't offer it.