okfde / froide

Freedom Of Information Portal
MIT License

Bypassing of censorship? #665

Closed: pernonides closed this issue 1 year ago

pernonides commented 1 year ago

Hi, while browsing FragDenStaat I noticed an address showing up in my search results, even though the same address is correctly censored on the page of the FOI request itself.

I've done some testing, and my test request https://fragdenstaat.de/a/269260 reproduces the issue as well. The address was censored using the "Schwärzen" (redact) tool and is not visible on the request page itself.

But when I search for part of the sentence containing the address, the result summary shows two versions of the same request: one with the address and one without. It seems that the raw text of my request, which should only be accessible to me, is also accessible to the search engine while I'm not logged in.

To try it yourself, here is my search query: https://fragdenstaat.de/anfragen/?q=%22und+sollte+automatisch+zensiert+werden%22&status=&jurisdiction=&campaign=&category=&publicbody=&tag=&user=&first_after=&first_before=&sort=

stefanw commented 1 year ago

This is a misunderstanding: if you write your address into the request text itself, there is currently no expectation that it will be redacted (we don't use the word "censorship" in this context). When making a request, we tell people not to include personally identifiable information (PII) in the request description; that's what the separate name and address fields are for.

We run redaction on all messages and store a redacted version, because we cannot control their content. Since the request description is part of the original message, it also gets redacted in these rendered messages. However, the top of the request page shows only the request description (i.e. no salutation, greeting, footer, etc.), and there we apply only "live" redaction as a temporary convenience. The search index only ever indexes redacted content: the contents of the permanently redacted messages and the "live"-redacted request description.
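To make the two paths concrete, here is a minimal Python sketch. It is not froide's actual code: `User`, `redact()`, `store_message()` and `index_description()` are hypothetical names chosen for illustration, and redaction is assumed to be simple string replacement of the user's name and address.

```python
from dataclasses import dataclass

REDACTED = "[redacted]"


@dataclass
class User:
    # Hypothetical model: only the fields relevant to redaction.
    name: str
    address: str


def redact(text: str, user: User) -> str:
    """Replace the user's *current* name and address with a placeholder."""
    for value in (user.name, user.address):
        if value:
            text = text.replace(value, REDACTED)
    return text


def store_message(raw_message: str, user: User) -> str:
    # Messages: redacted once; only this redacted copy is kept and indexed.
    return redact(raw_message, user)


def index_description(description: str, user: User) -> str:
    # Request description: redacted 'live' at indexing time,
    # using whatever user data exists at that moment.
    return redact(description, user)


user = User(name="Erika Mustermann", address="Musterstraße 1")
description = "Please send the documents to Musterstraße 1."
message = f"Dear Sir or Madam,\n{description}\nErika Mustermann"

print(store_message(message, user))
print(index_description(description, user))  # Please send the documents to [redacted].
```

The sketch only illustrates where redaction happens: the stored message is fixed once it is written, while the description's redaction is recomputed every time it is indexed.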

"Live" redaction means we redact based on the user's current data. So if you change your address, the old one will no longer be detected and will therefore no longer be redacted. This currently gives an impression of redaction where there shouldn't be any expectation of redaction in the first place. My guess is that you changed your address, and the next time the request was indexed, the "live" redaction no longer caught the old one.
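A minimal Python sketch of this failure mode, again assuming redaction is plain string replacement of the user's current name and address (`User` and `live_redact` are hypothetical names, not froide's API):

```python
from dataclasses import dataclass

REDACTED = "[redacted]"


@dataclass
class User:
    # Hypothetical model holding the user's *current* data only.
    name: str
    address: str


def live_redact(text: str, user: User) -> str:
    """Redact using the user's current data, as 'live' redaction does."""
    for value in (user.name, user.address):
        if value:
            text = text.replace(value, REDACTED)
    return text


description = "My address is Musterstraße 1, 12345 Berlin."
user = User(name="Erika Mustermann", address="Musterstraße 1, 12345 Berlin")

# Before the address change: the address in the text matches and is hidden.
before = live_redact(description, user)

# The user updates their profile address...
user.address = "Beispielweg 2, 54321 Hamburg"

# ...and the next indexing run no longer recognises the old address.
after = live_redact(description, user)

print(before)  # My address is [redacted].
print(after)   # My address is Musterstraße 1, 12345 Berlin.
```

A permanently stored redaction, like the one applied to messages, would not be affected in this scenario, because it was computed once while the old address was still on file.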

Why shouldn't the request description be automatically redacted? Paradoxically, auto-redacting the request description can accidentally reveal information about users precisely by hiding it in the context of the request. That's why we are planning to remove auto-redaction for request descriptions completely and to let users redact their request descriptions manually if needed.

The main point stands: you should not include PII in your request description in the first place.


I have manually redacted your request's description. It is now indexed in that redacted form, and that version is what appears in search result snippets.