tweaselORG / platform

Server for the tweasel.org platform, allowing users to analyse Android and iOS apps for data protection violations and send complaints about them to the data protection authorities.
MIT License

Security considerations for file hosting #7

Closed by baltpeter 4 months ago

baltpeter commented 5 months ago

We will ask users to upload/forward their correspondence with the controller and later allow them to download it again as attachments to the complaint.

With that, we'll of course need to ensure that we don't accidentally become an arbitrary file hoster that gets abused for malware/illegal file hosting or god knows what else.

baltpeter commented 5 months ago

My thoughts:

With these mitigations, we can hopefully make ourselves unattractive enough for malicious users.

baltpeter commented 4 months ago

  • We don't provide permanent URLs to download files. Instead, download links contain a short-lived token that only allows a very limited number of downloads (3?).

I ended up limiting the token lifetime instead of the number of downloads. The lifetime is currently set to 6 hours, which seems reasonable to me, and we can easily change it if necessary. That seems like a win-win to me: it should be less useful to an attacker, while at the same time being much less limiting to actual users (who will probably not even notice the limitation at all).

For the actual tokens, we could have stored them in the database, but this seemed like a great use for JWT. This way, we don't need to keep track of the tokens. And since the tokens are only short-lived anyway, we don't even need a persistent secret. Instead, the secret is generated anew on each server start. A restart invalidates any outstanding download links, but users can simply request a new one.
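To illustrate the idea, here is a minimal sketch of stateless download tokens with a per-process secret. It is not the platform's actual code: it hand-rolls an HS256 JWT with Node's built-in `crypto` module so that it runs without dependencies, and the names `createDownloadToken`, `verifyDownloadToken`, and the use of the `sub` claim for a file ID are assumptions made for the example. A real deployment would more likely use an established JWT library.

```javascript
const { createHmac, randomBytes, timingSafeEqual } = require('node:crypto');

// Generated once per process start. Nothing is persisted, so a restart
// invalidates every token that was issued before it.
const SECRET = randomBytes(32);
const TOKEN_LIFETIME_SECONDS = 6 * 60 * 60; // 6 hours, as discussed above

const b64url = (input) => Buffer.from(input).toString('base64url');
const sign = (data) => createHmac('sha256', SECRET).update(data).digest('base64url');

// The header is fixed, so verification can compare against it directly.
const HEADER = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));

// `fileId` is a hypothetical identifier for the uploaded file.
function createDownloadToken(fileId, nowSeconds = Math.floor(Date.now() / 1000)) {
    const payload = b64url(JSON.stringify({ sub: fileId, exp: nowSeconds + TOKEN_LIFETIME_SECONDS }));
    return `${HEADER}.${payload}.${sign(`${HEADER}.${payload}`)}`;
}

// Returns the file ID if the token is authentic and unexpired, otherwise null.
function verifyDownloadToken(token, nowSeconds = Math.floor(Date.now() / 1000)) {
    const parts = String(token).split('.');
    if (parts.length !== 3) return null;
    const [header, payload, signature] = parts;
    if (header !== HEADER) return null;

    // Compare signatures in constant time to avoid leaking information via timing.
    const expected = Buffer.from(sign(`${header}.${payload}`));
    const actual = Buffer.from(signature);
    if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;

    let claims;
    try {
        claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
    } catch {
        return null;
    }
    if (typeof claims.exp !== 'number' || claims.exp <= nowSeconds) return null;
    return typeof claims.sub === 'string' ? claims.sub : null;
}
```

Because the expiry lives inside the signed payload, the server only has to check the signature and the `exp` claim on each download; there is no token table to maintain or clean up.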

baltpeter commented 4 months ago

I will use https://www.npmjs.com/package/mime-detect for the file type detection. I tested it on hundreds of example files (EML, PDF, CSV) plus a couple of NDJSON ones and it achieved a 100% detection rate.

I didn't end up using the more popular https://www.npmjs.com/package/file-type because it only deals with binary formats, but we have mostly text-based formats.
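For context, a rough sketch of how a detected type could then be checked against an allow-list. This deliberately does not call the detection library itself: it assumes the detection step has already produced a MIME type string, and both the function name `isAllowedMimeType` and the exact MIME strings are assumptions (the strings a given detector returns for these formats, NDJSON in particular, may differ).

```javascript
// Assumed MIME types for the formats mentioned above (EML, PDF, CSV, NDJSON).
const ALLOWED_MIME_TYPES = new Set([
    'message/rfc822',       // EML
    'application/pdf',      // PDF
    'text/csv',             // CSV
    'application/x-ndjson', // NDJSON (assumed; detectors may report something else)
]);

// `detected` is whatever MIME string the detection step returned.
function isAllowedMimeType(detected) {
    if (typeof detected !== 'string') return false;
    // Drop parameters such as "; charset=utf-8" before comparing.
    const mime = detected.split(';')[0].trim().toLowerCase();
    return ALLOWED_MIME_TYPES.has(mime);
}
```

Checking the detected type rather than the file extension or the client-supplied `Content-Type` means a renamed executable would not pass just because it is called `mail.eml`.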

In the future, we might want to consider https://github.com/google/magika (maybe as an additional check?). Currently, that doesn't support NDJSON.