Security considerations for file hosting

tweaselORG / platform

Server for the tweasel.org platform, allowing users to analyse Android and iOS apps for data protection violations and send complaints about them to the data protection authorities.

Security considerations for file hosting #7

Closed: baltpeter closed this issue 4 months ago

baltpeter commented 5 months ago

We will ask users to upload/forward their correspondence with the controller and later allow them to download it again as attachments to the complaint.

With that, we'll of course need to ensure that we don't accidentally become an arbitrary file hoster that gets abused for malware/illegal file hosting or god knows what else.

baltpeter commented 5 months ago

My thoughts:

[x] Heavy restrictions on the files: We only allow uploads and downloads of PDFs and EMLs.
[x] We don't allow downloading the individual files but instead wrap all attachments in a ZIP.
[x] We limit the timeframe during which attachments can be downloaded: Downloading is only possible once we are actually at the point when a complaint can be made. Files expire at some point.
[x] We don't provide permanent URLs to download files. Instead, download links contain a short-lived token that only allows a very limited number of downloads (3?).

With these mitigations, we can hopefully make ourselves unattractive enough for malicious users.
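As a rough illustration of the first and third points, the checks could look something like the sketch below. All names here (`ALLOWED_MIME_TYPES`, `Attachment`, `canDownload`, `complaintReady`) and the 90-day retention period are assumptions made for the example; the thread does not state the actual values or show the real implementation.

```typescript
// Hypothetical sketch; names and the retention period are assumptions, not the platform's code.

// Only PDFs and EMLs are accepted (EML files are commonly identified as message/rfc822).
const ALLOWED_MIME_TYPES = new Set(["application/pdf", "message/rfc822"]);

interface Attachment {
  // Type determined by inspecting the content, not the client-supplied Content-Type.
  detectedMimeType: string;
  uploadedAt: Date;
}

// Assumed value: the thread only says that files "expire at some point".
const MAX_AGE_MS = 90 * 24 * 60 * 60 * 1000;

function isAllowedUpload(detectedMimeType: string): boolean {
  return ALLOWED_MIME_TYPES.has(detectedMimeType);
}

// Downloads are only possible once a complaint can be made and before the file expires.
function canDownload(
  attachment: Attachment,
  complaintReady: boolean,
  now: Date = new Date()
): boolean {
  const expired = now.getTime() - attachment.uploadedAt.getTime() > MAX_AGE_MS;
  return complaintReady && !expired && isAllowedUpload(attachment.detectedMimeType);
}
```

Re-checking the type at download time is redundant if uploads are already filtered, but it is cheap and guards against files that entered storage by another path.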

baltpeter commented 4 months ago

> We don't provide permanent URLs to download files. Instead, download links contain a short-lived token that only allows a very limited number of downloads (3?).

I ended up limiting the token lifetime (currently set to 6 hours, which seems reasonable to me but we can easily change that if necessary) instead of the number of downloads. That seems like a win-win to me: It should be less useful to an attacker, while at the same time being much less limiting to actual users (who will probably not even notice the limitation at all).

For the actual tokens, we could have stored them in the database, but this seemed like a great use for JWT. This way, we don't need to keep track of the tokens. And since the tokens are only very short-lived anyway, we don't even need a persistent secret. That is instead generated anew on each server start.
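To make the idea concrete, here is a minimal sketch of a stateless, short-lived download token signed with a secret that only exists for the lifetime of the process. It builds an HS256 JWT by hand with Node's built-in `node:crypto` module so that the example has no dependencies; the thread does not say which JWT library the platform uses, and the function names and the `sub` claim holding a complaint ID are assumptions for illustration.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Generated anew on each server start. A restart invalidates all outstanding
// tokens, which is acceptable because they only live for a few hours anyway.
const secret = randomBytes(32);
const TOKEN_LIFETIME_SECONDS = 6 * 60 * 60; // 6 hours, as mentioned in the thread

const b64url = (data: Buffer): string => data.toString("base64url");
const sign = (data: string): Buffer => createHmac("sha256", secret).update(data).digest();

// `complaintId` as the subject is a hypothetical choice for this sketch.
function issueDownloadToken(
  complaintId: string,
  nowSeconds: number = Math.floor(Date.now() / 1000)
): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" }), "utf8"));
  const payload = b64url(
    Buffer.from(JSON.stringify({ sub: complaintId, exp: nowSeconds + TOKEN_LIFETIME_SECONDS }), "utf8")
  );
  return `${header}.${payload}.${b64url(sign(`${header}.${payload}`))}`;
}

// Returns the complaint ID if the token is authentic and unexpired, otherwise null.
function verifyDownloadToken(
  token: string,
  nowSeconds: number = Math.floor(Date.now() / 1000)
): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts as [string, string, string];

  // The signature is always recomputed with HMAC-SHA256, regardless of the header's "alg".
  const expected = sign(`${header}.${payload}`);
  const actual = Buffer.from(signature, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;

  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  if (typeof claims.exp !== "number" || claims.exp <= nowSeconds) return null;
  return typeof claims.sub === "string" ? claims.sub : null;
}
```

No database lookup is involved in verification: the signature proves the server issued the token, and the `exp` claim enforces the lifetime. The trade-off is that individual tokens cannot be revoked before they expire, other than by restarting the server.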

baltpeter commented 4 months ago

I will use https://www.npmjs.com/package/mime-detect for the file type detection. I tested it on hundreds of example files (EML, PDF, CSV) plus a couple of NDJSON ones and it achieved a 100% detection rate.

I didn't end up using the more popular https://www.npmjs.com/package/file-type because it only deals with binary formats, whereas we mostly have text-based formats.
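The difference between binary and text-based formats can be illustrated with a small sketch. A PDF normally starts with the fixed byte sequence `%PDF-`, so a magic-number check is enough to recognise it, whereas an EML file is plain text with no signature and can only be recognised heuristically, for example by looking for the `From` and `Date` header fields that RFC 5322 requires. The functions below are illustrative assumptions only; they do not reflect how mime-detect or file-type work internally and are far less thorough than either.

```typescript
// Illustration only; not how mime-detect or file-type are implemented.

// Binary-style detection: compare the first bytes against a fixed signature.
function looksLikePdf(data: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  return magic.every((byte, i) => data[i] === byte);
}

// Text-style detection: there is no signature, so inspect the structure instead.
// Here: the first block of lines must contain "From:" and "Date:" header fields.
function looksLikeEml(data: Uint8Array): boolean {
  const head = new TextDecoder("utf-8").decode(data.slice(0, 4096));
  const headerBlock = head.split(/\r?\n\r?\n/)[0] ?? "";
  const names = new Set(
    headerBlock
      .split(/\r?\n/)
      .map((line) => /^([A-Za-z-]+):/.exec(line)?.[1]?.toLowerCase())
      .filter((name): name is string => name !== undefined)
  );
  return names.has("from") && names.has("date");
}
```

A heuristic like `looksLikeEml` can produce false positives and negatives (any text file that happens to contain those two header lines would pass), which is why testing a detector against a large set of real example files matters.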

In the future, we might want to consider https://github.com/google/magika (maybe as an additional check?). Currently, that doesn't support NDJSON.