gotenberg / gotenberg

A developer-friendly API for converting numerous document formats into PDF files, and more!
https://gotenberg.dev
MIT License
7.22k stars 471 forks

SSRF protection / HTML tag filtering on Gotenberg level #905

Open sebastian-schlecht opened 1 week ago

sebastian-schlecht commented 1 week ago

Hey @gulien,

First of all, thanks (again) for building Gotenberg. We've been using it at our company for 4 years, and it has provided a lot of value since.

Now, one topic that regularly pops up in our pentests is SSRF. In our application, there are several places where users can provide content for generated HTML, which is then converted to PDF. We have multiple defense mechanisms in place, but since several of our services now use Gotenberg downstream, I am wondering whether there is a more generic solution than filtering content at each stage.

Is there a way to, for example, disable iframes altogether? I didn't find any obvious Chromium flag that a small PR could expose. Maybe some sort of HTML sanitisation at the Gotenberg level would even make sense for increased security.

Would love to hear your thoughts on this.

gulien commented 1 week ago

Hello @sebastian-schlecht 👋

Actually, the flags --chromium-allow-list and --chromium-deny-list prevent unwanted URLs from being loaded (including URLs loaded from an iframe). By default, Gotenberg denies access to all local files except those in /tmp.

Is this the mechanism you're looking for? 😊

sebastian-schlecht commented 1 week ago

Hey @gulien !

Thanks for the prompt reply!

Yeah, I was thinking about this and also saw it in the docs. The thing is that we render a lot of images whose URLs point to our internal infrastructure. Users cannot modify these URLs, but we'd need to allow-list those internal URL patterns to make it work. We could probably get around this with granular, specific allow lists, but that would make the setup very rigid in exchange for fine-grained security.

I was thinking that removing iframes in the first place could be a good trade-off between flexibility and added security / SSRF prevention.

gulien commented 1 week ago

Makes sense!

I wonder if you could inject some JavaScript on your side that removes specific HTML nodes 🤔 (+ waitForExpression).

I don't know if there is an easy way to do this with the Chrome DevTools Protocol.

sebastian-schlecht commented 1 week ago

I mean, since Gotenberg receives the HTML directly, would you be open to an additional, optional feature (I might have time to propose something) that parses and "filters" the provided HTML for increased security? A user could specify a list of disallowed tags, or something similar, to tackle such injection attacks.

gulien commented 1 week ago

The problem is that I'd like feature parity across all Chromium endpoints, and the only way to handle it consistently is through the Chrome DevTools Protocol 🤔 Also, what if a script injects an iframe? What about obfuscated HTML?

The flags --chromium-allow-list and --chromium-deny-list are actually the only way I found to consistently prevent SSRF, as I filter unwanted URLs directly at the Chromium network level.
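The objection about script-injected iframes can be illustrated with a deliberately naive, hypothetical string-level filter (not a proposed implementation). An iframe that a script creates at render time never appears as markup in the uploaded HTML, so the filter has nothing to match, while a network-level filter would still see the resulting request.

```go
package main

import (
	"fmt"
	"regexp"
)

// iframeTag naively matches literal <iframe ...>...</iframe> elements,
// case-insensitively and across newlines.
var iframeTag = regexp.MustCompile(`(?is)<iframe\b.*?</iframe\s*>`)

// naiveStrip removes every literal iframe element from the HTML string.
func naiveStrip(html string) string {
	return iframeTag.ReplaceAllString(html, "")
}

func main() {
	direct := `<p>hi</p><iframe src="http://10.0.0.1/"></iframe>`

	// The same iframe, created by a script when Chromium renders the page.
	scripted := `<p>hi</p><script>
var f = document.createElement('ifr' + 'ame');
f.src = 'http://10.0.0.1/';
document.body.appendChild(f);
</script>`

	fmt.Println(naiveStrip(direct))               // <p>hi</p>
	fmt.Println(naiveStrip(scripted) == scripted) // true: nothing was removed
}
```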