plone / Products.CMFPlone

The core of the Plone content management system
https://plone.org
GNU General Public License v2.0
254 stars 191 forks source link

PLIP: Add Transform to remove inline base64 coded content #1979

Open loechel opened 7 years ago

loechel commented 7 years ago

PLIP: Add Transform to remove inline base64 coded content #1979

Responsible Persons

Proposer: Alexander Loechel

Seconder:

Abstract

Plone RichTextFields could hold a complete html document. The HTML Definition allows to put base64 coded content inline into src elements of html tags like img. This content should be served and presented by the Browser, which means as Image if it is in an img src attribute.

This actually is working in normal cases, but base64 coded images could become very large, in this case Plone seems not to serve the document, but seems to process, and could hang the thread.

There have been an addon: collective.base64imagepatch (https://pypi.python.org/pypi/collective.base64imagepatch / https://github.com/collective/collective.base64imagepatch) which does solve this issue by extracting the Image and saving it as a separate Content Object and linking that.

As this add-on is old, and does a lot of unnecessary magic for Plone 4.x Archetypes and Dexterity, I want to reimplement this as a Transform Chain Element in PortalTransform which just removes the image tag completely.

Motivation

Base64 coded images in a RichTextField could resove in a hanging thread and kill therefor a Plone instance. This is actually a denial of service vector.

Some Browsers like several Versions of Firefox does include an image that is drag'n drop into a RichTextField as inline base64 coded image.

Assumptions

Proposal & Implementation

Implement a Transform Element in Products.PortalTransforms to remove all image Elements that have a src attribute starting with

data:[<mime type>][;charset=<charset>][;base64],<data>

Deliverables

Risks

As PortalTransforms where executed on save events, any modification would only apply for new or edited content that contains inline base64 coded content, this could confuse users. This could be reduced if the transform shows a notification on removing any inline content.

Participants

hvelarde commented 7 years ago

I never seen base64 encoded images included in the body of a content type instance; so, I'm against including this in the core, as I see it more as a very specific use case that must be address by a third party add on.

jensens commented 7 years ago

Well, you can get them in by copy/pasting. I am ok to have this in core if the behavior is configurable (on/off) through some option in the control-panel.

And yes, would consider this a PLIP.

loechel commented 7 years ago

@hvelarde I have seen several large Plone installations having trouble with it. Mostly where a lot of People are allowed to edit content. Especially with the Firefox Browser it is or was possible to drag and drop an Image into a RichTextField and that was added as an img with a data-uri inline.

If it would not be a possible security problem I would say yes a third party add on might be better, I have already done this. but I think it should be better in core.

hvelarde commented 7 years ago

sorry guys, but I'm still unconvinced; yes, you can have an base64 encoded image pasted on a RichText field but, as I mentioned above, this is a very specific use case that only will pollute the core and the associated control panel configlet with options that 95% of users (or more) will never use.

the security problem neither worths the work; we have currently other vectors that are far worst that this in current code that must be addressed before; ping me on IRC to learn more.

loechel commented 7 years ago

any input from @plone/framework-team

gforcada commented 7 years ago

@loechel unfortunately the last @plone/framework-team meeting did not happened because the previous one was just me @esteele and @jensens next one should be by PLOG time (18th) so I hope this one happens :-)

gforcada commented 7 years ago

my personal opinion is that we should have that in, but maybe with an option to disable it. I noticed recently that at work we are filtering it already.

davisagli commented 7 years ago

+0 for the feature.

As for the proposed implementation, you're wrong about PortalTransforms being executed on save -- they are executed on demand when something needs to display an output mimetype that requires the transformation, then cached for an hour. That doesn't seem good for this use case; I think for this we would want to replace the data attributes that are stored a single time when the field is saved.

Btw, I think data attributes are currently filtered out by the safe_html transform, since they could include malicious content that is not actually an image and it was easiest to just disallow them entirely rather than check the mimetype.