plone / plone.restapi

RESTful API for Plone.
http://plonerestapi.readthedocs.org/
84 stars 73 forks source link

json_body should not read entire request BODY #1730

Open mauritsvanrees opened 8 months ago

mauritsvanrees commented 8 months ago

So this line is bad:

data = json.loads(request.get("BODY") or "{}")

With Zope 5.8.4+ this fails when the request BODY is larger than 1MB. See discussion in https://github.com/plone/Products.CMFPlone/issues/3848 and https://github.com/zopefoundation/Zope/pull/1142. See my summary of the history and the steps taken so far.

Temporarily fixed in https://github.com/plone/plone.restapi/pull/1729 by disabling a Zope form memory limit check.

gogobd commented 8 months ago

I was patching this like this:

…/plone/restapi/deserializer/__init__.py

from plone.restapi.exceptions import DeserializationError

import json

def json_body(request):
    try:
        if hasattr(request, "BODYFILE"):
            data = request.get('BODYFILE').read()
        else:
            data = json.loads(request.get("BODY") or "{}")
    except ValueError:
        raise DeserializationError("No JSON object could be decoded")
    if not isinstance(data, dict):
        raise DeserializationError("Malformed body")
    return data

def boolean_value(value):
    """

    Args:
        value: a value representing a boolean which can be
               a string, a boolean or an integer
                   (usually a string from a GET parameter).

    Returns: a boolean

    """
    return value not in {False, "false", "False", "0", 0}

But as @mauritsvanrees pointed out: "We should not read the complete request BODY in memory." and so that's probably not helping. I wasn't able to come up with a test that would actually push a request through Zope's machinery to address this issue. I also think that a "fix" like the one I show here would not really eliminate the D.O.S. threat that potentially was the reason for introducing the limits in the first place. The only "good" fix would be to have every request body be a stream in all cases, but I wonder if that's feasible. Can we use a data structure that is a stream and behaves like "in memory data"?

gogobd commented 8 months ago

Additionally we "fixed" the problem by upping the limits in zope.conf like this:

<dos_protection>
    form-memory-limit 1GB
    form-disk-limit 1GB
    form-memfile-limit 4KB
</dos_protection>
gogobd commented 8 months ago

... maybe at one point we will have to live with "huge" data in memory.

wesleybl commented 8 months ago

Can this problem cause high memory usage on portals that upload large files?

davisagli commented 8 months ago

json_body's job is to return an entire data structure deserialized from JSON. If we give it a very large JSON document, it can't do much about that.

It might be possible to create a new function which returns a specific key/value from a JSON document without reading the entire structure into memory, using something like https://pypi.org/project/ijson/ -- but you need to look at the specific places where json_body is used to see if that is helpful.

For the use case of file uploads in particular, I think the current approach that volto uses (base64-encoded file data in a JSON body) is not optimal. I think we should explore adding support for a different serialization format in plone.restapi, with a multipart/form-data body that includes both the content as JSON and separate files as attachments (and some convention for referencing the files from within the JSON structure).