zopefoundation / Zope

Zope is an open-source web application server.
https://zope.readthedocs.io
Other
357 stars 101 forks source link

D.O.S. Protection Limits #1177

Open gogobd opened 1 year ago

gogobd commented 1 year ago

Hello, every 01!

I just came across

FORM_MEMORY_LIMIT = 2 ** 20   # memory limit for forms
FORM_DISK_LIMIT = 2 ** 30     # disk limit for forms
FORM_MEMFILE_LIMIT = 2 ** 12  # limit for `BytesIO` -> temporary file switch

In ZPublisher/HTTPRequest.py and I have a question: how do these values relate, "memory limit" is responsible for limiting the request "stream", but I don't understand the documentation of "disk limit" and "memfile limit".

    <key name="form-memory-limit" datatype="byte-size" default="1MB">
      <description>
       The maximum size for each part in a multipart post request,
       for the complete body in an urlencoded post request
       and for the complete request body when accessed as bytes
       (rather than a file).
      </description>
    </key>

    <key name="form-disk-limit" datatype="byte-size" default="1GB">
      <description>
       The maximum size of a POST request body
      </description>
    </key>

    <key name="form-memfile-limit" datatype="byte-size" default="4KB">
      <description>
       The value of form variables of type file with larger size 
       are stored on disk rather than in memory.
      </description>
    </key>

If "disk limit" is the request body size, why is it called "disk" limit?

Do I understand it right that "memfile limit" is a "switch" so that smaller data are stored in memory whereas data larger than 4k are being written to disk?

Additionally I wanted to report that our employees were unable to upload image files with the very restrictive default values after an update; could we consider to raise the form memory limit to say 10MB or more? Will the "memfile" limit of 4k lead to problems with uploads?

Thank you for clarification!

d-maurer commented 1 year ago

Georg Gogo. BERNHARD wrote at 2023-10-25 03:05 -0700:

... If "disk limit" is the request body size, why is it called "disk" limit?

The names come originally from multipart.

multipart maintains parts in memory when they are sufficiently small (controlled by form_memory_limit) and stores them on disk otherwise. The form-disk-limit is the maximal amount of disk space usable for a form.

Do I understand it right that "memfile limit" is a "switch" so that smaller data are stored in memory whereas data larger than 4k are being written to disk?

Right.

Additionally I wanted to report that our employees were unable to upload image files with the very restrictive default values after an update; could we consider to raise the form memory limit to say 10MB or more? Will the "memfile" limit of 4k lead to problems with uploads?

The form-memory-limit limits access when a request variable of type file or the request body is not accessed via the "file API" but as bytes. E.g. request["BODYFILE"] accesses the request body as a file (form-memory-limit has no effect) while request["BODY"] accesses it as bytes (form-memory-limit applies).

Thus, form-memory-limit is used for 2 purposes:

  1. control when a part is moved to disk
  2. limit the part size when accessed as bytes

If you are (or are ready to become) a Zope contributer, you could make a "pull request" to provide separate configuration options for those purposes. I would be in favor thereof.

gogobd commented 1 year ago

Thank you so much for shedding some light on this!

We ran into this problem when we came across this traceback:

Module ZPublisher.WSGIPublisher, line 181, in transaction_pubevents
Module ZPublisher.WSGIPublisher, line 391, in publish_module
Module ZPublisher.WSGIPublisher, line 269, in publish
Module ZPublisher.BaseRequest, line 619, in traverse
Module Products.PluggableAuthService.PluggableAuthService, line 244, in validate
Module Products.PluggableAuthService.PluggableAuthService, line 560, in _extractUserIds
Module plone.restapi.pas.plugin, line 100, in extractCredentials
Module plone.restapi.deserializer, line 8, in json_body
Module ZPublisher.HTTPRequest, line 1058, in get
Module ZPublisher.HTTPRequest, line 1370, in __get__

I found that in ZPublisher.HTTPRequest the value from the configuration is used https://github.com/zopefoundation/Zope/blob/8fdd567347ab0f3973d3f987e55da7a8644d1a82/src/ZPublisher/HTTPRequest.py#L1427 and then, at https://github.com/zopefoundation/Zope/blob/8fdd567347ab0f3973d3f987e55da7a8644d1a82/src/ZPublisher/HTTPRequest.py#L1370 the setting from the configuration is used and the "get" method from https://github.com/zopefoundation/Zope/blob/8fdd567347ab0f3973d3f987e55da7a8644d1a82/src/ZPublisher/HTTPRequest.py#L1058 fails.

Plone just wanted to read the value like this: data = json.loads(request.get("BODY") or "{}") https://github.com/plone/plone.restapi/blob/5f9214950a50d4728d7a245bec1358a68c8f2316/src/plone/restapi/deserializer/__init__.py#L8, not knowing that there are two ways request data are being handled.

Now I wonder how I could make this more convenient. Changing the default values would of course help, but the actual problem is the request.get method that should return a value no matter how it was being stored.

Any thoughts on this?

d-maurer commented 1 year ago

Georg Gogo. BERNHARD wrote at 2023-10-30 02:46 -0700:

... Plone just wanted to read the value like this: data = json.loads(request.get("BODY") or "{}") https://github.com/plone/plone.restapi/blob/5f9214950a50d4728d7a245bec1358a68c8f2316/src/plone/restapi/deserializer/__init__.py#L8, not knowing that there are two ways request data are being handled.

Now I wonder how I could make this more convenient. Changing the default values would of course help, but the actual problem is the request.get method that should return a value no matter how it was being stored.

Any thoughts on this?

Zope should not use FORM_MEMORY_LIMIT to limit how much data can be returned from request["BODY"]. It should use a separate configuration parameter for this (which could be larger than FORM_MEMORY_LIMIT as it affects only a single value, not potentially a whole set). Would you like to contribute this change?

Of course, plone.restapi could use data = {} if "BODYFILE" not in request else json.load(request["BODYFILE"]) instead of data = json.loads(request.get("BODY") or "{}").

This assumes that BODYFILE is not an empty file, if it exists. A more conservative solutions would be

body = request.get("BODYFILE")
if body is not None:
    body = body.read()
data = json.loads(body or "{}")

I prefer the Zope change.

Should you fell uncomfortable with the change contribution, I will do it.

d-maurer commented 1 year ago

multipart maintains parts in memory when they are sufficiently small (controlled by form_memory_limit) and stores them on disk otherwise.

@gogobd I must revert this statement: having reread the documentation in Zope2.Startup:wsgischema.xml, I recognize that form_memory_limit is not used to switch from an in memory representation to a disk representation. Instead, form-memfile-limit is used for this. form_memory_limit is passed as mem_limit to multipart and limits there the total memory used for all request parameter values not stored on disk (thus, the Zope documentation is not correct in this regard).

I no longer think that Zope needs an additional configuration option. Should the default 1MB for form-memory-limit be too small in exceptional cases, reconfigure. Should plone regularly require a larger limit, we can increase the default.

I will create a PR which improves/corrects the DOS protection related Zope documentation.