Add Security considerations.

ioggstream commented 2 years ago

I expect

this spec to have a "Security considerations" section.

There's an IETF stream to register the application/yaml and text/yaml media types, and this section is very important for others yaml-related media types too

See https://github.com/ietf-wg-httpapi/mediatypes/issues/10

Feel free to file issues there.

cc: @darrelmiller

ioggstream commented 2 years ago

When registering a media type, it is requested to provide (references to) Security considerations and Encoding considerations to guide implementers in securely deploy a given technology.

An example section taken from JSON spec is the following

   Generally, there are security issues with scripting languages.  JSON
   is a subset of JavaScript but excludes assignment and invocation.

   Since JSON's syntax is borrowed from JavaScript, it is possible to
   use that language's "eval()" function to parse most JSON texts (but
   not all; certain characters such as U+2028 LINE SEPARATOR and U+2029
   PARAGRAPH SEPARATOR are legal in JSON but not JavaScript).  This
   generally constitutes an unacceptable security risk, since the text
   could contain executable code along with data declarations.  The same
   consideration applies to the use of eval()-like functions in any
   other programming language in which JSON texts conform to that
   language's syntax.

Since YAML is a superset of JSON, those considerations apply.

Yaml has some features like explicit typing !!str and local tags that, depending on the implementation, might trigger unexpected code execution.

document = "!!python/object/apply:os.system ['echo boom!']"
yaml.unsafe_load(document)
# boom!

Implementations shouldn't deliberately expose arbitrary code execution in constructors, and if they do, they shouldn't enable it by default

Many implementations provide safe deserializers to overcome these issues (e.g pyyaml yaml.safe_load, ...)

Useful features as anchors can be used to trigger "bombing" attacks see this research paper on yaml libraries, eg.

x:   &a1 ["a", "a"] 
x2: &a2 [*a1, *a1]
x3: &a3 [*a2, *a2]

This can be addressed using parsers limiting the anchor recursion depth and validating the input before processing it; even in these cases it is important to carefully test the implementation you are going to use.

... more discussions ...

Thom1729 commented 2 years ago

The eval thing is definitely specific to JSON. It's only an issue because JSON was created as a subset of JS, and originally eval was the only built-in way to parse it. I don't think that any similar considerations apply to YAML.

The !!python stuff is definitely a real thing that exists in practice, but it's a nonstandard feature offered by a particular implementation. It's not in the spec, and tag:yaml.org,2002:python is not a properly minted tag. It's still worth mentioning in general form — implementations probably shouldn't deliberately expose arbitrary code execution in constructors, and if they do, they shouldn't enable it by default.

For what it's worth, the “billion laughs” attack is just one consequence of an implementation not handling aliases correctly. Aliases are not syntactic sugar. YAML documents may contain reference cycles, so they can't be treated as tree structures. An implementation that attempts to treat a cyclic document as a tree structure may infinite-loop. Even if a document is not cyclic, treating it as a tree may lead to improper behavior (such as the “billion laughs” problem).

pantoniou commented 2 years ago

Expanding a bit more from an implementation perspective on what @Thom1729 explained (I won't tackle eval because it does not apply at all for YAML).

Both !!python and denial of service attacks via aliases are an issue for certain implementations that convert a YAML stream into native structures/objects.

The first issue is a convenience that should never be enabled by default; in fact I propose that we state strongly in the spec that code execution is strictly forbidden by default and only be enabled explicitly and even then possibly using an external method that ensures that code execution would result to strictly bounded time/memory limits. For an example on how that would be possible please see https://ebpf.io the kernel's eBPF compiler and runtime.

The second issue stems by the way that 'loaders' or 'unmarshalers' (different terminology used by different languages) convert the YAML stream to native objects. In most implementations the loader cannot handle alias references as pointers, rather they 'resolve' the stream to a tree structure. If this results to cycles, then the operation fails, but even if it doesn't it, attacks like "billion laughs" is again possible. The only solution for these kind of loaders would be limits to the amount of alias resolutions performed, although this is far from ideal.

Ideally a loader that can resolve aliases as actual native type references or pointers is immune to attacks and on top of that it can handle cyclic references without an issue. I would propose that this should be the preferred loader implementation from YAML 1.3 forward.

ioggstream commented 2 years ago

a loader that can resolve aliases as actual native type references or pointers is immune to attacks

Right. I'd add that - since the parsed object may contain cycles - serializing it using formats not supporting cycles (eg. JSON) might still raise issues. While it's not related directly to YAML it's an important caveat imho.

yaml / yaml-spec

Add Security considerations. #272

I expect