Open ioggstream opened 2 years ago
When registering a media type, it is requested to provide (references to) Security considerations and Encoding considerations to guide implementers in securely deploy a given technology.
An example section taken from JSON spec is the following
Generally, there are security issues with scripting languages. JSON
is a subset of JavaScript but excludes assignment and invocation.
Since JSON's syntax is borrowed from JavaScript, it is possible to
use that language's "eval()" function to parse most JSON texts (but
not all; certain characters such as U+2028 LINE SEPARATOR and U+2029
PARAGRAPH SEPARATOR are legal in JSON but not JavaScript). This
generally constitutes an unacceptable security risk, since the text
could contain executable code along with data declarations. The same
consideration applies to the use of eval()-like functions in any
other programming language in which JSON texts conform to that
language's syntax.
Since YAML is a superset of JSON, those considerations apply.
Yaml has some features like explicit typing !!str
and local tags that,
depending on the implementation, might trigger unexpected code execution.
document = "!!python/object/apply:os.system ['echo boom!']"
yaml.unsafe_load(document)
# boom!
Implementations shouldn't deliberately expose arbitrary code execution in constructors, and if they do, they shouldn't enable it by default
Many implementations provide safe deserializers to overcome these issues
(e.g pyyaml yaml.safe_load
, ...)
Useful features as anchors can be used to trigger "bombing" attacks see this research paper on yaml libraries, eg.
x: &a1 ["a", "a"]
x2: &a2 [*a1, *a1]
x3: &a3 [*a2, *a2]
This can be addressed using parsers limiting the anchor recursion depth and validating the input before processing it; even in these cases it is important to carefully test the implementation you are going to use.
... more discussions ...
The eval
thing is definitely specific to JSON. It's only an issue because JSON was created as a subset of JS, and originally eval
was the only built-in way to parse it. I don't think that any similar considerations apply to YAML.
The !!python
stuff is definitely a real thing that exists in practice, but it's a nonstandard feature offered by a particular implementation. It's not in the spec, and tag:yaml.org,2002:python
is not a properly minted tag. It's still worth mentioning in general form — implementations probably shouldn't deliberately expose arbitrary code execution in constructors, and if they do, they shouldn't enable it by default.
For what it's worth, the “billion laughs” attack is just one consequence of an implementation not handling aliases correctly. Aliases are not syntactic sugar. YAML documents may contain reference cycles, so they can't be treated as tree structures. An implementation that attempts to treat a cyclic document as a tree structure may infinite-loop. Even if a document is not cyclic, treating it as a tree may lead to improper behavior (such as the “billion laughs” problem).
Expanding a bit more from an implementation perspective on what @Thom1729 explained (I won't tackle eval because it does not apply at all for YAML).
Both !!python
and denial of service attacks via aliases are an issue for certain implementations that convert a YAML stream into native structures/objects.
The first issue is a convenience that should never be enabled by default; in fact I propose that we state strongly in the spec that code execution is strictly forbidden by default and only be enabled explicitly and even then possibly using an external method that ensures that code execution would result to strictly bounded time/memory limits. For an example on how that would be possible please see https://ebpf.io the kernel's eBPF compiler and runtime.
The second issue stems by the way that 'loaders' or 'unmarshalers' (different terminology used by different languages) convert the YAML stream to native objects. In most implementations the loader cannot handle alias references as pointers, rather they 'resolve' the stream to a tree structure. If this results to cycles, then the operation fails, but even if it doesn't it, attacks like "billion laughs" is again possible. The only solution for these kind of loaders would be limits to the amount of alias resolutions performed, although this is far from ideal.
Ideally a loader that can resolve aliases as actual native type references or pointers is immune to attacks and on top of that it can handle cyclic references without an issue. I would propose that this should be the preferred loader implementation from YAML 1.3 forward.
a loader that can resolve aliases as actual native type references or pointers is immune to attacks
Right. I'd add that - since the parsed object may contain cycles - serializing it using formats not supporting cycles (eg. JSON) might still raise issues. While it's not related directly to YAML it's an important caveat imho.
I expect
There's an IETF stream to register the
application/yaml
andtext/yaml
media types, and this section is very important for others yaml-related media types tooSee https://github.com/ietf-wg-httpapi/mediatypes/issues/10
Feel free to file issues there.
cc: @darrelmiller