stack overflows - Githubissues

jgm commented 1 year ago

The recursive algorithm (get_node) in to_ast can produce Lua stack overflows. These happen much faster with lua 5.1 and luajit than with later versions, but you can get failures of the pathological tests with lua 5.4 by cranking up the number of nested block quotes.

Explore rewriting this in a non-recursive style.

matklad commented 1 year ago

Oh, this one is nasty!

The problem is, even if we fix our code to be non-recursive, the user's code that consumes the tree might still stack-overflow. Given the extensible nature of djot, I expect post-processing document tree to be a common thing to do.

The catastrophic scenario I am imagining here is that you develop some software which accepts user-supplied input in djot, and then recursively traverses the djot for whatever reason. In this setup, djot opens a DOS avenue by users sending a bunch of nested symbols.

I can see two ways to protect not only the parser itself, but the user's code on top:

ensure that the interface we expose is non-recursive (so, the JSON AST will be a stream of events much like matches, rather than a real tree).
add an implementation limit for the depth. That is, specify that by default djot docs have max depth of 64, and everything beyond that is just a str.

I quite like the limited depth solution -- it protects the user from a corner-case without them even knowing. It shouldn't be limiting in practice, and that needs only to be a default implementation limit which you could raise if you want to. It should be rougrly agreed upon by all impls though.

jgm commented 1 year ago

I'm already most of the way through rewriting the ast builder to avoid recursion (the code is also going to be much easier to understand -- the old djot/ast.lua was a mess).

Once I've done that, we can investigate the other issues you raise. It's true that traverse from djot.filter is also recursive, so running a filter on a deeply nested document could also produce a Lua stack overflow.

The renderers are also recursive.

In both cases we could rewrite in non-recursive style, but that's a bit unnatural for this code.

Note that we can always put a Lua function under a pcall and trap the stack overflow exception, limiting the risk to the user. Also, I don't think traverse would be a DOS vector, because the stack overflow would kick in before it took appreciable time. As long as it's handled, there shouldn't be a problem.

jgm commented 1 year ago

I'm also planning to revamp the API a bit more, along the following lines.

There will be a tokenizer, which gives you an iterator over match objects. Ideally, this will be streaming: there will be a method to feed in some input, so you can start to receive match objects even before you've finished giving it the full document's input.
The parser will use this tokenizer and construct the AST by iterating.

A user could just use the iterator and build an AST (or directly render a document) themselves...of course, there's a lot of complexity in this, so I think usually they'll want to use our AST producer.

matklad commented 1 year ago

Yeah, I think for this particular implementation this isn't a problem, I worry more about ecosystem-wide effects.

Like, if someone does AST traversal in Rust, a stack overflow there would crash the process.

If we actually require implementations to enforce depth limit by default (irrespective of whether a particular impl can handle arbitrary nesting), than they'll do it, and we can be sure that user's won't see SOs when writing their own traversals. If we left this unspecified, than impls would tend to try to handle arbitrary depth, and that would lead to "pasting arbitrary djot can crash your process" kind of problems.

Basically, I see the limit not as a shortcut for implementations to avoid reifying the stack, but rather as an extra implementation requirement and obligation, which ultimately helps the user.

jgm commented 1 year ago

I would be more comfortable making this a permission rather than an obligation. We did something like that in commonmark, saying that a conforming implementation could impose a nesting depth on parenthesis in URLs, for example.

Conceptually, this feels like an implementation issue rather than something that should be built into the format.

jgm commented 1 year ago

Confirming that with the rendering, a stack overflow is still possible, but you need quite a lot of nesting:

python3 -c 'print("> " * 100000 + " a")' | ./run.sh

(but 90,000 is ok.)

It would be nice to make the library handle this exception gracefully.

matklad commented 1 year ago

Yeah, I don’t think this should be strictly specified: it doesn’t really matter if specific limits differ a bit between two impls. I would say that imposing some limit is recommended though: it doesn’t limit real usage, but could limit potential exploits. Would be happy with just permitting too :)

jgm / djot

stack overflows #100