nodejs / llparse

Generating parsers in LLVM IR
http://llparse.org
Other
584 stars 30 forks source link

prevent out-of-bounds reads when using SSE 4.2. #37

Closed jsteemann closed 4 years ago

jsteemann commented 4 years ago

in the SSE4.2 case, ensure the generated static strings are at least ${blob.alignment} characters long. the SSE4.2 _mm* instructions require data not only to be 16-byte-aligned but also may read full 16 bytes from the start address. so far, there is no guarantee that the static strings used for comparisons are actually in a memory area from which 16 bytes can safely be read. this is up to the compiler.

motivation for the bugfix: when running an application that is using llhttp, we are seeing address sanitizer out-of-bounds reads when loading the static comparison strings into the _mm registers.

this PR is an attempt to fix this, by ensuring that the generated compare strings are at least 16 bytes long in memory, so that this memory can safely be read into an _mm register.

caveat: I have just patched the generator code to the best of my knowledge, but didn't run it yet. but the current code without the padding definitely has undefined behavior.

indutny commented 4 years ago

Oh, good catch! I hadn't considered that it could be a UB. My initial thoughts were that it could be a side-channel timing issue, but all tests that I did on my CPU did not reveal any timing differences.

Do you think it would make more sense to patch it at the place of creation of the blob: https://github.com/nodejs/llparse/blob/master/src/implementation/c/node/table-lookup.ts#L126 and pad subRanges to make sure that it is at least 128 bits long.

Thanks for the patch!

jsteemann commented 4 years ago

The problem with this is that, even if less than 128 bits of data are available, it may just work fine. It depends on how/where the compiler puts the static strings into the compiled binary and if there is enough valid (read-only) memory behind these string values. In practice it will just work most of the time. However, if you start using a memory checker such as Address sanitizer (ASan), it will be a lot stricter and instantly report invalid memory access errors. I think it is good to fix these to make the program more portable and reliable.

I have no idea where exactly that fix should be made in the generator code. I am completely new to it and have spent only a few minutes looking at it :see_no_evil: . Therefore I would like to leave this decision up to you. As long as the generated code makes sure that it is safe to read the next 128 bits of memory starting at one of the static strings, it will be just fine for me. So please do whatever you think is best :smile:

Thanks!

indutny commented 4 years ago

Ohhh, I didn't think of it. Thank you for taking even more time to explain it to me!

I have opened a PR with a suggested alternative fix: https://github.com/nodejs/llparse/pull/38/files . Would really appreciate if you could take a look at it!

Thanks again!

jsteemann commented 4 years ago

Superseded by alternative implementation #38