Digging through the LLVM docs, I did come to understand that because get and delete each create a temporary alloca, it is conceivable for those to be different allocas and therefore different addresses (though that would be an unfortunate deoptimization, copying the key struct from the execution stack to the C stack twice). I guess something about the real-world code is causing the contents of the allocas to differ, which I still wasn't able to repro in the simpler example...
For context, changing the map value type to a pointer solved the issue, though it's not ideal since we didn't want to heap allocate for this cache:
https://github.com/corazawaf/coraza/pull/554
This presumably works because the problematic behavior only seems to happen with larger value sizes. Or perhaps that's just coincidence / luck.
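For illustration, roughly the shape of that workaround (hypothetical type names standing in for the real coraza cache types):

package main

// Hypothetical stand-ins for the actual cache key/value types in coraza.
type cacheKey struct {
    argKey            uintptr
    argIndex          int
    transformationsID int
}

type cacheValue struct {
    data      string
    evaluated bool
}

func main() {
    // Before: the (large) value struct is stored inline in the map buckets.
    byValue := make(map[cacheKey]cacheValue)
    // Workaround from the PR above: store a pointer instead, at the cost of
    // one heap allocation per entry.
    byPointer := make(map[cacheKey]*cacheValue)
    _, _ = byValue, byPointer
}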
I've done a lot of work on the hashing code and the hash maps. I'll take a look at this.
Hmm... can you come up with a reproducer based on the mapbench code that doesn't require patching the runtime to demonstrate it? (I've patched my runtime, but it's much easier if we have something that can just do if ... { panic }.)
@dgryski Do you need a small amount of code, or a small command that reliably reproduces? I can send the latter easily based on coraza-proxy-wasm if it will help; if it's the former, I'll try some more and see what I can do.
Both are nice. The small amount of code is better but the small command is useful until that arrives.
Thanks @dgryski.
You'll need your TinyGo built with #3302 and docker available. The ftw step will fail after a few requests:
git clone https://github.com/anuraaga/coraza-proxy-wasm.git --branch repro-mapdelete
cd coraza-proxy-wasm
LOCAL_TINYGO=1 go run mage.go build
go run mage.go ftw
tail -n 20 build/ftw-envoy.log
Smaller reproducer:
~/go/src/github.com/dgryski/bug $ cat bug.go
package main
type V struct {
buf [256]byte
s1 string
s2 string
s3 string
}
func main() {
m := make(map[int]V)
m[0] = V{s1: "hello", s2: ",", s3: "world"}
println(m[0].s1)
delete(m, 0)
println(m[0].s1)
}
~/go/src/github.com/dgryski/bug $ go run bug.go
hello
~/go/src/github.com/dgryski/bug $ tinygo run bug.go
hello
hello
Note that if the buf field is placed after the strings, then the bug is not present.
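For reference, this is the rearranged variant of bug.go (buf moved after the strings), which does not exhibit the bug:

package main

type V struct {
    s1  string
    s2  string
    s3  string
    buf [256]byte
}

func main() {
    m := make(map[int]V)
    m[0] = V{s1: "hello", s2: ",", s3: "world"}
    println(m[0].s1)
    delete(m, 0)
    println(m[0].s1)
}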
Oh duh. The hashmap code limits the value size to a uint8 for some reason: https://github.com/tinygo-org/tinygo/blob/release/src/runtime/hashmap.go#L19
That's an easy enough fix.
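To spell out why that limit bites here: the V value in the reproducer is larger than 255 bytes, so its size wraps when narrowed to a uint8. A quick standalone check (the 280 figure assumes a 32-bit wasm target where strings are 8 bytes; on a 64-bit host it is 304, which still truncates):

package main

import "unsafe"

type V struct {
    buf [256]byte
    s1  string
    s2  string
    s3  string
}

func main() {
    size := unsafe.Sizeof(V{})
    // On wasm32 this prints "280 24": narrowed to a uint8, the map would
    // record a value size of 24 bytes instead of 280.
    println(size, uint8(size))
}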
Annoyingly, while this is a bug, I'm not certain it's actually related to the issue at hand, given that the values in the test program are much smaller.
I wonder if this is the same memory corruption from the garbage collector missing pointers that we've been seeing elsewhere?
I'm having real trouble reproducing this on stock tinygo and wasmtime. :(
@anuraaga it would be very helpful if you could reproduce this with stock TinyGo. If you can't, I have to assume it's related to the custom GC somehow.
Some things that could help:
- Build with -tags=runtime_asserts to do a bit more checking.
- Call runGC() inside runtime.alloc to make the chance of hitting this bug higher (this is very slow), something like this:

  gcTotalAlloc += uint64(size)
  gcMallocs++
+ runGC()
+
  neededBlocks := (size + (bytesPerBlock - 1)) / bytesPerBlock
  // Continue looping until a run of free blocks has been found that fits the
I'll see if I can figure something out. One thing, from the original message:

"The key address for get is 0x0000ffd0 and delete is 0x0000ffb8. That being said, the hash is the same in this example so it's still able to clear the map."

Is this expected? This is with TinyGo dev, no custom GC - while the map is getting cleared in this case, the addresses being different seems unexpected to me.
Yes, because the unsafe.Pointer is a pointer to the value struct of the key, and that particular value struct is copied from the range iterator. It's not unexpected to me that those two keys have different pointer values.
Thanks - I guess each operation is copying the key to the stack, and it's unfortunate those don't get coalesced, but I agree that's probably just a lack of optimization and not likely a bug (though I'm still not 100% sure on that).
Anyways, while it's still not a small code example, I updated the same branch as https://github.com/tinygo-org/tinygo/issues/3358#issuecomment-1371717230 to unplug the custom GC and malloc, so now it'll fail with just stock dev TinyGo.
The key address for get is 0x0000ffd0 and delete is 0x0000ffb8. That being said, the hash is the same in this example so it's still able to clear the map.

Is this expected? This is with TinyGo dev, no custom GC - while the map is getting cleared in this case, the addresses being different seems unexpected to me.
It's a stack address, so yeah there is a good chance it is different each time depending on what optimizations happen or don't happen. The address itself is not relevant to the hashmap though: it only looks at the value stored at the address.
@anuraaga Have you gotten the mapbench to fail on wasmtime or wazero rather than as part of the WAF?
@dgryski Unfortunately, continued tweaks to make mapbench look more like the WAF code have not repro'd the bug.
However, that reminds me that the WAF code itself can actually be run with wazero. So instead of go run mage.go ftw, you can just use go run mage.go test to get the failure with wazero - while wazero doesn't seem to be printing the panic message string, the stack trace points at the "transformation cache not empty" line. I guess this shows it's not an Envoy-specific issue.
I know a smaller code example is important, but I'm not sure what else to try to get it to repro in mapbench.
I do still want to challenge the stack address observations once more. The code is this
for k := range m {
if _, ok := m[k]; !ok {
println("couldn't find key")
} else {
println("could find key")
}
delete(m, k)
if _, ok := m[k]; !ok {
println("couldn't find key")
} else {
println("could find key")
}
}
There is just one stack frame and one local variable k. Yes, it can result in a different stack address for those two allocas, but do we really expect it? It seems like a really simple optimization to be missing if so, and I can't shake the feeling that the fact that those addresses are different is related to the underlying bug. This wasn't happening with a smaller struct size, just a larger one, which would not seem to have a semantic impact on those allocas.
--- FAIL: TestLifecycle (0.41s)
--- FAIL: TestLifecycle/wasm (0.41s)
--- FAIL: TestLifecycle/wasm/request_body_accepted (0.02s)
panic: wasm error: unreachable
wasm stack trace:
.runtime._panic(i32,i32)
0x5a9a2: /Users/anuraag/git/tinygo/src/runtime/runtime_tinygowasm.go:72:6 (inlined)
/Users/anuraag/git/tinygo/src/runtime/panic.go:52:7 (inlined)
/Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:309:15 (inlined)
/Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:310:15 (inlined)
/Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:309:15 (inlined)
/Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:310:15 (inlined)
/Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:309:15 (inlined)
/Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:310:15 (inlined)
/Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:309:15 (inlined)
/Users/anuraag/git/tinygo/src/runtime/algorithm.go:363:16 (inlined)
/Users/anuraag/git/tinygo/src/runtime/algorithm.go:839:22 (inlined)
/Users/anuraag/git/tinygo/src/runtime/gc_conservative.go:223:27 (inlined)
/Users/anuraag/git/tinygo/src/runtime/arch_tinygowasm_malloc.go:529:34
.(*github.com/corazawaf/coraza/v3/internal/corazawaf.RuleGroup).Eval(i32,i32)
0x1266de: /Users/anuraag/go/pkg/mod/github.com/anuraaga/coraza/v3@v3.0.0-20230105023732-6f71d29392f3/internal/corazawaf/rulegroup.go:108:8 (inlined)
<Go interface method> (inlined)
/Users/anuraag/go/pkg/mod/github.com/petar-dambovaliev/aho-corasick@v0.0.0-20211021192214-5ab2d9280aa9/ahocorasick.go:131:34 (inlined)
/Users/anuraag/go/pkg/mod/github.com/wasilibs/go-re2@v0.1.1-0.20230112024855-2994a24f58d7/re2.go:681:13 (inlined)
/Users/anuraag/go/pkg/mod/github.com/wasilibs/go-re2@v0.1.1-0.20230112024855-2994a24f58d7/re2.go:677:11 (inlined)
/Users/anuraag/go/pkg/mod/github.com/wasilibs/go-re2@v0.1.1-0.20230112024855-2994a24f58d7/re2.go:646:17 (inlined)
I'll double check the code we're generating for the map operations.
When generating map operations, the compiler code that builds the LLVM IR looks like this:
// key can be compared with runtime.memequal
// Store the key in an alloca, in the entry block to avoid dynamic stack
// growth.
mapKeyAlloca, mapKeyPtr, mapKeySize := b.createTemporaryAlloca(key.Type(), "hashmap.key")
b.CreateStore(key, mapKeyAlloca)
// Fetch the value from the hashmap.
params := []llvm.Value{m, mapKeyPtr, mapValuePtr, mapValueSize}
commaOkValue = b.createRuntimeCall("hashmapBinaryGet", params, "")
b.emitLifetimeEnd(mapKeyPtr, mapKeySize)
So, because the get/delete/set operations take a pointer to the key, we create an alloca and store the key into it. If LLVM doesn't otherwise coalesce that store with the fact that the key value is already a stack value, then that is probably an optimization opportunity for us. However, while the extra key copy is a performance hit, I don't think it's the source of this bug (which I have yet to replicate...)
Edit: The punchline here, of course, is that the extra copy was the source of this bug...
Got TinyGo runtime logging enabled with:
diff --git a/proxywasm/proxytest/wasmwrapper.go b/proxywasm/proxytest/wasmwrapper.go
index 8a515cb..38e6ce5 100644
--- a/proxywasm/proxytest/wasmwrapper.go
+++ b/proxywasm/proxytest/wasmwrapper.go
@@ -17,6 +17,7 @@ package proxytest
import (
"context"
"io"
+ "os"
"reflect"
"unsafe"
@@ -84,7 +85,8 @@ type vmContext struct {
// Note: Currently only HTTP plugins are supported.
func NewWasmVMContext(wasm []byte) (WasmVMContext, error) {
ctx := context.Background()
- r := wazero.NewRuntime(ctx)
+ wazeroconfig := wazero.NewModuleConfig().WithStdout(os.Stderr).WithStderr(os.Stderr)
+ r := wazero.NewRuntimeWithConfig(ctx, wazero.NewRuntimeConfig().WithDebugInfoEnabled(true))
_, err := wasi_snapshot_preview1.Instantiate(ctx, r)
if err != nil {
@@ -96,7 +98,12 @@ func NewWasmVMContext(wasm []byte) (WasmVMContext, error) {
return nil, err
}
- mod, err := r.InstantiateModuleFromBinary(ctx, wasm)
+ compiled, err := r.CompileModule(ctx, wasm)
+ if err != nil {
+ return nil, err
+ }
+
+ mod, err := r.InstantiateModule(ctx, compiled, wazeroconfig)
if err != nil {
return nil, err
}
and of course a go.mod line to point to my local copy:
replace github.com/tetratelabs/proxy-wasm-go-sdk => ../proxy-wasm-go-sdk
The issue has to do with struct padding.
The fields for the key struct look like:
type transformationKey struct {
argKey uintptr
argIndex int
argVariable variables.RuleVariable
transformationsID int
}
The argVariable member is a uint8, meaning there are padding bytes before the transformationsID.
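To make that concrete, a small standalone check of the field offsets (with a plain uint8 standing in for variables.RuleVariable): on a 32-bit wasm target transformationsID sits at offset 12 while argVariable ends at offset 9, leaving 3 padding bytes (7 on a typical 64-bit host).

package main

import "unsafe"

type transformationKey struct {
    argKey            uintptr
    argIndex          int
    argVariable       uint8 // stand-in for variables.RuleVariable
    transformationsID int
}

func main() {
    var k transformationKey
    println("argVariable offset:      ", unsafe.Offsetof(k.argVariable))
    println("transformationsID offset:", unsafe.Offsetof(k.transformationsID))
    println("total size:              ", unsafe.Sizeof(k))
    // Padding = total size minus the sum of the field sizes.
    println("padding bytes:           ",
        unsafe.Sizeof(k)-(unsafe.Sizeof(k.argKey)+unsafe.Sizeof(k.argIndex)+
            unsafe.Sizeof(k.argVariable)+unsafe.Sizeof(k.transformationsID)))
}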
Rearranging the fields like this passes:
type transformationKey struct {
argKey uintptr
argIndex int
transformationsID int
argVariable variables.RuleVariable
}
Explicitly accounting for those bytes with a _ [3]byte field after the argVariable also solves it (see the sketch below). Ok, time to dig deeper.
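For reference, that explicit-padding variant would look something like this (again with uint8 standing in for variables.RuleVariable):

type transformationKey struct {
    argKey            uintptr
    argIndex          int
    argVariable       uint8
    _                 [3]byte // explicit bytes where the padding would be, so they are part of the stored value
    transformationsID int
}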
... oh no. This looks like a rather gnarly bug.
One fix would be to:
But that's probably not the most efficient way to do this. There's a good chance there are other places with a similar bug.
"proof" via hexudmps:
hexdump of key in loop before first map lookup:
00000000: 00 00 00 00 00 00 00 00 25 00 00 00 01 00 00 00 |........%.......|
binary get 0x00178ea0 0x0000fd00 16 2046442687
hexdump of key in "get"
00000000: 00 00 00 00 00 00 00 00 25 9c 1c 00 01 00 00 00 |........%.......|
hexdump of key in loop before delete
00000000: 00 00 00 00 00 00 00 00 25 00 00 00 01 00 00 00 |........%.......|
binary del 0x00178ea0 0x0000fd60 16 45773010
hexdump of key in delete:
00000000: 00 00 00 00 00 00 00 00 25 00 00 00 01 00 00 00 |........%.......|
binary get 0x00178ea0 0x0000fd00 16 2046442687
hexdump of key in get after delete:
00000000: 00 00 00 00 00 00 00 00 25 9c 1c 00 01 00 00 00 |........%.......|
So, again, it looks like memory corruption, so it's tricky to replicate exactly once I add the debug logging, but in any event we can see that the 3 extra bytes 9c 1c 00 are "invading" the struct and throwing off the hash calculation and the equality test.
It seems like there are going to be a bunch of places where we ask LLVM to "store" a value and it only copies over the bytes that are "live", but really we'd prefer to have just a straight memcpy.
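For anyone who wants to poke at this outside the runtime, here is a standalone sketch that dumps a key's raw bytes the same way the instrumentation above does (this is application-level code, not the hashmap.go patch, and under a correct toolchain the padding bytes will typically just show up as zeros):

package main

import (
    "encoding/hex"
    "unsafe"
)

type transformationKey struct {
    argKey            uintptr
    argIndex          int
    argVariable       uint8 // stand-in for variables.RuleVariable
    transformationsID int
}

func main() {
    k := transformationKey{argVariable: 0x25, transformationsID: 1}
    // View the key the way the hashmap's hash/equality code does: as a flat
    // byte range that includes the padding between fields.
    raw := unsafe.Slice((*byte)(unsafe.Pointer(&k)), unsafe.Sizeof(k))
    print(hex.Dump(raw))
}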
Thanks for the investigation, looks promising!
For reference, I found this issue in Rust. I didn't read it all, but the idea of synthetic padding fields generated by TinyGo for structs seems sort of OK:
https://github.com/rust-lang/rust/issues/70230
I don't know if an alloca store would ever conceptually be a memcpy - the fields of the struct could just be in individual slots on the execution stack in wasm, or in registers on a CPU, and would be individually copied onto the stack, I guess. I guess that means synthetic padding fields have the downside of likely using more execution stack / registers.
Otherwise we'd probably need to memzero after each alloca.
Yeah, I think a memzero is probably the best thing to do here. LLVM should be able to optimize it away when it isn't necessary, I believe.
Do we need those memsets every time we do stores for structs with padding, or just for map keys? Can we get away with just doing it the first time we create the value? Can we tag the hasher functions somehow so that LLVM knows the padding needs to be cleared?
I suppose it is just about map keys / general hashing - in that case, I wonder if fixing #3354's optimization issue with the reflection-based key computation and always using it instead of the memory-based hash is an alternative solution. It would be similar to Go's hash, which always iterates over the fields:
https://github.com/golang/go/blob/c22a6c3b906cd37616d76da5f504c4c3e5677d94/src/runtime/alg.go#L179
Ideally the fields could be passed directly (like Go's structtype) instead of indirectly through reflection, but I guess that would be too big of a change.
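To illustrate the field-by-field idea: because each field is fed to the hash separately, the padding bytes between fields are never read. This is just a sketch (FNV as a stand-in hash, uint8 standing in for variables.RuleVariable), not TinyGo's or Go's actual hasher:

package main

import (
    "hash"
    "hash/fnv"
    "unsafe"
)

type transformationKey struct {
    argKey            uintptr
    argIndex          int
    argVariable       uint8
    transformationsID int
}

// hashField feeds one field's bytes into the running hash.
func hashField(h hash.Hash64, p unsafe.Pointer, n uintptr) {
    h.Write(unsafe.Slice((*byte)(p), n))
}

func hashKey(k *transformationKey) uint64 {
    h := fnv.New64a()
    hashField(h, unsafe.Pointer(&k.argKey), unsafe.Sizeof(k.argKey))
    hashField(h, unsafe.Pointer(&k.argIndex), unsafe.Sizeof(k.argIndex))
    hashField(h, unsafe.Pointer(&k.argVariable), unsafe.Sizeof(k.argVariable))
    hashField(h, unsafe.Pointer(&k.transformationsID), unsafe.Sizeof(k.transformationsID))
    return h.Sum64()
}

func main() {
    k := transformationKey{argVariable: 0x25, transformationsID: 1}
    println(hashKey(&k))
}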
So I looked at the LangRef and we may have a bit of a problem here:
The contents of memory are updated to contain <value> at the location specified by the <pointer> operand. If <value> is of scalar type then the number of bytes written does not exceed the minimum number of bytes needed to hold all bits of the type. For example, storing an i24 writes at most three bytes. When writing a value of a type like i20 with a size that is not an integral number of bytes, it is unspecified what happens to the extra bits that do not belong to the type, but they will typically be overwritten. If <value> is of aggregate type, padding is filled with undef. If <pointer> is not a well-defined value, the behavior is undefined.
(Emphasis mine). Structs are one of the aggregate types. What this says is that the padding bytes in a struct are undefined when stored. So even if the memory is cleared before the store, LLVM is allowed to fill the padding again with garbage (it probably won't, but it is allowed to). One way around this is to split up the store manually in IR and explicitly zero the struct padding. This is a bit tricky to get right but certainly not impossible. It probably won't hurt optimizations significantly as LLVM itself also has a pass that splits up large stores in a similar way.
Do we need those memsets every time we do stores for structs with padding or just for map keys?
No, only when the alloca itself is used as raw memory (like when it is hashed, etc). Normal loads/stores (like *ssa.Alloc) should be fine.
Can we tag the hasher functions somehow so that llvm knows the padding needs to be cleared?
I'm not aware of anything like that.
I wonder if fixing #3354's optimization issue with the reflection-based key computation and always using it instead of the memory-based hash is an alternative solution.
We could, but it is always going to be slower than hashing the memory directly. Remember, reflect is slow (especially in TinyGo). It does give a quick fix though: either temporarily do all hashing of structs via reflect, or somehow detect padding bytes and use the reflect based hashing for structs with padding bytes. One relatively easy way to detect padding bytes is to sum up all the sizes of the struct fields (recursively) and compare it to the size LLVM calculates for it.
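A rough sketch of that padding check, written against reflect here purely for illustration (the compiler would do the equivalent over LLVM struct types, and this version ignores arrays of structs for brevity):

package main

import "reflect"

// hasPadding reports whether a struct type contains padding bytes, by
// comparing the sum of its field sizes (recursively) with its total size.
func hasPadding(t reflect.Type) bool {
    if t.Kind() != reflect.Struct {
        return false
    }
    var sum uintptr
    for i := 0; i < t.NumField(); i++ {
        f := t.Field(i).Type
        if hasPadding(f) {
            return true // a nested struct hides padding inside itself
        }
        sum += f.Size()
    }
    return sum != t.Size()
}

type transformationKey struct {
    argKey            uintptr
    argIndex          int
    argVariable       uint8
    transformationsID int
}

func main() {
    println(hasPadding(reflect.TypeOf(transformationKey{}))) // true: padding after argVariable
}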
I'm going to start on a "zero-the-undef-bytes-for-hashing" patch and see what that looks like.
Fix for this is in dev.
I am currently debugging this code that tries to delete the keys in a map (to empty it while retaining bucket storage)
https://github.com/corazawaf/coraza/blob/v3/dev/internal/corazawaf/rulegroup.go#L105
(For the below, note that I tried copying the keys into a slice before deleting, to see if it was an iterator+delete behavior problem, but the result was the same, so I simplified back to normal map iteration.)
I noticed a memory leak, and after adding
if len(transformationCache) > 0 { panic }
after the deletion I could verify that the map wasn't getting cleared with TinyGo wasm. It works fine with Go.
I have added this instrumentation to hashmap.go:
https://github.com/anuraaga/tinygo/compare/custom-gc...anuraaga:tinygo:hashmapdelete-debug?expand=1
And shrunk the test down to a similar one as last time:
https://github.com/corazawaf/coraza-proxy-wasm/compare/main...anuraaga:coraza-1225?expand=1#diff-173fbfd8d8844658344b121461b4290d0a85230caae9825240705df8130e8b75R33
~/go/bin/tinygo build -scheduler=none -target=wasi -opt=2 -o mapbench.wasm
The debug output looks something like this
The key address for get is 0x0000ffd0 and delete is 0x0000ffb8. That being said, the hash is the same in this example so it's still able to clear the map, but with the same instrumentation, when looking at the output for the original coraza code, the hash values were also different. I'm not sure why I wasn't able to reproduce this hash value difference, but either way, the key is in a local variable k, of which there is only one, so intuitively I'd suppose those addresses must be the same, and the difference is unexpected.
One weird thing I found while making the repro is that it seems the value struct needs to be more than 2 strings worth of size to exhibit the behavior - with three fields, get and delete have different addresses, while with two fields they are the same.
I looked through the code in hashmap.go and map.go (compiler) and couldn't see anything suspicious; the code paths for get/lookup vs delete look basically the same in both, but the difference does cause our real-world code to fail with the map not being cleared. With the code looking the same, the issue may be in IR reduction?
Note the above simple-ish test case approach is also applied to the real-world code here (which is where I was observing the address + key value difference):
https://github.com/corazawaf/coraza/compare/v3/dev...anuraaga:scratch?expand=1
The output looks like this (we can see the different hash values)