ugorji / go

idiomatic codec and rpc lib for msgpack, cbor, json, etc. msgpack.org[Go]
MIT License
1.87k stars 295 forks source link

msgpack: OOM panic from malicious input with very large pre-fixed array length #90

Closed teepark closed 9 years ago

teepark commented 9 years ago

This program panics in codec, illustrating the problem:

package main

import (
    "fmt"

    "github.com/ugorji/go/codec"
)

var mh codec.MsgpackHandle

func main() {
    var container interface{}
    codec.NewDecoderBytes([]byte("\xdd~~~~"), &mh).Decode(&container)
    fmt.Println(container)
}

The \xdd prefix indicates an array32 type, and the 0000 puts its size at ~2 billion items. Nothing follows, but it tries to allocate the huge array anyway.

msgpack-python has dealt with the issue by creating max_<type>_len options which could probably be added to MsgpackHandle.

Another option would be a simple sanity check for every 4 byte length prefixed type (str32, bin32, array32, map32, ext32): whatever the length prefix says the length is, the following data must be at least that long in bytes (double in the case of map32). This is a simple invariant that can be quickly checked, must hold since the shortest type has one byte, and would deal with the problem without increasing the API surface.

If you're OK with this approach I could write it and submit a PR.

ugorji commented 9 years ago

The MaxLen option is the only one that can work.

The other one can't, because we can't know how many more "elements" will actually be in the container, if it is an io.Reader, etc. Even when decoding from a []byte, we still can't tell the number of elements, as an element may actually be expressed in many bytes. Consequently, that sanity check feels hacky, and doesn't solve the problem completely.

The issue with the MaxLen option, is that it has to be in the Decoder, not the MsgpackHandle, because the overall framework handles things like length, creating the container, etc. It just delegates "specific" functionality to the codec-specific handles (e.g. MsgpackHandle).

Also, the package at large now has to ensure that changes are propagated through the reflection path, fast-paths for slices/maps, and codecgen (code-generation) of specific types. After making the changes, some code must be re-generated also. Unfortunately, It's not a quick fix anymore, now that we are chasing performance by avoiding reflection where possible.

Let me think about this some more. If you are feeling adventurous, go for the PR.

teepark commented 9 years ago

What about only being willing to pre-allocate up to a specific size, append()ing (or setting further map items) after that?

I haven't dug through the code too deeply, but I have to imagine JSON is being handled that way, not being length-prefixed the way msgpack is.

ugorji commented 9 years ago

That may work. Let me think about it some more. I probably won't get to it for about a week due to offsite demanding work schedule.

ugorji commented 9 years ago

I think I have this fixed now. I will update within the next couple of days.

ugorji commented 9 years ago

Fixed by 45ce7596ace4534e47b69051a92aef7b64ec7b3f