odin-lang / Odin

Odin Programming Language
https://odin-lang.org
BSD 3-Clause "New" or "Revised" License

JSON parser memory corruption with scratch allocator #2694

Open Beefster09 opened 1 year ago

Beefster09 commented 1 year ago

Context

    Odin: dev-2023-08:9453b238
    OS:   Linux Mint 20.3, Linux 5.4.0-155-generic
    CPU:  Intel(R) Core(TM) i3-1005G1 CPU @ 1.20GHz
    RAM:  7753 MiB

Expected Behavior

Parsing should complete without memory corruption or a segfault.

Current Behavior

Certain JSON inputs containing long arrays of objects result in corrupted memory and segfaults when parsed with the scratch allocator.

Failure Information (for bugs)

This memory corruption does not occur with other temporary allocators. My program no longer crashed upon switching from a scratch allocator to an arena allocator.
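
For reference, the arena-based workaround described above might look roughly like the sketch below. It assumes `mem.arena_init` and `mem.arena_allocator` from `core:mem`; the `parse_with_arena` helper, the `Payload` type name, and the short `SMALL_JSON` input are made up for illustration and are not part of the original program. Swap in the full `SAMPLE_JSON` from the reproduction to compare behaviour against the scratch allocator.

```odin
package workaround

import "core:fmt"
import "core:encoding/json"
import "core:mem"

// Hypothetical stand-in input; the report's SAMPLE_JSON is much longer.
SMALL_JSON :: `{"foo": 123, "things": [{"a": "c"}], "bar": "baz"}`

// Same shape as the anonymous struct in the reproduction program.
Payload :: struct {
    foo:    int,
    bar:    string,
    things: []json.Object,
}

// Hypothetical helper: unmarshal `input` with an arena allocator
// instead of the scratch allocator used in the report.
parse_with_arena :: proc(input: string) {
    // Backing buffer for the arena; 4 MiB mirrors the scratch size in the report.
    backing := make([]byte, 4 * mem.Megabyte)
    defer delete(backing)

    arena: mem.Arena
    mem.arena_init(&arena, backing)

    into: Payload
    err := json.unmarshal_string(input, &into, allocator = mem.arena_allocator(&arena))
    fmt.println(err)
    if err == nil {
        // Everything in `into` lives in `backing`, so it is only valid
        // until the deferred delete runs at the end of this procedure.
        fmt.println(into)
    }
}

main :: proc() {
    parse_with_arena(SMALL_JSON)
}
```

This only sidesteps the problem. It says nothing about whether the fault lies in the scratch allocator or in the parser.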

More than likely this is an issue with the allocator rather than the JSON parser. However, JSON parsing is what surfaced the bug, so there may be an issue in the parser as well.

The threshold seems to be JSON arrays of 16 or more objects (presumably this triggers a reallocation of the Array).

Steps to Reproduce

Minimal program to reproduce:

package bug

import "core:fmt"
import "core:encoding/json"
import "core:mem"

main :: proc() {
    scratch := new(mem.Scratch_Allocator)
    defer free(scratch)
    if mem.scratch_allocator_init(scratch, 4 * mem.Megabyte) != .None {
        panic("unable to initialize scratch allocator")
    }
    defer mem.scratch_allocator_destroy(scratch)

    into: struct {
        foo: int,
        bar: string,
        things: []json.Object,
    }

    err := json.unmarshal_string(SAMPLE_JSON, &into, allocator = mem.scratch_allocator(scratch))
    fmt.println(err)
    fmt.println(into.things[0])
    fmt.println(into.things[1])
    fmt.println(len(into.things[2]))
    fmt.println(len(into.things[2]["longlist"].(json.Array)))
    fmt.println(into.things[2]) // this will segfault
    fmt.println(into)
}

SAMPLE_JSON :: `
{
    "foo": 123,
    "things": [
        {
            "a": "c",
            "numberlist": [1, 2, 3],
            "asdf": "qwerty"
        },
        {
            "a": "b",
            "shortlist": [
                {
                    "type": "blob",
                    "x": 123,
                    "y": 321
                }
            ],
            "c": "asdf"
        },
        {
            "a": "d",
            "longlist": [
                {
                    "type": "skeleton",
                    "x": 234,
                    "y": 432
                },
                {
                    "type": "skeleton",
                    "x": 5632,
                    "y": 4875
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                },
                {
                    "type": "skeleton",
                    "x": 574,
                    "y": 585
                }
            ],
            "asdf": "foo"
        }
    ],
    "bar": "baz"
}
`

Failure Logs

Output of the above program:

nil
map[a="c", numberlist=[1, 2, 3], asdf="qwerty"]
map[c="asdf", shortlist=[map[x=123, type="blob", y=321]], a="b"]
3
16
Segmentation fault (core dumped)

Kelimion commented 1 year ago

Confirmed on Windows 10. It also occurs with a scratch size of 48 MiB, and no longer occurs at 64 MiB.

Beefster09 commented 1 year ago

It's important to note that the exact result of the corruption varies. Most of the time, it results in a segfault, but occasionally it results in a buffer overrun where damn near everything in memory gets printed. It's possible that increasing the scratch memory size only masked the really bad stuff, and that it would happen again with a larger JSON array (perhaps 32 items, such that it triggers another reallocation).