odin-lang / Odin

Odin Programming Language
https://odin-lang.org
BSD 3-Clause "New" or "Revised" License
6.61k stars 577 forks source link

os2.process_start uses the global allocator instead of `context.allocator` which can lead to a panic/failed assert #4265

Closed gaultier closed 4 days ago

gaultier commented 4 days ago

Context

Where to find more information and get into contact when you encounter a bug:

    Website: https://odin-lang.org
    GitHub:  https://github.com/odin-lang/Odin/issues

Useful information to add to a bug report:

    Odin:    dev-2024-09:8814170ed
    OS:      Fedora Linux 40 (Workstation Edition), Linux 6.10.7-200.fc40.x86_64
    CPU:     11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
    RAM:     15659 MiB
    Backend: LLVM 18.1.6

Expected Behavior

When compiling with -default-to-nil-allocator and setting context.allocator and context.temp_allocator to custom allocators, I expect os2.process_start to use these. But it does not.

Current Behavior

os2.process_start accesses temp_allocator() meaning the default global temp allocator instead of using context.allocator or context.temp_allocator. With the build option -default-to-nil-allocator, that leads eventually to a panic or failed assert.

Additionally, that means that it is not visible from the signature of the function that it allocates and we cannot provide it a specific allocator just for one call e.g. os.process_start(desc, allocator=foobar).

Steps to Reproduce

Here is a program that sets context.allocator and context.temp_allocator to fixed arenas and tries to spawn a process (does not matter which one) with os2.process_start. It will end up in a failed assert because it tries to use the default general temp allocator to allocate some memory (to convert the executable name to a cstring). This general temp allocator cannot allocate any memory because its backing allocator (the general purpose default allocator) is nil.

package main

import "core:fmt"
import "core:mem"
import "core:mem/virtual"
import "core:os/os2"

main :: proc() {

    arena: virtual.Arena

    {
        arena_size := uint(8) * mem.Megabyte
        mmapped, err := virtual.reserve_and_commit(arena_size)
        if err != nil {
            panic("err")
        }
        if err = virtual.arena_init_buffer(&arena, mmapped); err != nil {
            panic("err")
        }
    }
    context.allocator = virtual.arena_allocator(&arena)

    tmp_arena: virtual.Arena
    {
        tmp_arena_size := uint(1) * mem.Megabyte
        tmp_mmapped, err := virtual.reserve_and_commit(tmp_arena_size)
        if err != nil {
            fmt.eprintln(err)
            panic("err")
        }
        if err = virtual.arena_init_buffer(&tmp_arena, tmp_mmapped); err != nil {
            fmt.eprintln(err)
            panic("err")
        }
    }
    context.temp_allocator = virtual.arena_allocator(&tmp_arena)

    desc := os2.Process_Desc {
        command = []string{"ls"},
    }

    process, err := os2.process_start(desc) // Assert failed!
    if err != nil {
        panic("err")
    }

    process_state, err2 := os2.process_wait(process)
    fmt.println(process_state, err2)
}

Failure Logs

Run:

$ odin run src/  -vet -strict-style -default-to-nil-allocator -debug
/home/pg/not-my-code/Odin/core/strings/builder.odin(299:2) runtime assertion: len(array) > 0
fish: Job 1, 'odin run src/  -vet -strict-sty…' terminated by signal SIGILL (Illegal instruction)

GDB:

(gdb) r
Starting program: /home/pg/scratch/odin-nil-allocator/src.bin 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
/home/pg/not-my-code/Odin/core/strings/builder.odin(299:2) runtime assertion: len(array) > 0

Program received signal SIGILL, Illegal instruction.
runtime.default_assertion_contextless_failure_proc (prefix=..., message=..., loc=...) at /home/pg/not-my-code/Odin/base/runtime/core.odin:811
811     trap()
(gdb) bt
#0  runtime.default_assertion_contextless_failure_proc (prefix=..., message=..., loc=...) at /home/pg/not-my-code/Odin/base/runtime/core.odin:811
#1  0x000000000043201d in runtime.default_assertion_failure_proc (prefix=..., message=..., loc=...) at /home/pg/not-my-code/Odin/base/runtime/core.odin:782
#2  0x000000000044d24a in runtime.assert.internal-0 (message=..., loc=...) at /home/pg/not-my-code/Odin/base/runtime/core_builtin.odin:928
#3  0x000000000043ae3a in runtime.assert (condition=false, message=..., loc=...) at /home/pg/not-my-code/Odin/base/runtime/core_builtin.odin:930
#4  0x000000000041d330 in runtime.pop-25293 (array=0x7fffffffdc60, loc=...) at /home/pg/not-my-code/Odin/base/runtime/core_builtin.odin:115
#5  0x000000000041d2cd in strings.to_cstring (b=0x7fffffffdc60) at /home/pg/not-my-code/Odin/core/strings/builder.odin:299
#6  0x000000000041f4d4 in os2._process_start-1575 (desc=...) at /home/pg/not-my-code/Odin/core/os/os2/process_linux.odin:446
#7  0x0000000000422a1b in os2.process_start (desc=...) at /home/pg/not-my-code/Odin/core/os/os2/process.odin:347
#8  0x0000000000403549 in main.main () at main.odin:43
#9  0x0000000000409065 in main (argc=1, argv=0x7fffffffe438) at /home/pg/not-my-code/Odin/base/runtime/entry_unix.odin:57

Suggestion: A allocator should be present as one of the arguments of os2.process_start and be used inside it.

flysand7 commented 4 days ago

I'm not sure how -default-to-nil-allocators is supposed to work, so can't tell enough about the bug. Seems like os2 is implemented correctly though.

The crash is happening on this line of code:

https://github.com/odin-lang/Odin/blob/8814170edff5c7c70d17d5edf789bc478311d906/core/os/os2/process_linux.odin#L446

For now figuring out what exactly is broken, and whether it's broken at all. I thought that maybe my core:mem PR was breaking it, but checking out the commit before this PR was merged still reproe'd the bug.

When compiling with -default-to-nil-allocator and setting context.allocator and context.temp_allocator to custom allocators, I expect os2.process_start to use these. But it does not.

Yes, it shouldn't. os2's temp_allocator() is supposed to not be affected by context, that's on purpose. The idea is that if it comes from the context it can break (because we have to handle all allocator errors from user context), but if have a separate allocator just for the things that are alive during the oscall procedure, we don't have to handle allocator errors as much.

Additionally, that means that it is not visible from the signature of the function that it allocates and we cannot provide it a specific allocator just for one call e.g. os.process_start(desc, allocator=foobar).

This is also expected, because process_start(), actually does not allocate any memory that you have to free in a later call. Everything that this function allocates is deallocated within the same function, so it's all temporary allocations to get the right strings to supply to the OS.

flysand7 commented 4 days ago

Running hypothesis is that -default-to-nil-allocator somehow makes os2.temp_allocator() a nil allocator.

jasonKercher commented 4 days ago

I believe context.temp_allocator works despite this because of the when block at the top of base/runtime/default_temporary_allocator.odin. However, the user can always overwrite that. This is not the case for the os2's temp_allocator.

E: -default-to-nil-allocator causes runtime.default_allocator to be the nil_allocator. The os2 temp_allocator uses runtime.arena_allocator_proc. When that requires more memory it requests it from runtime.default_allocator.

gaultier commented 3 days ago

Thank you for the prompt response. The fix should work, I’ll try it when I get back to a computer. (EDIT: I tried, Confirmed to work).

However, one disadvantage to this approach is that if someone wants to prevent any allocations using the default allocators, meaning they want to control how every and any allocation is done (for example on a tiny ARM 32 device), then the current code paired with the option “-default-to-nil-allocator” is misleading, this goal won’t be achieved. Behind the scenes, the default allocators are still being used silently sometimes.

Now I don’t know if this goal is supported by Odin, or if the docs should clarify what this command line option really does?