modularml / mojo

The Mojo Programming Language
https://docs.modular.com/mojo/manual/

[BUG]: autotune search causes core dump after successful compile #1047

Closed: mikowals closed this issue 9 months ago

mikowals commented 1 year ago

Bug description

Calling an evaluator function through search ends in Segmentation fault (core dumped) on any run where the code is compiled and autotuning is performed, i.e. the first run after every code change. All output is printed as if the code had compiled and executed successfully, and only then does the error appear. On subsequent runs with no code changes the evaluator does not run and no error is produced. So I think compilation completes successfully before the crash occurs.

The same core dump occurs if the code from the matmul demo is copied to a file, given a main function, and run from the command line. Conversely, the reproducer below runs in the playground without error if the lines in main are moved to the top level. I have attached the relevant code from the matmul example as a file, with a .txt suffix because GitHub didn't accept .mojo: matmul_demo_bug.txt

Steps to reproduce

from autotune import autotune, search
from benchmark import Benchmark
from algorithm import parallelize
from time import now

alias T = DType.float32
alias Ptr = DTypePointer[T]
alias NUM_ELEMENTS = 1
alias add_fn_sig_type = fn(Ptr) -> None

@adaptive
fn add_to_tune(c: Ptr, /):
    alias workers = autotune(4, 1)
    @always_inline
    @parameter
    fn loop(ii: Int):
        c.store(ii, c.load(ii))

    parallelize[loop](NUM_ELEMENTS, workers)

fn add_evaluator(funcs: Pointer[add_fn_sig_type], size: Int) -> Int:
    let a = Ptr().alloc(NUM_ELEMENTS)
    a.store(0, 1)

    var best_idx: Int = -1
    var best_time: Int = -1
    for ii in range(size):
        let func = funcs.load(ii)

        @parameter
        fn wrapper():
            func(a)

        let cur_time = Benchmark().run[wrapper]()
        if best_time < 0 or cur_time < best_time:
            best_time = cur_time
            best_idx = ii

    print("Best candidate idx:", best_idx)
    return best_idx

fn add_autotune(a: Ptr):
    alias best_fn: add_fn_sig_type
    search[
        add_fn_sig_type,
        VariadicList(add_to_tune.__adaptive_set),
        add_evaluator -> best_fn,
    ]()

    return best_fn(a)

fn main():
    let a = Ptr().alloc(NUM_ELEMENTS)
    add_autotune(a)
    print("success")

Produces:

Best candidate idx: 1
success
[1084507:1084507:20231012,101011.135829:ERROR file_io_posix.cc:144] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq: No such file or directory (2)
[1084507:1084507:20231012,101011.135886:ERROR file_io_posix.cc:144] open /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq: No such file or directory (2)
Please submit a bug report to https://github.com/modularml/mojo/issues and include the crash backtrace along with all the relevant source codes.
Stack dump:
[1084507:1084508:20231012,101011.136701:ERROR directory_reader_posix.cc:42] opendir /home/mikowals/.modular/crashdb/attachments/f3e419d4-f586-4778-9cf5-5a779f9bec1c: No such file or directory (2)
0.      Program arguments: mojo autotune_bug2.mojo
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  mojo      0x0000559e4ed3afd7
1  mojo      0x0000559e4ed38bae
2  mojo      0x0000559e4ed3b6af
3  libc.so.6 0x00007f3886a3c460
4  libc.so.6 0x00007f381c0000f0
Segmentation fault (core dumped)
mikowals@mojo-v0:~/projects/llama2.mojo$ [1084507:1084508:20231012,101012.086886:ERROR http_transport_libcurl.cc:483] HTTP status 404

Running it again without changing the code produces:

success

Adding a function that needs no autotuning and passing it to the same evaluator function works fine. Running this:

 # omitted code duplicated above

fn add(a: Ptr, /):
    alias workers = 4
    @parameter
    fn loop(ii: Int):
        a.store(ii, a.load(ii))

    parallelize[loop](NUM_ELEMENTS, workers)

fn main():
    let funcs = Pointer[add_fn_sig_type].alloc(NUM_ELEMENTS)
    funcs.store(0, add)
    _ = add_evaluator(funcs, 1)
    print("success")

produces:

Best candidate idx: 0
success

I also tried clobber_memory, keep, and _ = a, along with various other things in the evaluator function, since the matmul demo does that, but none of it had any impact. Ultimately I found the same bug when running the matmul demo from the command line, so I don't think it is caused by the compiler removing variables early.
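
For readers who want to follow what the evaluator is doing without the (now removed) autotune API, the selection logic in add_evaluator can be sketched outside Mojo. The following is a minimal Python analogue, not Mojo code and not the autotune implementation: pick_fastest and _time_once are hypothetical names, and time.perf_counter stands in for Benchmark().run as an assumption. It only illustrates the "time each candidate, keep the index of the fastest" loop and does not reproduce the crash.

```python
import time
from typing import Callable, Sequence


def pick_fastest(
    candidates: Sequence[Callable[[list], None]],
    data: list,
    repeats: int = 5,
) -> int:
    """Return the index of the candidate with the lowest measured run time.

    Mirrors the shape of add_evaluator: -1 means "no candidate seen yet".
    """
    best_idx = -1
    best_time = -1.0
    for idx, func in enumerate(candidates):
        # Take the minimum over a few repeats to dampen timer noise
        # (an assumption of this sketch; the original calls Benchmark().run once).
        cur_time = min(_time_once(func, data) for _ in range(repeats))
        if best_time < 0 or cur_time < best_time:
            best_time = cur_time
            best_idx = idx
    return best_idx


def _time_once(func: Callable[[list], None], data: list) -> float:
    """Time a single call of func(data) in seconds."""
    start = time.perf_counter()
    func(data)
    return time.perf_counter() - start
```

As in the reproducer, an empty candidate list yields -1, and ties are resolved in favour of the earlier candidate because the comparison is strict.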

System information

- What OS did you install Mojo on?
Linux mojo-v0 6.2.0-34-generic #34-Ubuntu SMP PREEMPT_DYNAMIC Mon Sep  4 13:06:55 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Provide version information for Mojo by pasting the output of `mojo -v`
mojo 0.4.0 (9e33b013)
- Provide Modular CLI version by pasting the output of `modular -v`
modular 0.2.0 (355ea9c0)

Mogball commented 1 year ago

Hey @mikowals, thanks for filing! It seems this is a symptom of another issue we've encountered recently. We're on it.

mikowals commented 9 months ago

Closing, since autotune has been removed pending a replacement.