rust-lang / hashbrown

Rust port of Google's SwissTable hash map
https://rust-lang.github.io/hashbrown
Apache License 2.0
2.46k stars 289 forks source link

Panic using the bumpalo BumpWrapper #260

Open folkertdev opened 3 years ago

folkertdev commented 3 years ago

First, I'm not sure if this should (also) go into the bumpalo issue tracker. Let me know.

I'm observing various issues with this type

pub type BuildHasher = BuildHasherDefault<WyHash>;
pub type BumpMap<'a, K, V> = hashbrown::HashMap<K, V, BuildHasher, hashbrown::BumpWrapper<'a>>;

and these are created like so:

fn new_in(arena: &'a bumpalo::Bump) -> Self {
    hashbrown::HashMap::with_hasher_in(default_hasher(), hashbrown::BumpWrapper(arena))
}

fn with_capacity_in(capacity: usize, arena: &'a bumpalo::Bump) -> Self {
    hashbrown::HashMap::with_capacity_and_hasher_in(
        capacity,
        default_hasher(),
        hashbrown::BumpWrapper(arena),
    )
}

Dependencies are

wyhash = "0.3"
bumpalo = { version = "3.6.1", features = ["collections"] }
hashbrown = { version = "0.11.2", features = [ "bumpalo" ] }

This mostly works, but for some examples panics in the Drop instance of HashMap, and sometimes just outright segfaults. Here is a backtrace.

It's hard to track down the cause exactly, because it does not happen consistently. I think the problem is in the (use of) the allocator, which means that only when the allocator is moved (enough times) does this problem occur. For instance what might happen is that on some runs a realloc can extend the current bump arena, and on other runs it cannot.

I'm hoping the backtrace is enough information to figure out some possible causes. I'm happy to then explore those.

Amanieu commented 3 years ago

By any chance are you dropping the Bump before the hashmap is dropped? This shouldn't be possible but it seems to be happening in roc_mono::ir::specialize_all. Can you provide the code for that function?

folkertdev commented 3 years ago

I can, but I'm very sure the Bump is not dropped. Here is the source

pub fn specialize_all<'a>(
    env: &mut Env<'a, '_>,
    mut procs: Procs<'a>,
    layout_cache: &mut LayoutCache<'a>,
) -> Procs<'a> {
    specialize_all_help(env, &mut procs, layout_cache);

    // When calling from_can, pending_specializations should be unavailable.
    // This must be a single pass, and we must not add any more entries to it!
    let opt_pending_specializations = std::mem::replace(&mut procs.pending_specializations, None);

    for (name, by_layout) in opt_pending_specializations.into_iter().flatten() {
        for (layout, pending) in by_layout.into_iter() {
            // If we've already seen this (Symbol, Layout) combination before,
            // don't try to specialize it again. If we do, we'll loop forever!
            //
            // NOTE: this #[allow(clippy::map_entry)] here is for correctness!
            // Changing it to use .entry() would necessarily make it incorrect.
            #[allow(clippy::map_entry)]
            if !procs.specialized.contains_key(&(name, layout)) {
                // TODO should pending_procs hold a Rc<Proc>?
                let partial_proc = match procs.partial_procs.get(&name) {
                    Some(v) => v.clone(),
                    None => {
                        // TODO this assumes the specialization is done by another module
                        // make sure this does not become a problem down the road!
                        continue;
                    }
                };

                // Mark this proc as in-progress, so if we're dealing with
                // mutually recursive functions, we don't loop forever.
                // (We had a bug around this before this system existed!)
                let outside_layout = layout;
                procs.specialized.insert((name, outside_layout), InProgress);
                match specialize(
                    env,
                    &mut procs,
                    name,
                    layout_cache,
                    pending.clone(),
                    partial_proc,
                ) {
                    Ok((proc, layout)) => {
                        debug_assert_eq!(outside_layout, layout, " in {:?}", name);

                        if let Layout::Closure(args, closure, ret) = layout {
                            procs.specialized.remove(&(name, outside_layout));
                            let layout = ClosureLayout::extend_function_layout(
                                env.arena, args, closure, ret,
                            );
                            procs.specialized.insert((name, layout), Done(proc));
                        } else {
                            procs.specialized.insert((name, layout), Done(proc));
                        }
                    }
                    Err(SpecializeFailure {
                        attempted_layout, ..
                    }) => {
                        let proc = generate_runtime_error_function(env, name, attempted_layout);

                        procs
                            .specialized
                            .insert((name, attempted_layout), Done(proc));
                    }
                }
            }
        }
    }

    procs
}

and relevant types

pub struct Env<'a, 'i> {
    pub arena: &'a Bump,
    ...
}

#[derive(Clone, Debug)]
pub struct Procs<'a> {
    pub partial_procs: BumpMap<'a, Symbol, PartialProc<'a>>,
    pub imported_module_thunks: BumpSet<'a, Symbol>,
    pub module_thunks: BumpSet<'a, Symbol>,
    pub pending_specializations:
        Option<BumpMap<'a, Symbol, HashMap<Layout<'a>, PendingSpecialization<'a>>>>,
    pub specialized: BumpMap<'a, (Symbol, Layout<'a>), InProgressProc<'a>>,
    pub runtime_errors: BumpMap<'a, Symbol, &'a str>,
    pub call_by_pointer_wrappers: BumpMap<'a, Symbol, Symbol>,
    pub externals_others_need: ExternalSpecializations<'a>,
    pub externals_we_need: BumpMap<'a, ModuleId, ExternalSpecializations<'a>>,
}

pub struct PendingSpecialization<'a> {
    host_exposed_aliases: BumpMap<'a, Symbol, SolvedType>,
}

The panic occurs on the final line of specialize_all where some BumpMap (and I'm not even sure where it comes from) gets dropped. Note that the env outlives the function body, and contains the arena which therefore also outlives the function body. I'm not doing anything fishy (e.g. unsafe) with the env or the arena.

A potential lead is that pending_specializations is a Option<BumpMap>, that itself contains a "normal" HashMap that containsBumpMap`s. Seems to me that is the most complex thing going on here.

I also did some experiments with Bump::with_capacity to see if maybe resizing was the problem, but it is not. Bump::chunk_capacity at the start was always higher than Bump::allocated_bytes at the end. The used space was at most 2284456 bytes, and I gave the bump arena 5Mb of capacity.

folkertdev commented 3 years ago

here is a different error, in a totally different place

thread '<unnamed>' panicked at 'Went past end of probe sequence', /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:176:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
   1: core::panicking::panic_fmt
             at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/panicking.rs:92:14
   2: core::panicking::panic
             at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/panicking.rs:50:5
   3: hashbrown::raw::ProbeSeq::move_next
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:176:9
   4: hashbrown::raw::RawTableInner<A>::find_insert_slot
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:1239:13
   5: hashbrown::raw::RawTable<T,A>::insert
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:828:25
   6: hashbrown::raw::RawTable<T,A>::insert_entry
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:865:18
   7: hashbrown::map::VacantEntry<K,V,S,A>::insert
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/map.rs:3337:21
   8: hashbrown::map::Entry<K,V,S,A>::or_insert_with
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/map.rs:2782:37
   9: roc_mono::ir::add_pending
             at ./compiler/mono/src/ir.rs:693:23
  10: roc_mono::ir::call_by_name
             at ./compiler/mono/src/ir.rs:6128:29
  11: roc_mono::ir::with_hole
             at ./compiler/mono/src/ir.rs:3929:21
  12: roc_mono::ir::assign_to_symbol
             at ./compiler/mono/src/ir.rs:5749:23
fn add_pending<'a>(
    pending_specializations: &mut BumpMap<
        'a,
        Symbol,
        MutMap<Layout<'a>, PendingSpecialization<'a>>,
    >,
    symbol: Symbol,
    layout: Layout<'a>,
    pending: PendingSpecialization<'a>,
) {
   // panics on the line below
    let all_pending = pending_specializations
        .entry(symbol)
        .or_insert_with(|| HashMap::with_capacity_and_hasher(1, default_hasher()));

    all_pending.insert(layout, pending);
}

and another one

thread '<unnamed>' panicked at 'assertion failed: index < self.buckets()', /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:1269:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
   1: core::panicking::panic_fmt
             at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/panicking.rs:92:14
   2: core::panicking::panic
             at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/panicking.rs:50:5
   3: hashbrown::raw::RawTableInner<A>::bucket
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:1269:9
   4: hashbrown::raw::RawTable<T,A>::resize
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:801:17
   5: hashbrown::raw::RawTable<T,A>::reserve_rehash
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:693:13
   6: hashbrown::raw::RawTable<T,A>::reserve
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:646:16
   7: hashbrown::raw::RawTable<T,A>::insert
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:827:17
   8: hashbrown::map::HashMap<K,V,S,A>::insert
             at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/map.rs:1266:13
   9: roc_mono::ir::specialize_all
             at ./compiler/mono/src/ir.rs:1667:17
  10: roc_load::file::make_specializations
             at ./compiler/load/src/file.rs:3808:13
  11: roc_load::file::run_task

which fails on the line

                procs.specialized.insert((name, outside_layout), InProgress);

of the source of specialize_all that I posted above. So this indicates that not only can Drop fail, insertion can fail too. Again, these failures are very random, but happen quite frequently.

Amanieu commented 3 years ago

Do you have anything other than BumpMap using the Bump instance? Can you try isolating each BumpMap to use a separate Bump instance?

Finally, do you have a reproducable example that I can try running on my own system?

folkertdev commented 3 years ago

I've tried replacing the BumpMaps with normal MutMaps, and then converting to BumpMap one-by-one. It seems that making any of those fields a BumpMap re-introduces the problem though, so that is not helpful.

I spent a bunch of time trying to find a minimal example, but could not get a small program to fail. I realize that's not helpful. I'll continue to try finding a small failing example program. Maybe a fuzzing setup can be made?

Amanieu commented 3 years ago

Even a large program is fine, as long as I can get into it with a debugger. I just need to be able to inspect the hashmap state and possible run the program in reverse with rr.

folkertdev commented 3 years ago

sure, you should have just received an invite to the project from @rtfeldman. be sure to check BUILDING_FROM_SOURCE.md because there are some additional things to install.

The problem manifests in commit 2032ef9b52baa482ef31af9b9d13638a6865378b. We've reverted to normal hashmaps for now on trunk. Once everything is installed, run cargo run run examples/benchmarks/TestBase64.roc. At least on my machine and our CI machine this will usually hit a problem in 5 tries, maybe you have to run that command in a loop to get it to panic.

Happy to help if you run into any issues installing/running

Amanieu commented 3 years ago

I'm getting a crash in LLVM:

#0  0x0000555558fe1dc8 in llvm::StringMapImpl::LookupBucketFor(llvm::StringRef) ()
#1  0x0000555558e38cb7 in llvm::MDString::get(llvm::LLVMContext&, llvm::StringRef) ()
#2  0x0000555558da3808 in llvm::DIBuilder::createCompileUnit(unsigned int, llvm::DIFile*, llvm::StringRef, bool, llvm::StringRef, unsigned int, llvm::StringRef, llvm::DICompileUnit::DebugEmissionKind, unsigned long, bool, bool, llvm::DICompileUnit::DebugNameTableKind, bool, llvm::StringRef, llvm::StringRef) ()
#3  0x0000555558daeebb in LLVMDIBuilderCreateCompileUnit ()
#4  0x0000555557ac4ec0 in inkwell::debug_info::DebugInfoBuilder::create_compile_unit (self=0x7fffffff6108, language=inkwell::debug_info::flags::DWARFSourceLanguage::C, file=..., producer=..., is_optimized=false, flags=..., runtime_ver=0, split_name=..., kind=inkwell::debug_info::flags::DWARFEmissionKind::Full, dwo_id=0, split_debug_inlining=false, 
    debug_info_for_profiling=false) at /home/amanieu/.cargo/git/checkouts/inkwell-85610d8ccb0c28f9/9ae45f0/src/debug_info.rs:256
#5  0x0000555557ac4b5e in inkwell::debug_info::DebugInfoBuilder::new (module=0x55555ae4a400, allow_unresolved=true, language=inkwell::debug_info::flags::DWARFSourceLanguage::C, filename=..., directory=..., producer=..., is_optimized=false, flags=..., runtime_ver=0, split_name=..., kind=inkwell::debug_info::flags::DWARFEmissionKind::Full, dwo_id=0, 
    split_debug_inlining=false, debug_info_for_profiling=false) at /home/amanieu/.cargo/git/checkouts/inkwell-85610d8ccb0c28f9/9ae45f0/src/debug_info.rs:200
#6  0x0000555557ab50df in inkwell::module::Module::create_debug_info_builder (self=0x55555ae4a400, allow_unresolved=true, language=inkwell::debug_info::flags::DWARFSourceLanguage::C, filename=..., directory=..., producer=..., is_optimized=false, flags=..., runtime_ver=0, split_name=..., kind=inkwell::debug_info::flags::DWARFEmissionKind::Full, dwo_id=0, 
    split_debug_inlining=false, debug_info_for_profiling=false) at /home/amanieu/.cargo/git/checkouts/inkwell-85610d8ccb0c28f9/9ae45f0/src/module.rs:1377
#7  0x000055555715172e in roc_gen::llvm::build::Env::new_debug_info (module=0x55555ae4a400) at compiler/gen/src/llvm/build.rs:226
#8  0x0000555556e3eb2d in roc_build::program::gen_from_mono_module (arena=0x7fffffffcc48, loaded=..., roc_file_path=0x55555a9f0e60, target=..., app_o_file=0x55555aa82e60, opt_level=roc_gen::llvm::build::OptLevel::Normal, emit_debug_info=false) at compiler/build/src/program.rs:121
#9  0x0000555556cd146e in roc_cli::build::build_file (arena=0x7fffffffcc48, target=0x7fffffffda38, src_dir=..., roc_file_path=..., opt_level=roc_gen::llvm::build::OptLevel::Normal, emit_debug_info=false, link_type=roc_build::link::LinkType::Executable) at cli/src/build.rs:119
#10 0x0000555556cb5fdc in roc_cli::build (target=0x7fffffffda38, matches=0x55555a9e85f8, config=...) at cli/src/lib.rs:155
#11 0x0000555555ac5533 in roc::main () at cli/src/main.rs:28

I have LLVM 10.0.1 installed via the Arch Linux llvm10 package.

Amanieu commented 3 years ago

Could this be because zig is built against LLVM 11?

Amanieu commented 3 years ago

Nevermind I fixed it by deleting target and rebuilding. It works fine for me, I can't get it to crash though.

Amanieu commented 3 years ago
$ target/debug/roc run examples/benchmarks/TestBase64.roc
encoded: SGVsbG8gV29ybGQ=
decoded: Hello World
runtime: 0.058ms
Amanieu commented 3 years ago

Ah nevermind, I wasn't using the right commit. Trying again...

folkertdev commented 3 years ago

any luck?

Amanieu commented 3 years ago

I've managed to reproduce the issue. It seems that the HashMap itself has been overwritten with random garbage:

(gdb) p *self
$5 = hashbrown::raw::RawTable<(roc_module::ident::Lowercase, roc_mono::layout::Layout), hashbrown::BumpWrapper> {table: hashbrown::raw::RawTableInner<hashbrown::BumpWrapper> {bucket_mask: 93825056841222, ctrl: core::ptr::non_null::NonNull<u8> {pointer: 0x2}, growth_left: 8, items: 0, alloc: hashbrown::BumpWrapper (0x4)}, marker: core::marker::PhantomData<(roc_module::ident::Lowercase, roc_mono::layout::Layout)>}

This isn't a bug in hashbrown: something else in your program is overwriting random stack memory.

folkertdev commented 3 years ago

that's eh... interesting. As you can see, it's really just a rust application, so this is very unexpected.

So it's just dumb luck that we're observing this issue with the BumpMap?

Amanieu commented 3 years ago

I'm really not sure what is happening: address sanitizer isn't complaining, which is very strange. I'm also having trouble reproducing the error with rr.

Possibly unrelated, but I did occasionally get this crash with rr:

Unrecognized non-builtin function: "Effect_always_1"

Symbol: `fx.Effect.always`
Layout: FunctionPointer([Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]]))], Closure([Struct([])], ClosureLayout { captured: [(Closure(`fx.Effect.effect_map_inner`), [Builtin(Int64), Closure([Struct([])], ClosureLayout { captured: [], layout: Struct([Builtin(Str)]) }, Struct([])), FunctionPointer([Struct([])], Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]])))]), (Closure(`fx.Effect.effect_always_inner`), [Builtin(Int64), Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]]))])], layout: Union(NonRecursive([[Builtin(Int64), Closure([Struct([])], ClosureLayout { captured: [], layout: Struct([Builtin(Str)]) }, Struct([])), FunctionPointer([Struct([])], Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]])))], [Builtin(Int64), Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]]))]])) }, Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]]))))

Is the function defined? If so, maybe there is a problem with the layout
thread 'main' panicked at 'Unrecognized non-builtin function: "Effect_always_1" (symbol: `fx.Effect.always`)', compiler/gen/src/llvm/build.rs:3457:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
folkertdev commented 3 years ago

yes I think this is unrelated (race condition that has proved tricky to track down). Thanks for the investigation, but yea not sure what to take away from this. Do you have any further debugging ideas?

Amanieu commented 3 years ago

I played around with the code a bit more: if I make is_last_allocation in bumpalo always return false then the crash no longer happens. This function is used by bumpalo's Vec and String types to optimize resize, perhaps it is buggy?

folkertdev commented 3 years ago

it seems plausible to me that hashbrown makes allocations, but bumpalo is somehow not aware and then resizes into space that it thinks is free, but is actually allocated by hashbrown.

I'm guessing hashbrown is the first external data structure that uses bumpalo as an allocator? Anyhow seems worth filing an issue. This also suggests a way to make the failure happen in a smaller program.