Open folkertdev opened 3 years ago
By any chance are you dropping the Bump
before the hashmap is dropped? This shouldn't be possible but it seems to be happening in roc_mono::ir::specialize_all
. Can you provide the code for that function?
I can, but I'm very sure the Bump
is not dropped. Here is the source
pub fn specialize_all<'a>(
env: &mut Env<'a, '_>,
mut procs: Procs<'a>,
layout_cache: &mut LayoutCache<'a>,
) -> Procs<'a> {
specialize_all_help(env, &mut procs, layout_cache);
// When calling from_can, pending_specializations should be unavailable.
// This must be a single pass, and we must not add any more entries to it!
let opt_pending_specializations = std::mem::replace(&mut procs.pending_specializations, None);
for (name, by_layout) in opt_pending_specializations.into_iter().flatten() {
for (layout, pending) in by_layout.into_iter() {
// If we've already seen this (Symbol, Layout) combination before,
// don't try to specialize it again. If we do, we'll loop forever!
//
// NOTE: this #[allow(clippy::map_entry)] here is for correctness!
// Changing it to use .entry() would necessarily make it incorrect.
#[allow(clippy::map_entry)]
if !procs.specialized.contains_key(&(name, layout)) {
// TODO should pending_procs hold a Rc<Proc>?
let partial_proc = match procs.partial_procs.get(&name) {
Some(v) => v.clone(),
None => {
// TODO this assumes the specialization is done by another module
// make sure this does not become a problem down the road!
continue;
}
};
// Mark this proc as in-progress, so if we're dealing with
// mutually recursive functions, we don't loop forever.
// (We had a bug around this before this system existed!)
let outside_layout = layout;
procs.specialized.insert((name, outside_layout), InProgress);
match specialize(
env,
&mut procs,
name,
layout_cache,
pending.clone(),
partial_proc,
) {
Ok((proc, layout)) => {
debug_assert_eq!(outside_layout, layout, " in {:?}", name);
if let Layout::Closure(args, closure, ret) = layout {
procs.specialized.remove(&(name, outside_layout));
let layout = ClosureLayout::extend_function_layout(
env.arena, args, closure, ret,
);
procs.specialized.insert((name, layout), Done(proc));
} else {
procs.specialized.insert((name, layout), Done(proc));
}
}
Err(SpecializeFailure {
attempted_layout, ..
}) => {
let proc = generate_runtime_error_function(env, name, attempted_layout);
procs
.specialized
.insert((name, attempted_layout), Done(proc));
}
}
}
}
}
procs
}
and relevant types
pub struct Env<'a, 'i> {
pub arena: &'a Bump,
...
}
#[derive(Clone, Debug)]
pub struct Procs<'a> {
pub partial_procs: BumpMap<'a, Symbol, PartialProc<'a>>,
pub imported_module_thunks: BumpSet<'a, Symbol>,
pub module_thunks: BumpSet<'a, Symbol>,
pub pending_specializations:
Option<BumpMap<'a, Symbol, HashMap<Layout<'a>, PendingSpecialization<'a>>>>,
pub specialized: BumpMap<'a, (Symbol, Layout<'a>), InProgressProc<'a>>,
pub runtime_errors: BumpMap<'a, Symbol, &'a str>,
pub call_by_pointer_wrappers: BumpMap<'a, Symbol, Symbol>,
pub externals_others_need: ExternalSpecializations<'a>,
pub externals_we_need: BumpMap<'a, ModuleId, ExternalSpecializations<'a>>,
}
pub struct PendingSpecialization<'a> {
host_exposed_aliases: BumpMap<'a, Symbol, SolvedType>,
}
The panic occurs on the final line of specialize_all
where some BumpMap
(and I'm not even sure where it comes from) gets dropped. Note that the env
outlives the function body, and contains the arena
which therefore also outlives the function body. I'm not doing anything fishy (e.g. unsafe
) with the env or the arena.
A potential lead is that pending_specializations is a Option<BumpMap>, that itself contains a "normal" HashMap that contains
BumpMap`s. Seems to me that is the most complex thing going on here.
I also did some experiments with Bump::with_capacity
to see if maybe resizing was the problem, but it is not. Bump::chunk_capacity
at the start was always higher than Bump::allocated_bytes
at the end. The used space was at most 2284456
bytes, and I gave the bump arena 5Mb
of capacity.
here is a different error, in a totally different place
thread '<unnamed>' panicked at 'Went past end of probe sequence', /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:176:9
stack backtrace:
0: rust_begin_unwind
at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
1: core::panicking::panic_fmt
at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/panicking.rs:92:14
2: core::panicking::panic
at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/panicking.rs:50:5
3: hashbrown::raw::ProbeSeq::move_next
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:176:9
4: hashbrown::raw::RawTableInner<A>::find_insert_slot
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:1239:13
5: hashbrown::raw::RawTable<T,A>::insert
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:828:25
6: hashbrown::raw::RawTable<T,A>::insert_entry
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:865:18
7: hashbrown::map::VacantEntry<K,V,S,A>::insert
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/map.rs:3337:21
8: hashbrown::map::Entry<K,V,S,A>::or_insert_with
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/map.rs:2782:37
9: roc_mono::ir::add_pending
at ./compiler/mono/src/ir.rs:693:23
10: roc_mono::ir::call_by_name
at ./compiler/mono/src/ir.rs:6128:29
11: roc_mono::ir::with_hole
at ./compiler/mono/src/ir.rs:3929:21
12: roc_mono::ir::assign_to_symbol
at ./compiler/mono/src/ir.rs:5749:23
fn add_pending<'a>(
pending_specializations: &mut BumpMap<
'a,
Symbol,
MutMap<Layout<'a>, PendingSpecialization<'a>>,
>,
symbol: Symbol,
layout: Layout<'a>,
pending: PendingSpecialization<'a>,
) {
// panics on the line below
let all_pending = pending_specializations
.entry(symbol)
.or_insert_with(|| HashMap::with_capacity_and_hasher(1, default_hasher()));
all_pending.insert(layout, pending);
}
and another one
thread '<unnamed>' panicked at 'assertion failed: index < self.buckets()', /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:1269:9
stack backtrace:
0: rust_begin_unwind
at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
1: core::panicking::panic_fmt
at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/panicking.rs:92:14
2: core::panicking::panic
at /rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/panicking.rs:50:5
3: hashbrown::raw::RawTableInner<A>::bucket
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:1269:9
4: hashbrown::raw::RawTable<T,A>::resize
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:801:17
5: hashbrown::raw::RawTable<T,A>::reserve_rehash
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:693:13
6: hashbrown::raw::RawTable<T,A>::reserve
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:646:16
7: hashbrown::raw::RawTable<T,A>::insert
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/raw/mod.rs:827:17
8: hashbrown::map::HashMap<K,V,S,A>::insert
at /home/folkertdev/.cargo/registry/src/github.com-1ecc6299db9ec823/hashbrown-0.11.2/src/map.rs:1266:13
9: roc_mono::ir::specialize_all
at ./compiler/mono/src/ir.rs:1667:17
10: roc_load::file::make_specializations
at ./compiler/load/src/file.rs:3808:13
11: roc_load::file::run_task
which fails on the line
procs.specialized.insert((name, outside_layout), InProgress);
of the source of specialize_all
that I posted above. So this indicates that not only can Drop
fail, insertion can fail too. Again, these failures are very random, but happen quite frequently.
Do you have anything other than BumpMap
using the Bump
instance? Can you try isolating each BumpMap
to use a separate Bump
instance?
Finally, do you have a reproducable example that I can try running on my own system?
I've tried replacing the BumpMap
s with normal MutMap
s, and then converting to BumpMap
one-by-one. It seems that making any of those fields a BumpMap
re-introduces the problem though, so that is not helpful.
I spent a bunch of time trying to find a minimal example, but could not get a small program to fail. I realize that's not helpful. I'll continue to try finding a small failing example program. Maybe a fuzzing setup can be made?
Even a large program is fine, as long as I can get into it with a debugger. I just need to be able to inspect the hashmap state and possible run the program in reverse with rr
.
sure, you should have just received an invite to the project from @rtfeldman. be sure to check BUILDING_FROM_SOURCE.md
because there are some additional things to install.
The problem manifests in commit 2032ef9b52baa482ef31af9b9d13638a6865378b
. We've reverted to normal hashmaps for now on trunk
. Once everything is installed, run cargo run run examples/benchmarks/TestBase64.roc
. At least on my machine and our CI machine this will usually hit a problem in 5 tries, maybe you have to run that command in a loop to get it to panic.
Happy to help if you run into any issues installing/running
I'm getting a crash in LLVM:
#0 0x0000555558fe1dc8 in llvm::StringMapImpl::LookupBucketFor(llvm::StringRef) ()
#1 0x0000555558e38cb7 in llvm::MDString::get(llvm::LLVMContext&, llvm::StringRef) ()
#2 0x0000555558da3808 in llvm::DIBuilder::createCompileUnit(unsigned int, llvm::DIFile*, llvm::StringRef, bool, llvm::StringRef, unsigned int, llvm::StringRef, llvm::DICompileUnit::DebugEmissionKind, unsigned long, bool, bool, llvm::DICompileUnit::DebugNameTableKind, bool, llvm::StringRef, llvm::StringRef) ()
#3 0x0000555558daeebb in LLVMDIBuilderCreateCompileUnit ()
#4 0x0000555557ac4ec0 in inkwell::debug_info::DebugInfoBuilder::create_compile_unit (self=0x7fffffff6108, language=inkwell::debug_info::flags::DWARFSourceLanguage::C, file=..., producer=..., is_optimized=false, flags=..., runtime_ver=0, split_name=..., kind=inkwell::debug_info::flags::DWARFEmissionKind::Full, dwo_id=0, split_debug_inlining=false,
debug_info_for_profiling=false) at /home/amanieu/.cargo/git/checkouts/inkwell-85610d8ccb0c28f9/9ae45f0/src/debug_info.rs:256
#5 0x0000555557ac4b5e in inkwell::debug_info::DebugInfoBuilder::new (module=0x55555ae4a400, allow_unresolved=true, language=inkwell::debug_info::flags::DWARFSourceLanguage::C, filename=..., directory=..., producer=..., is_optimized=false, flags=..., runtime_ver=0, split_name=..., kind=inkwell::debug_info::flags::DWARFEmissionKind::Full, dwo_id=0,
split_debug_inlining=false, debug_info_for_profiling=false) at /home/amanieu/.cargo/git/checkouts/inkwell-85610d8ccb0c28f9/9ae45f0/src/debug_info.rs:200
#6 0x0000555557ab50df in inkwell::module::Module::create_debug_info_builder (self=0x55555ae4a400, allow_unresolved=true, language=inkwell::debug_info::flags::DWARFSourceLanguage::C, filename=..., directory=..., producer=..., is_optimized=false, flags=..., runtime_ver=0, split_name=..., kind=inkwell::debug_info::flags::DWARFEmissionKind::Full, dwo_id=0,
split_debug_inlining=false, debug_info_for_profiling=false) at /home/amanieu/.cargo/git/checkouts/inkwell-85610d8ccb0c28f9/9ae45f0/src/module.rs:1377
#7 0x000055555715172e in roc_gen::llvm::build::Env::new_debug_info (module=0x55555ae4a400) at compiler/gen/src/llvm/build.rs:226
#8 0x0000555556e3eb2d in roc_build::program::gen_from_mono_module (arena=0x7fffffffcc48, loaded=..., roc_file_path=0x55555a9f0e60, target=..., app_o_file=0x55555aa82e60, opt_level=roc_gen::llvm::build::OptLevel::Normal, emit_debug_info=false) at compiler/build/src/program.rs:121
#9 0x0000555556cd146e in roc_cli::build::build_file (arena=0x7fffffffcc48, target=0x7fffffffda38, src_dir=..., roc_file_path=..., opt_level=roc_gen::llvm::build::OptLevel::Normal, emit_debug_info=false, link_type=roc_build::link::LinkType::Executable) at cli/src/build.rs:119
#10 0x0000555556cb5fdc in roc_cli::build (target=0x7fffffffda38, matches=0x55555a9e85f8, config=...) at cli/src/lib.rs:155
#11 0x0000555555ac5533 in roc::main () at cli/src/main.rs:28
I have LLVM 10.0.1 installed via the Arch Linux llvm10
package.
Could this be because zig is built against LLVM 11?
Nevermind I fixed it by deleting target
and rebuilding. It works fine for me, I can't get it to crash though.
$ target/debug/roc run examples/benchmarks/TestBase64.roc
encoded: SGVsbG8gV29ybGQ=
decoded: Hello World
runtime: 0.058ms
Ah nevermind, I wasn't using the right commit. Trying again...
any luck?
I've managed to reproduce the issue. It seems that the HashMap
itself has been overwritten with random garbage:
(gdb) p *self
$5 = hashbrown::raw::RawTable<(roc_module::ident::Lowercase, roc_mono::layout::Layout), hashbrown::BumpWrapper> {table: hashbrown::raw::RawTableInner<hashbrown::BumpWrapper> {bucket_mask: 93825056841222, ctrl: core::ptr::non_null::NonNull<u8> {pointer: 0x2}, growth_left: 8, items: 0, alloc: hashbrown::BumpWrapper (0x4)}, marker: core::marker::PhantomData<(roc_module::ident::Lowercase, roc_mono::layout::Layout)>}
This isn't a bug in hashbrown
: something else in your program is overwriting random stack memory.
that's eh... interesting. As you can see, it's really just a rust application, so this is very unexpected.
So it's just dumb luck that we're observing this issue with the BumpMap
?
I'm really not sure what is happening: address sanitizer isn't complaining, which is very strange. I'm also having trouble reproducing the error with rr
.
Possibly unrelated, but I did occasionally get this crash with rr
:
Unrecognized non-builtin function: "Effect_always_1"
Symbol: `fx.Effect.always`
Layout: FunctionPointer([Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]]))], Closure([Struct([])], ClosureLayout { captured: [(Closure(`fx.Effect.effect_map_inner`), [Builtin(Int64), Closure([Struct([])], ClosureLayout { captured: [], layout: Struct([Builtin(Str)]) }, Struct([])), FunctionPointer([Struct([])], Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]])))]), (Closure(`fx.Effect.effect_always_inner`), [Builtin(Int64), Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]]))])], layout: Union(NonRecursive([[Builtin(Int64), Closure([Struct([])], ClosureLayout { captured: [], layout: Struct([Builtin(Str)]) }, Struct([])), FunctionPointer([Struct([])], Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]])))], [Builtin(Int64), Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]]))]])) }, Union(NonRecursive([[Builtin(Int64), Union(NonRecursive([]))], [Builtin(Int64), Struct([])]]))))
Is the function defined? If so, maybe there is a problem with the layout
thread 'main' panicked at 'Unrecognized non-builtin function: "Effect_always_1" (symbol: `fx.Effect.always`)', compiler/gen/src/llvm/build.rs:3457:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
yes I think this is unrelated (race condition that has proved tricky to track down). Thanks for the investigation, but yea not sure what to take away from this. Do you have any further debugging ideas?
I played around with the code a bit more: if I make is_last_allocation
in bumpalo always return false
then the crash no longer happens. This function is used by bumpalo's Vec
and String
types to optimize resize, perhaps it is buggy?
it seems plausible to me that hashbrown
makes allocations, but bumpalo is somehow not aware and then resizes into space that it thinks is free, but is actually allocated by hashbrown.
I'm guessing hashbrown is the first external data structure that uses bumpalo as an allocator? Anyhow seems worth filing an issue. This also suggests a way to make the failure happen in a smaller program.
First, I'm not sure if this should (also) go into the bumpalo issue tracker. Let me know.
I'm observing various issues with this type
and these are created like so:
Dependencies are
This mostly works, but for some examples panics in the
Drop
instance ofHashMap
, and sometimes just outright segfaults. Here is a backtrace.It's hard to track down the cause exactly, because it does not happen consistently. I think the problem is in the (use of) the allocator, which means that only when the allocator is moved (enough times) does this problem occur. For instance what might happen is that on some runs a
realloc
can extend the current bump arena, and on other runs it cannot.I'm hoping the backtrace is enough information to figure out some possible causes. I'm happy to then explore those.