leanprover / lean4

Lean 4 programming language and theorem prover
https://lean-lang.org
Apache License 2.0
4.64k stars 417 forks source link

Segmentation Fault with mutual, HashMap, and string interpolation #5188

Open ericluap opened 2 months ago

ericluap commented 2 months ago

Prerequisites

Please put an X between the brackets as you perform the following steps:

Description

The code has a segmentation fault when run.

Context

Here is a discussion on Zulip.

Steps to Reproduce

  1. lake new test
  2. cd test
  3. Modify Main.lean to have the following contents
  4. 
    import Lean.Data.HashMap

mutual inductive Foo | Foo (bar : Bar)

inductive Bar | Foo (foo : Foo) | Baz (baz : String) end

mutual def eToString : Foo → String | Foo.Foo c => cToString c

def cToString : Bar → String | Bar.Foo e => eToString e | Bar.Baz c => c

end instance : ToString Foo := ⟨eToString⟩ instance : ToString Bar := ⟨cToString⟩

def update ( : Nat × Nat) ( : Array String) (w : Lean.HashMap (Nat × Nat) (Array String)) := w.insert (0,0) #[]

def test (x : String) (y : Foo) (z : Lean.HashMap (Nat × Nat) (Array String)): Option (String × (Option String) × (Option String) × String × (Lean.HashMap (Nat × Nat) (Array String))) := do pure $ ( ((toString (← (Array.filter (fun _ => true) #[y])[0]?)).split (· = '0')).getD 0 "", none, "", "", update (x.length, x.length) #[s!"{x}", s!"{x}"] z )

def main : IO Unit := pure ()


4. `lake build`
5. `./.lake/build/bin/test`

**Expected behavior:** The code should compile and run.

**Actual behavior:** It produces a segmentation fault.

### Versions

"4.10.0" and "4.12.0-nightly-2024-08-27"
macOS Monterey Version 12.4

### Impact

Add :+1: to [issues you consider important](https://github.com/leanprover/lean4/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions-%2B1-desc). If others are impacted by this issue, please ask them to add :+1: to it.
kim-em commented 2 months ago

I can reproduce. lldb doesn't give much clue:

% lldb ./.lake/build/bin/test 
(lldb) target create "./.lake/build/bin/test"
Current executable set to '/Users/kim/scratch/test/.lake/build/bin/test' (arm64).
(lldb) run
Process 65953 launched: '/Users/kim/scratch/test/.lake/build/bin/test' (arm64)
Process 65953 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x2000000000000)
    frame #0: 0x0000000100003238 test`initialize_Main + 544
test`initialize_Main:
->  0x100003238 <+544>: ldr    w8, [x19]
    0x10000323c <+548>: cmp    w8, #0x1
    0x100003240 <+552>: b.lt   0x10000329c               ; <+644>
    0x100003244 <+556>: add    w8, w8, #0x1
Target 0: (test) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x2000000000000)
  * frame #0: 0x0000000100003238 test`initialize_Main + 544
    frame #1: 0x00000001000032cc test`main + 32
    frame #2: 0x000000019e3020e0 dyld`start + 2360
(lldb) 
hargoniX commented 2 months ago

Compiling this with -g as leanc args shows that we crash in this function:

static lean_object* _init_l_test___rarg___closed__6() {
_start:
{
lean_object* x_1; lean_object* x_2; lean_object* x_3; 
x_1 = l_update___rarg___closed__2;
x_2 = lean_unsigned_to_nat(0u);
x_3 = lean_array_fget(x_1, x_2);
return x_3;
}

At the fget call. This is because l_update___rarg___closed__2 is initialized with an Array of size 0 as evident by the code:

static lean_object* _init_l_update___rarg___closed__2() {
_start:
{
lean_object* x_1; lean_object* x_2; 
x_1 = lean_unsigned_to_nat(0u);
x_2 = lean_mk_empty_array_with_capacity(x_1);
return x_2;
}

And the debugger:


(gdb) list
828 _start:
829 {
830 lean_object* x_1; lean_object* x_2; lean_object* x_3;
831 x_1 = l_update___rarg___closed__2;
832 x_2 = lean_unsigned_to_nat(0u);
833 x_3 = lean_array_fget(x_1, x_2);
834 return x_3;
835 }
836 }
837 LEAN_EXPORT lean_object* l_test___rarg(lean_object* x_1, lean_object* x_2) {
(gdb) p ((lean_array_object*)x_1)->m_size
$6 = 0
(gdb) p ((lean_array_object*)x_1)->m_capacity
$7 = 0
hargoniX commented 2 months ago

Looking into the IR the test function contains the following IR prior to closed subterm extraction:

  let _x_4 := Array.mkEmpty _neutral 0;
  let _x_5 := Nat.decLt 0 _x_3;
  Bool.casesOn _x_5
    (let _x_6 := Array.size _neutral _x_4;
    let _x_7 := Nat.decLt 0 _x_6;
    Bool.casesOn _x_7 (none _neutral)
      (let _x_8 := Array.get _neutral _x_4 0;
      @j_1._join _x_8))

What is probably happening here, is that the compiler pulled out Array.get x_4 0 because x_4 is a constant itself. This then gets evaluated in the initialize phase and subsequently crashes. I can offer the following temporary fix:

import Lean.Data.HashMap

mutual
inductive Foo
| Foo (bar : Bar)

inductive Bar
| Foo (foo : Foo)
| Baz (baz : String)
end

mutual
def eToString : Foo → String
| Foo.Foo c => cToString c

def cToString : Bar → String
| Bar.Foo e => eToString e
| Bar.Baz c => c

end
instance : ToString Foo := ⟨eToString⟩
instance : ToString Bar := ⟨cToString⟩

def update (_ : Nat × Nat) (_ : Array String) (w : Lean.HashMap (Nat × Nat) (Array String)) :=
  w.insert (0,0) #[]

set_option compiler.extract_closed false in
def test (x : String) (y : Foo) (z : Lean.HashMap (Nat × Nat) (Array String)): Option (String × (Option String) × (Option String) × String × (Lean.HashMap (Nat × Nat) (Array String))) := do
  pure $ (
    ((toString (← (Array.filter (fun _ => true) #[y])[0]?)).split (· = '0')).getD 0 "",
    none,
    "",
    "",
    update (x.length, x.length) #[s!"{x}", s!"{x}"] z
  )

def main : IO Unit := pure ()