lambdaclass / cairo_native

A compiler to convert Cairo's intermediate representation "Sierra" code to MLIR.
https://lambdaclass.github.io/cairo_native/cairo_native
Apache License 2.0

Trace Dump: Segmentation Fault #840

Closed JulianGCalderon closed 2 weeks ago

JulianGCalderon commented 1 month ago

When running transaction 0xde5066db6a374038d33b7c89cce4d3e09e6f645bd5b95ec0986713f51e0c69 with the trace-dump feature enabled, execution fails with a segmentation fault.

This only happens when using trace dump. Stack trace:

(gdb) where
#0  core::ptr::non_null::NonNull<core::ptr::non_null::NonNull<()>>::read<core::ptr::non_null::NonNull<()>> (self=...)
    at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ptr/non_null.rs:922
#1  0x00007ffff25cf336 in cairo_native_runtime::trace_dump::read_value_ptr::{closure#0} ()
    at runtime/src/lib.rs:969
...
...
#14 0x00007ffff1fc870c in cairo_native_runtime::trace_dump::read_value_ptr (
    registry=0x555562299d68, type_id=0x7ffffffa1a30, value_ptr=..., 
    get_layout=0x55555657b8c0 <core::ops::function::FnOnce::call_once<blockifier::execution::native::entry_point_execution::execute_entry_point_call::{closure_env#0}, (&cairo_lang_sierra::extensions::core::CoreTypeConcrete, &cairo_lang_sierra::program_registry::ProgramRegistry<cairo_lang_sierra::extensions::core::CoreType, cairo_lang_sierra::extensions::core::CoreLibfunc>)>>)
    at runtime/src/lib.rs:960
#15 0x00007ffff1fc660e in cairo_native_runtime::trace_dump::cairo_native__trace_dump__state (
    trace_id=2, var_id=110, type_id=198, value_ptr=...) at runtime/src/lib.rs:690

The crash occurs when dumping felt dict entries.

JulianGCalderon commented 1 month ago

The segmentation fault happens when dereferencing an invalid pointer inside a DictFelt. I've executed it a few times, and the pointer being dereferenced has the value 0x1 or 0x8000000000000001, neither of which is a plausible pointer.

The value changing between 0x1 and 0x8000000000000001 from run to run is also odd; I don't know what causes it.

My guess is that it is stored in the dictionary as a literal and not as a pointer, which is why dereferencing it fails. We should check the libfuncs dict_get and dict_finish, as those are probably related to this.
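
To illustrate why those two values stand out: both are odd, so neither can be the address of anything with an alignment greater than 1, and they differ only in the top bit. The sketch below is a plausibility check, not code from cairo_native. The function name `is_plausible_ptr` is hypothetical, and the upper bound assumes x86-64 Linux with 48-bit virtual addresses, which the addresses in the backtrace suggest but the thread does not state.

```rust
use std::mem::align_of;

/// Highest user-space address, ASSUMING x86-64 Linux with 48-bit
/// virtual addresses. This bound is an assumption, not taken from the thread.
const USER_SPACE_MAX: u64 = 0x0000_7fff_ffff_ffff;

/// Cheap sanity check for a raw address that is about to be read as a `T`.
/// Passing the check does not prove the pointer is valid;
/// failing it proves that it is not.
fn is_plausible_ptr<T>(addr: u64) -> bool {
    addr != 0 && addr % (align_of::<T>() as u64) == 0 && addr <= USER_SPACE_MAX
}

fn main() {
    // The two values seen in the crash, plus a stack address from the backtrace.
    for addr in [0x1u64, 0x8000_0000_0000_0001, 0x0000_7fff_fffa_1a30] {
        println!(
            "{addr:#x}: plausible as *const u64 = {}",
            is_plausible_ptr::<u64>(addr)
        );
    }
}
```

A check like this only helps while debugging: it can flag a value that was stored as a literal, but it cannot confirm that a well-formed address points to live memory.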

FrancoGiachetta commented 1 month ago

I've run the same transaction and got this message: thread 'main' panicked at /home/dev/.cargo/registry/src/index.crates.io-6f17d22bba15001f/smol_str-0.2.2/src/lib.rs:551:31: range end index 240 out of range for slice of length 23

I couldn't find an explanation for it. The backtrace suggests it might be related to serde_json failing to parse the trace.
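
For context, that message is the standard Rust panic for slicing past the end of a slice: something asked for the first 240 bytes of a 23-byte buffer. The sketch below shows how a bogus length produces it and how a checked slice avoids it. It is not smol_str's code; the function `inline_as_str` and the buffer layout are hypothetical.

```rust
/// Returns the first `len` bytes of a 23-byte buffer as a string,
/// or `None` when `len` is out of range or the bytes are not valid UTF-8.
/// Hypothetical helper for illustration only.
fn inline_as_str(buf: &[u8; 23], len: usize) -> Option<&str> {
    // `get` returns `None` instead of panicking on an out-of-range end.
    let bytes = buf.get(..len)?;
    std::str::from_utf8(bytes).ok()
}

fn main() {
    let mut buf = [0u8; 23];
    buf[..5].copy_from_slice(b"hello");

    // A sane length works as expected.
    assert_eq!(inline_as_str(&buf, 5), Some("hello"));

    // With a corrupted length, `&buf[..240]` would panic with
    // "range end index 240 out of range for slice of length 23".
    // The checked version reports the problem instead.
    assert_eq!(inline_as_str(&buf, 240), None);
}
```

A length far larger than the buffer usually means the bytes being interpreted are not the structure the code expects, which would be consistent with reading through a bad pointer, though the thread does not confirm that link.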

edg-l commented 4 weeks ago

is this fixed?

FrancoGiachetta commented 4 weeks ago

It is not. It was left unsolved as it was no longer necessary since the main issue was already solved.

FrancoGiachetta commented 3 weeks ago

Update

The value pointer that is cast before producing the dump differs from the one produced by the FeltDict constructor, so the cast is completely wrong.

The segfault happens when trying to read the entries of this inner field. This means that value_ptr gets corrupted during program execution. [Screenshot: 2024-10-29 at 2:24:04 PM]

The CONSTRUCTOR PTR and the FREE_FN PTR are the ones created in the function cairo_native__dict_new. The VALUE PTR and CAST VALUE are printed inside the FeltDict match arm of the function read_value_ptr.
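
The comparison described above (pointer at construction versus pointer at read time) can also be automated rather than printed. The following is a minimal sketch under stated assumptions: `FeltDict` here is a stand-in struct, and `dict_new`, `dict_drop`, `read_dict_len`, and `LIVE_DICTS` are hypothetical names, not the cairo_native runtime API.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the runtime's dictionary type.
struct FeltDict {
    entries: Vec<(u64, u64)>,
}

// Addresses handed out by the constructor and not yet freed.
static LIVE_DICTS: Mutex<Vec<usize>> = Mutex::new(Vec::new());

/// Allocates a dictionary and records its address (the "constructor pointer").
fn dict_new() -> *mut FeltDict {
    let ptr = Box::into_raw(Box::new(FeltDict { entries: Vec::new() }));
    LIVE_DICTS.lock().unwrap().push(ptr as usize);
    ptr
}

/// Frees a dictionary and forgets its address.
fn dict_drop(ptr: *mut FeltDict) {
    LIVE_DICTS.lock().unwrap().retain(|&p| p != ptr as usize);
    // SAFETY: `ptr` came from `dict_new` and is freed exactly once.
    unsafe { drop(Box::from_raw(ptr)) };
}

/// Reads the entry count only if `value_ptr` matches a recorded constructor pointer.
fn read_dict_len(value_ptr: *const FeltDict) -> Option<usize> {
    let known = LIVE_DICTS.lock().unwrap().contains(&(value_ptr as usize));
    if !known {
        return None; // the pointer was corrupted or never was a dictionary
    }
    // SAFETY: the address is registered, so it refers to a live `FeltDict`.
    Some(unsafe { (*value_ptr).entries.len() })
}

fn main() {
    let dict = dict_new();
    assert_eq!(read_dict_len(dict), Some(0));
    // A value like the one seen in the crash is rejected instead of dereferenced.
    assert_eq!(read_dict_len(0x1 as *const FeltDict), None);
    dict_drop(dict);
    assert_eq!(read_dict_len(dict), None);
}
```

A registry like this costs a lock and a linear scan per read, so it suits a debug-only feature such as a trace dump rather than the hot path.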

FrancoGiachetta commented 2 weeks ago

Solved in this commit