Open enitrat opened 1 month ago
I tried to build this on linux x86_64 but i get a build error
error[E0412]: cannot find type `integer_t` in crate `libc`
--> crates/ef-testing/src/models/result.rs:120:36
|
120 | / mem::size_of::<libc::integer_t>() as libc::c_uint;
| ^^^^^^^^^ help: a type alias with a similar name exists: `intptr_t`
|
::: /home/edgar/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libc-0.2.159/src/unix/mod.rs:21:1
|
21 | pub type intptr_t = isize;
| ----------------- similarly named type alias `intptr_t` defined here
warning: unused imports: `ACCOUNT_CONTRACT_CLASS`, `KAKAROT_CLASS`, and `UNINITIALIZED_ACCOUNT_CLASS`
--> crates/ef-testing/src/evm_sequencer/sequencer/mod.rs:12:9
|
12 | ACCOUNT_CONTRACT_CLASS, ACCOUNT_CONTRACT_CLASS_HASH, BLOCK_GAS_LIMIT,
| ^^^^^^^^^^^^^^^^^^^^^^
13 | ETH_FEE_TOKEN_ADDRESS, FEE_TOKEN_CLASS, FEE_TOKEN_CLASS_HASH, KAKAROT_ADDRESS,
14 | KAKAROT_CLASS, KAKAROT_CLASS_HASH, KAKAROT_OWNER_ADDRESS, OPENZEPPELIN_ACCOUNT_CLASS,
| ^^^^^^^^^^^^^
15 | OPENZEPPELIN_ACCOUNT_CLASS_HASH, RELAYER_ADDRESS, RELAYER_BALANCE, RELAYER_VERIFYING_KEY,
16 | STRK_FEE_TOKEN_ADDRESS, UNINITIALIZED_ACCOUNT_CLASS, UNINITIALIZED_ACCOUNT_CLASS_HASH,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused import: `std::ptr`
--> crates/ef-testing/src/models/result.rs:82:5
|
82 | use std::ptr;
| ^^^^^^^^
For more information about this error, try `rustc --explain E0412`.
warning: `ef-testing` (lib) generated 2 warnings
error: could not compile `ef-testing` (lib) due to 1 previous error; 2 warnings emitted
warning: build failed, waiting for other jobs to finish...
I tried to build this on linux x86_64 but i get a build error
Must be a platform-specific thing:
can you comment line 151 and the definition of foo
line 108? I used this to get memory usage logs after each test to identify families that caused memory leaks.
The make steup-kakarot
step is failing for me. Is this expected?
ef-tests$ make setup-kakarot
rm -rf build/common
mkdir -p build/common
rm -rf build/v0
mkdir -p build/v0
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 8651 100 8651 0 0 25596 0 --:--:-- --:--:-- --:--:-- 25670
unzip -o kakarot-build.zip -d build/v0
Archive: kakarot-build.zip
inflating: build/v0/contracts_MockContractUpgradeableV0.compiled_contract_class.json
inflating: build/v0/contracts_UninitializedAccount.contract_class.json
inflating: build/v0/contracts_AccountContract.contract_class.json
inflating: build/v0/contracts_UninitializedAccount.compiled_contract_class.json
inflating: build/v0/contracts.starknet_artifacts.json
inflating: build/v0/contracts_KakarotCore.compiled_contract_class.json
inflating: build/v0/contracts_MockContractUpgradeableV1.contract_class.json
inflating: build/v0/contracts_ERC20.compiled_contract_class.json
inflating: build/v0/contracts_ERC20.contract_class.json
inflating: build/v0/contracts_Cairo1HelpersFixture.compiled_contract_class.json
inflating: build/v0/contracts_AccountContract.compiled_contract_class.json
inflating: build/v0/contracts_KakarotCore.contract_class.json
inflating: build/v0/contracts_MockContractUpgradeableV0.contract_class.json
inflating: build/v0/contracts_Cairo1Helpers.contract_class.json
inflating: build/v0/contracts.sierra.json
inflating: build/v0/contracts_Cairo1Helpers.compiled_contract_class.json
inflating: build/v0/contracts_Cairo1HelpersFixture.contract_class.json
inflating: build/v0/contracts_MockContractUpgradeableV1.compiled_contract_class.json
mv build/v0/build/* build/v0
mv: rename build/v0/build/* to build/v0/*: No such file or directory
make: *** [setup-kakarot-v0] Error 1
This seems to cause the precompiles to not be available when I try to test it:
Finished `test` profile [unoptimized + debuginfo] target(s) in 18.76s
Running tests/stStaticCall.rs (target/debug/deps/stStaticCall-022d7fbb57a5b4fd)
running 1 test
thread 'stStaticCall::test_static_ABAcalls0_d1g0v0_Cancun' panicked at crates/ef-testing/src/evm_sequencer/constants.rs:66:120:
Failed to load precompiles contract class: No such file or directory (os error 2)
Location:
crates/ef-testing/src/evm_sequencer/constants.rs:22:32
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test stStaticCall::test_static_ABAcalls0_d1g0v0_Cancun ... FAILED
failures:
failures:
stStaticCall::test_static_ABAcalls0_d1g0v0_Cancun
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 477 filtered out; finished in 0.10s
error: test failed, to rerun pass `-p ef-testing --test stStaticCall`
Something broke with our new release yesterday. Will check and update
Hi @azteca1998 @edg-l you should be able to repro again. I updated native to ab478323d6aee5e0424712bbde98de443b8cc72f
I also hid the ram usage prints under a macos target, so that you should be able to run it on other targets fine.
Hello, sorry for the delay.
We've been working on various stuff in native which made it so that ef-tests
no longer compiles using the main branch of the sequencer and native's main branches. Could you please update it? Some of the recent changes may be involved in the bug.
Hi @azteca1998 i updated the ef-test repo to commit 28caad05e4a13f935acff83ea144df0b1bc24e2e with native 5e60089288c461eca98bf3dbe03cc882778ff677. The issue still occurs
Ok, I think I found the problem. It's the same as #816.
If I run the test with lldb and compare stack pointers I get the following:
Stack pointer (first frame): 0x00007ffff7a1da80
Stack pointer (crash frame): 0x00007ffff780bce0
Crash pointer : 0x00007ffff780d138
The first number is the value of rsp
(stack pointer) when the program started. The second is its value just before the crash. The last number is already outside the allocated stack size, therefore crashing the process.
If I run the test with a larger stack (1GiB in my test, probably works with way less stack) the test seems to run without problems.
You can make the tests run with a larger stack by using this environment variable: RUST_MIN_STACK=1073741824
.
Ok, great, at the time I opeoned the issue increasing the stack size did not have any effect, which no longer seems the be the case. However, the current run is taking more than 2hours, up from 50mins before.
https://github.com/kkrt-labs/kakarot-ssj/actions/runs/11947735661/job/33304216235?pr=1011
I will need to check why
Could it be that the test that crashed was run at about the 50 minutes mark, therefore stopping all other tests, while now all tests have to be run, taking more than 2 hours?
No, previously all 20k tests were passing in ~50mins (20mins compilation, 30mins runtime). I added some more debug info to pinpoint problem is exactly, will update you when I have results
2024-11-21T13:08:33.3640356Z Test 'modexp_modsize0_returndatasize_d4g0v0_Cancun' completed in 2942.0 seconds 2024-11-21T13:08:33.3641902Z test stReturnDataTest::stReturnDataTest::test_modexp_modsize0_returndatasize_d4g0v0_Cancun ... ok
Some tests take 45 mins to 1.5hrs to complete a single, easy test. Do you have any idea how to debug why this is happening? I'm not sure what to look for.
test_modexp_modsize0_returndatasize_d4g0v0_Cancun
That specific instance has a lot of felt252 dict operations because it's very unoptimized. I'll add some prints directly in the cairo code to see what happens in realtime, but that's a first thing we could be looking at.
Not sure if that's the case, especially since it didn't happen earlier, but dictionaries are implemented in Rust within the runtime. When running the JIT, the runtime used is the dependency of cairo-native, which for debug builds is built using the debug profile. Could you try running the test in release mode?
When running in AOT, the runtime used is the static library. If the tests are running in AOT, could you check if the linked library (the path in the env var) points to a release build?
JIT is new right? I think i'm running in AOT because the integration in Native uses AotExecutor. What do you think I should be using for this specific usecase ?
I built the runtime running make runtime
which builds in release mode, so it's a release build I believe.
JIT is the original executor, AOT came later, but I think you're probably using AOT. If the runtime uses the release build then I'm not sure what could be causing the performance issues.
What do you think I should be using for this specific usecase ?
Both are fine. Internally, their interface is the same, so whatever works on one should work on the other. The sequencer is using AOT.
...
Start: Executing Modexp Precompile
Current test duration: 803.0s
...
WARNING: Test 'modexp_modsize0_returndatasize_d4g0v0_Cancun' has been running for over 120 seconds
Current duration: 1516.0s
End: MPNAT::from_big_endian operation
Start: modpow operation
End: modpow operation
End: Executing Modexp Precompile
There seems to be an issue with the operations performed in MPNAT::from_big_endian
, executed twice, taking ~ 10 mins so 5mins each.
Here's a link to the function https://github.com/kkrt-labs/kakarot-ssj/blob/c2e4ccbfcd8d6b8d79668f34daab97080d6769f9/crates/utils/src/crypto/modexp/mpnat.cairo#L38-L106
It involves a lot of dict operations. Do you want me to add more granularity?
Can you provide the bytes
argument? This way we'll be able to replicate it exactly.
Can you provide the bytes argument? This way we'll be able to replicate it exactly.
Hm so actually in that specific case I checked and we're sending 999188
bytes to the function. It's quite a lot, I can't really send it over to you.
edit: I opened old logs of using native to find out that this test actually used to pass when using native ae17dd370a7bbf6affeefb9fa6954965e8b52239 🤔 so perhaps the big number of bytes is not that much of an issue and it should pass regardless?
see https://github.com/kkrt-labs/kakarot-ssj/actions/runs/11810716802/job/32903209209 from PR https://github.com/kkrt-labs/kakarot-ssj/pull/1021 (where the version of native is ae17dd370a7bbf6affeefb9fa6954965e8b52239): download the logs, open 2_ef-tests _ ef-tests.txt
, search for modexp_modsize0_returndatasize_d4g0v0_Cancun
and you will see it passed
Just re-ran on my computer the stack using cairo-native ae17dd370a7bbf6affeefb9fa6954965e8b52239
WARNING: Test 'modexp_modsize0_returndatasize_d4g0v0_Cancun' has been running for over 120 seconds
Current duration: 834.0s
... some additional operations ..
INFO ef_testing::models::result: stReturnDataTest::modexp_modsize0_returndatasize_d4g0v0_Cancun passed: TransactionResources { starknet_resources: StarknetResources { calldata_length: 155, state_changes_for_fee: StateChangesCount { n_storage_updates: 3, n_class_hash_updates: 0, n_compiled_class_hash_updates: 0, n_modified_contracts: 3 }, message_cost_info: MessageL1CostInfo { l2_to_l1_payload_lengths: [], message_segment_length: 0 }, l1_handler_payload_size: None, n_events: 1, signature_length: 2, code_size: 0, total_event_keys: 1, total_event_data_size: 3 }, vm_resources: ExecutionResources { n_steps: 7527, n_memory_holes: 53, builtin_instance_counter: {ecdsa: 1, range_check: 117, pedersen: 173} }, n_reverted_steps: 0 }
Current memory usage: Ok(4963549184) bytes
Test 'modexp_modsize0_returndatasize_d4g0v0_Cancun' completed in 865.0 seconds
test stReturnDataTest::test_modexp_modsize0_returndatasize_d4g0v0_Cancun ... ok
So that's like ~ 10 seconds for the actual test run. Compared to infinite time now 🤔
To summarize:
https://github.com/kkrt-labs/ef-tests on branch feat/cairo-native
, commit hash f752975db32dead8175ac5392d132d3524e9fc0b
, using cairo-native ae17dd370a7bbf6affeefb9fa6954965e8b52239
less than 10sec to run this big test
https://github.com/kkrt-labs/ef-tests on branch dev/bump-native
, commit hash a513ac774a8e6ad97132f406938eec7ee71423ce
, using cairo-native 5e60089288c461eca98bf3dbe03cc882778ff677:
The test takes around 10 mins to run, and other tests also have extremely degraded performance.
Let me know how I could help you more.
Regarding the bytes passed to the function:
bytes.len(): bytes.len(): 999188
bytes slice 128: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
bytes slice last 128: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
I think it's an array of size 999188
whose first element is 1
and the rest is 0
.
It seems that between the previous Cairo native version and the current one we've added proper memory management (now we deallocate stuff properly). That may be part of the performance issue. Here are the commits I suspect will have more impact in the program's performance:
79a8dcf - Fix clone and drop implementations for enums, structs and snapshots.
0b73f1c - Fix `array_get`'s drop implementation.
cff7a11 - Update to 2.8.4, add release docs, version alpha.3
82c25b3 - Add malloc tracing and fix more memory leaks.
If you're going to test them, please keep in mind it may start crashing again since there have been some bug fixes after those commits. I recommend testing using a test that was known to pass but still has had a performance regression.
Would there be another way to check what's taking so long? testing each one of these commits manually sounds tedious because I have to re-adapt the sequencer impl every time
Yes, we're working (since about an hour ago) in a tool to profile programs a bit better. I'm not sure if it'll be useful at first because we're more interested in the time taken by cairo vs the syscall handler, but it should be adaptable to this use case later on.
Where are you using Cairo Native Kakarot EF-Test suite. Running the 20k ethereum tests using blockifier + cairo native + kakarot zkevm
Version last main
Describe the bug
To Reproduce
Clone https://github.com/kkrt-labs/ef-tests Checkout
dev/bump-native
make
&&make setup-kakarot
Make sureCAIRO_NATIVE_RUNTIME_LIBRARY
is a defined environment variable runcargo test test_static_ABAcalls0_d1g0v0_Cancun --features v1,native -- --nocapture
this should error with(signal: 11, SIGSEGV: invalid memory reference)
The backtrace is from lldb:
lldb path/to/test
platform settings -w /path/to/ef-tests/crates/ef-testing
run
Desktop (please complete the following information):