solana-labs / solana

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://solanalabs.com
Apache License 2.0
13.02k stars 4.19k forks source link

Rust BPF: Panic investigation #5520

Closed jstarry closed 5 years ago

jstarry commented 5 years ago

Problem

I encountered a panic while using a modified serde_json crate to serialize a Vec into account data. Without modifications to the bpf vm, the panic doesn't crash the node. However, if I increase the max call depth of the VM, the node fully crashes when the panic occurs. You can see from the two stacktraces below that the panics are different. Happy to split this into 2 separate issues if necessary!

Recoverable Panic Repro

  1. Check out the rbpf-panic branch in https://github.com/solana-labs/example-messagefeed
  2. npm install && npm run build:bpf-rust
  3. npm run localnet:update && npm run localnet:up
  4. npm run start-server

** Note that the bpf program is really big so it takes awhile to load.

Node Crash Panic Repro

  1. Update the max call depth in solana-rbpf, I bumped from 10 to 50 here
    solana_rbpf = { git = "https://github.com/jstarry/rbpf", branch = "increase-call-depth" }`
  2. Run the multinode-demo with a bootstrap leader and drone
  3. Repeat the same steps as the recoverable panic ^ (without the localnet)

Env

rustc 1.36.0

Recoverable panic log

[2019-08-14T13:34:55.112221900Z INFO  solana_bpf_loader_api] Finalize: account 8sWUHySJouxYaXt3NotYmaQBu9qbk5YnCYZVyRejBf2c
[2019-08-14T13:34:55.250360300Z INFO  solana_bpf_loader_api] Call BPF program
!! Skipped relocation section 2 target_offset 2880 va 514c0 Referenced va (0))
[2019-08-14T13:34:55.256186700Z INFO  solana_bpf_loader_api::helpers] info: "init_collection"
[2019-08-14T13:34:55.256247600Z INFO  solana_bpf_loader_api::helpers] info: "CollectionType::default()"
[2019-08-14T13:34:55.256375900Z INFO  solana_bpf_loader_api::helpers] info: "serde_json::to_vec"
thread '<unnamed>' panicked at 'internal error: entered unreachable code', /home/.cargo/registry/src/github.com-1ecc6299db9ec823/solana_rbpf-0.1.13/src/lib.rs:888:9
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:59
             at src/libstd/panicking.rs:197
   3: std::panicking::default_hook
             at src/libstd/panicking.rs:211
   4: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:474
   5: std::panicking::begin_panic
   6: solana_rbpf::EbpfVmMbuff::execute_program
   7: solana_rbpf::EbpfVmRaw::execute_program
   8: solana_bpf_loader_api::process_instruction
   9: process
  10: solana_runtime::message_processor::MessageProcessor::process_message
  11: <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
  12: <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::from_iter
  13: solana_runtime::bank::Bank::load_and_execute_transactions
  14: solana::banking_stage::BankingStage::process_and_record_transactions
  15: solana::banking_stage::BankingStage::process_received_packets
  16: solana::banking_stage::BankingStage::process_packets
  17: solana::banking_stage::BankingStage::process_loop

Node crash log

[2019-08-14T13:15:10.133002000Z INFO  solana_bpf_loader_api] Finalize: account 6n4CzHvqC6GsMEZdk5PanUWKiEq7k6Ed7yydYqSFjNND
[2019-08-14T13:15:10.684212000Z INFO  solana_bpf_loader_api] Call BPF program
!! Skipped relocation section 2 target_offset 112 va 509f0 Referenced va (0))
!! Skipped relocation section 2 target_offset 256 va 50a80 Referenced va (0))
!! Skipped relocation section 2 target_offset 368 va 50af0 Referenced va (0))
!! Skipped relocation section 2 target_offset 440 va 50b38 Referenced va (0))
!! Skipped relocation section 2 target_offset 904 va 50d08 Referenced va (0))
!! Skipped relocation section 2 target_offset 2880 va 514c0 Referenced va (0))
[2019-08-14T13:15:10.729180000Z INFO  solana_bpf_loader_api::helpers] info: "init_collection"
[2019-08-14T13:15:10.729230000Z INFO  solana_bpf_loader_api::helpers] info: "CollectionType::default()"
[2019-08-14T13:15:10.729268000Z INFO  solana_bpf_loader_api::helpers] info: "serde_json::to_vec"
thread '<unnamed>' panicked at 'attempt to multiply with overflow', /Users/starry/.cargo/registry/src/github.com-1ecc6299db9ec823/solana_rbpf-0.1.13/src/lib.rs:568:15
stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: solana_runtime::native_loader::invoke_entrypoint
             at runtime/src/native_loader.rs:104
  12: solana_runtime::message_processor::MessageProcessor::process_instruction
             at runtime/src/message_processor.rs:224
  13: solana_runtime::message_processor::MessageProcessor::execute_instruction
             at runtime/src/message_processor.rs:257
  14: solana_runtime::message_processor::MessageProcessor::process_message
             at runtime/src/message_processor.rs:317
  15: solana_runtime::bank::Bank::load_and_execute_transactions::{{closure}}
             at runtime/src/bank.rs:981
  16: <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::fold::{{closure}}
             at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libcore/iter/adapters/mod.rs:589
  17: core::iter::traits::iterator::Iterator::fold::{{closure}}
             at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libcore/iter/traits/iterator.rs:1684
  18: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libcore/iter/traits/iterator.rs:1572
  19: core::iter::traits::iterator::Iterator::fold
             at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libcore/iter/traits/iterator.rs:1684
  20: <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
             at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libcore/iter/adapters/mod.rs:589
  21: core::iter::traits::iterator::Iterator::for_each
             at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libcore/iter/traits/iterator.rs:604
  22: <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::spec_extend
             at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/liballoc/vec.rs:1862
  23: <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::from_iter
             at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/liballoc/vec.rs:1845
  24: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/liballoc/vec.rs:1731
  25: core::iter::traits::iterator::Iterator::collect
             at /rustc/a53f9df32fbb0b5f4382caaad8f1a46f36ea887c/src/libcore/iter/traits/iterator.rs:1465
  26: solana_runtime::bank::Bank::load_and_execute_transactions
             at runtime/src/bank.rs:974
  27: solana::banking_stage::BankingStage::process_and_record_transactions_locked
             at core/src/banking_stage.rs:491
  28: solana::banking_stage::BankingStage::process_and_record_transactions
             at core/src/banking_stage.rs:547
  29: solana::banking_stage::BankingStage::process_transactions
             at core/src/banking_stage.rs:585
  30: solana::banking_stage::BankingStage::process_received_packets
             at core/src/banking_stage.rs:728
  31: solana::banking_stage::BankingStage::process_packets
             at core/src/banking_stage.rs:836
  32: solana::banking_stage::BankingStage::process_loop
             at core/src/banking_stage.rs:375
  33: solana::banking_stage::BankingStage::new_num_threads::{{closure}}::{{closure}}
             at core/src/banking_stage.rs:111
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[1]    73974 illegal hardware instruction  ./bootstrap-leader.sh
jackcmay commented 5 years ago

Thanks, Justin, great information! Both cases look similar like there is a jump outside of the program. The crash scenario looks like an overflow because the pc is so large.

jackcmay commented 5 years ago

Looks like original vm didn't expect large programs and so jump calculations were done with i16 types. The following pull request fixes the issue:

https://github.com/solana-labs/rbpf/pull/34

I will close the issue once Solana picks up the latest solana_rbpf.

The use case that identified this issue was the use of serde_json in bpf programs. serde_json is huge (pulling in 1serde_json::from_slice1` increases the elf size by ~400k. This led to very large programs that included jumps exceeding 32k. serde_json is probably not the best choice for on-chain programs.

jstarry commented 5 years ago

Sounds great, thanks for investigating! Agreed about serde_json not being a good choice. In the future it would be great to have a low-cost solution for serializing data!

jackcmay commented 5 years ago

Fixed in #5609