solana-labs / solana

Web-Scale Blockchain for fast, secure, scalable, decentralized apps and marketplaces.
https://solanalabs.com
Apache License 2.0
13.02k stars 4.19k forks source link

BPF program serialization/deserialization may no longer be necessary #14523

Open jackcmay opened 3 years ago

jackcmay commented 3 years ago

Now that programs no longer support returning errors "gracefully" (https://github.com/solana-labs/solana/pull/14500), the way data flows to and through a cross-program invocation call chain also changes and may no longer require serialization or extra copies.

There is no longer any roll-back or intermediate discarding of data. Account data now flows through the call chain linearly from one program to the next, either to the end of the chain upon success or up until an error where the data is discarded. Because of this, rather than copying data in and out of each program's environment, the data can be passed (and translated) via pointers. Each program in the call chain incrementally operates on the data, and the runtime verifies and snapshots in-between and at the end.

Doing so provides the following advantages:

But comes with the following challenges:

Overall I think this proposal does a lot for both performance, stability, and clarity of the data being passed to and between programs.

Lichtso commented 3 years ago

Here is the plan:

  1. Build a custom allocator for the Account struct (probably somewhere around here) which places all accounts loaded in a TransactionBatch in a consecutive memory range.
  2. Let each transaction create a BPF VM MemoryRegion from a slice of this accounts array into the BPF VM, so that only the accounts belonging to that transaction are inside the slice.
  3. Replace the account data copy steps by using the MemoryRegion, but keep the serialization interface.
  4. Then redesign the BPF program interface to get rid of the de-/serialization.
    • Fix the immutable problem
    • Allow for better realloc of account data
    • Unify entrypoints of BPF and native programs
  5. Refactor the Account struct to be repr(C) everywhere
jackcmay commented 3 years ago

@Lichtso Thanks for summarizing our discussion, I have broken out some of the action items described in this issue: https://github.com/solana-labs/solana/issues/14691 https://github.com/solana-labs/solana/issues/14692 https://github.com/solana-labs/solana/issues/14693 https://github.com/solana-labs/solana/issues/14694 https://github.com/solana-labs/solana/issues/14695 https://github.com/solana-labs/solana/issues/14696

ryoqun commented 3 years ago

You might know this already. But mem copies in the transaction execution path is really hurting performance. I found this by casually perf-ing for reviewing #14596 .

Currently, supposedly large account data consumes more cpu cycles just for moving/copying them. And, interpreted bpf execution only consumes ~6%

Note that _blake3_hash_many_avx512 will be Account's DB problem..

From https://github.com/solana-labs/solana/pull/14596#issuecomment-764598898:

$ sudo perf top -t $(echo $(ps -e -T | grep blockstore_proc | awk '{print $2}') | sed 's/ /,/g') --proc-map-timeout 10000

Samples: 165K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 14739498376 lost: 0/0 drop: 0/0
Overhead  Shared Object       Symbol
  42.74%  libc-2.31.so        [.] __memmove_avx_unaligned_erms
   8.30%  solana-validator    [.] _blake3_hash_many_avx512
   4.45%  solana-validator    [.] solana_rbpf::vm::EbpfVm<E,I>::execute_program_interpreted_inner
   3.85%  libc-2.31.so        [.] __sched_yield
   2.87%  libc-2.31.so        [.] _int_malloc
   2.40%  libc-2.31.so        [.] __memset_avx2_erms
   2.40%  solana-validator    [.] solana_rbpf::memory_region::MemoryMapping::map
   2.13%  solana-validator    [.] solana_rbpf::ebpf::get_insn_unchecked
   1.88%  [kernel]            [k] __lock_text_start
   1.85%  [kernel]            [k] __sched_text_start
   1.66%  [kernel]            [k] do_syscall_64
   1.58%  solana-validator    [.] <&mut bincode::de::Deserializer<R,O> as serde::de::Deserializer>::deserialize_tuple
   1.09%  solana-validator    [.] <crossbeam_epoch::sync::list::Iter<T,C> as core::iter::traits::iterator::Iterator>::next
   1.05%  solana-validator    [.] alloc::collections::btree::search::search_node
   0.94%  [kernel]            [k] finish_task_switch
   0.86%  libc-2.31.so        [.] malloc
   0.79%  solana-validator    [.] <<&mut bincode::de::Deserializer<R,O> as serde::de::Deserializer>::deserialize_tuple::Access<R,O> as serde::de::SeqAccess>::next_element_seed
   0.71%  libc-2.31.so        [.] __memcmp_avx2_movbe
   0.63%  solana-validator    [.] _blake3_compress_in_place_avx512
   0.61%  solana-validator    [.] bs58::encode::encode_into
   0.58%  solana-validator    [.] solana_runtime::accounts_index::AccountsIndex<T>::get_account_read_entry
   0.58%  libpthread-2.31.so  [.] __pthread_rwlock_rdlock
   0.55%  solana-validator    [.] bs58::encode::encode_into
   0.54%  solana-validator    [.] <solana_rbpf::elf::EBpfElf<E,I> as solana_rbpf::vm::Executable<E,I>>::lookup_bpf_function
   0.51%  libc-2.31.so        [.] malloc_consolidate

... So, I'm really looking forward to the improvements in this area. :)

jeffwashington commented 3 years ago

I would hope this is largely addressed by the copy on write account data change that went in a few weeks ago. @ryoqun , if it is possible for you to run this again, that would be swell. or better, if you want to point me to how to run like this. Do you have to build differently? I've not yet run perf on our stack. @ryoqun

sakridge commented 3 years ago

I think he used this to connect while it is already running: sudo perf top -t $(echo $(ps -e -T | grep blockstore_proc | awk '{print $2}') | sed 's/ /,/g') --proc-map-timeout 10000

If you want to start recording from the beginning:

sudo perf record -F99 --call-graph dwarf target/release/solana-ledger-tool

or Intel:

sudo perf record -F99 --call-graph lbr target/release/solana-ledger-tool

can make a flame graph: https://github.com/sakridge/unix-home/blob/master/bin/make-flame-graph.sh

Has a hardcoded path of https://github.com/brendangregg/FlameGraph in ~/src/FlameGraph

ryoqun commented 3 years ago

here's updated perf top, seeing progress. (kudos to @jeffwashington ) :) This is very casual check with recent master, comparing to this: https://github.com/solana-labs/solana/issues/14523#issuecomment-765800757

Samples: 156K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 5549052704 lost: 0/0 drop: 0/0                                                                                                                                  
Overhead  Shared Object                                   Symbol                                                                                                                                                                         
  15.75%  libc-2.31.so                                    [.] __memmove_avx_unaligned_erms                                                       
   8.90%  solana-validator-master-accounts-cache-race-v2  [.] <&mut bincode::de::Deserializer<R,O> as serde::de::Deserializer>::deserialize_tuple
   7.89%  libc-2.31.so                                    [.] __memcmp_avx2_movbe                                                                
   5.88%  solana-validator-master-accounts-cache-race-v2  [.] solana_rbpf::memory_region::MemoryMapping::map
   4.58%  [kernel]                                        [k] __lock_text_start                             
   3.70%  libc-2.31.so                                    [.] _int_malloc                                                                                                                                  
   3.20%  libc-2.31.so                                    [.] __sched_yield                                                                                                                                
   3.11%  libc-2.31.so                                    [.] __memset_avx2_erms                                                                                                                           
   3.01%  solana-validator-master-accounts-cache-race-v2  [.] <<&mut bincode::de::Deserializer<R,O> as serde::de::Deserializer>::deserialize_tuple::Access<R,O> as serde::de::SeqAccess>::next_element_seed                          
   2.23%  solana-validator-master-accounts-cache-race-v2  [.] alloc::collections::btree::search::<impl alloc::collections::btree::node::NodeRef<BorrowType,K,V,alloc::collections::btree::node::marker::LeafOrInternal>>::search_tree
   2.05%  [kernel]                                        [k] __sched_text_start                                                                                                                                                     
   1.86%  solana-validator-master-accounts-cache-race-v2  [.] <crossbeam_epoch::sync::list::Iter<T,C> as core::iter::traits::iterator::Iterator>::next
   1.50%  solana-validator-master-accounts-cache-race-v2  [.] solana_rbpf::vm::EbpfVm<E,I>::execute_program_interpreted_inner                                                                      
   1.46%  libc-2.31.so                                    [.] malloc                                                                                                                               
   1.32%  solana-validator-master-accounts-cache-race-v2  [.] <serde::de::impls::<impl serde::de::Deserialize for alloc::vec::Vec<T>>::deserialize::VecVisitor<T> as serde::de::Visitor>::visit_seq
   1.20%  solana-validator-master-accounts-cache-race-v2  [.] bs58::encode::encode_into                                                                                                            
   0.83%  solana-validator-master-accounts-cache-race-v2  [.] crossbeam_epoch::default::pin                                                           
   0.75%  solana-validator-master-accounts-cache-race-v2  [.] solana_rbpf::ebpf::get_insn_unchecked                                                                                                                                      
   0.74%  [kernel]                                        [k] do_syscall_64                                                            
   0.65%  [kernel]                                        [k] finish_task_switch                                                                      
   0.65%  solana-validator-master-accounts-cache-race-v2  [.] <dashmap::DashMap<K,V,S> as dashmap::t::Map<K,V,S>>::_get                                                                                                                  
   0.63%  libc-2.31.so                                    [.] unlink_chunk.isra.0                                                                                                                                                        
   0.61%  libpthread-2.31.so                              [.] __pthread_rwlock_rdlock                                                                                                                                                    
   0.60%  solana-validator-master-accounts-cache-race-v2  [.] <&mut bincode::ser::Serializer<W,O> as serde::ser::Serializer>::serialize_newtype_struct                                                                                   
   0.59%  solana-validator-master-accounts-cache-race-v2  [.] crossbeam_deque::deque::Stealer<T>::steal                                                                                                                                  
   0.58%  libc-2.31.so                                    [.] _int_free                                                                                                                                                                  
   0.58%  libc-2.31.so                                    [.] cfree@GLIBC_2.2.5                                                                                                                                                          
   0.56%  solana-validator-master-accounts-cache-race-v2  [.] <std::collections::hash::map::DefaultHasher as core::hash::Hasher>::write                                                                                                  
   0.56%  [kernel]                                        [k] do_sched_yield                                                                                                                                                             
   0.53%  solana-validator-master-accounts-cache-race-v2  [.] solana_runtime::accounts_index::AccountsIndex<T>::latest_slot                                                                                                              
   0.48%  solana-validator-master-accounts-cache-race-v2  [.] solana_runtime::accounts_index::AccountsIndex<T>::get_account_read_entry                     
   0.42%  libc-2.31.so                                    [.] malloc_consolidate                                                                                                                                                         
   0.35%  solana-validator-master-accounts-cache-race-v2  [.] crossbeam_channel::waker::SyncWaker::notify                                                                                                                                
   0.35%  solana-validator-master-accounts-cache-race-v2  [.] <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::try_fold                                                                              
   0.35%  solana-validator-master-accounts-cache-race-v2  [.] solana_runtime::accounts_cache::AccountsCache::load                                                                                                                        
   0.34%  solana-validator-master-accounts-cache-race-v2  [.] rayon_core::registry::WorkerThread::wait_until_cold                                                                                                                        
   0.34%  solana-validator-master-accounts-cache-race-v2  [.] serde::de::SeqAccess::next_element                                                                                                                                         
   0.32%  solana-validator-master-accounts-cache-race-v2  [.] hashbrown::map::make_hash                                                                                                                                                  
   0.32%  solana-validator-master-accounts-cache-race-v2  [.] core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once                                                                                    
   0.31%  solana-validator-master-accounts-cache-race-v2  [.] solana_vote_program::vote_state::VoteState::process_vote                                                                             
   0.30%  solana-validator-master-accounts-cache-race-v2  [.] solana_runtime::accounts_db::LoadedAccount::take_account                               
jeffwashington commented 3 years ago

I suspect blake3_hash disappearing has at least 2 known causes:

  1. @brooksprumo renamed it ;-)
  2. this pr moved account hashing to the bg and only for the most recent store of an account. There are still a few places where hashing shows up in the metrics. I knew this pr improved experienced clock time. I hadn't seen it as a profiler line item disappear. This is cool.
Lichtso commented 3 years ago

I suspect execute_program_interpreted_inner went down as we switched to JIT being enabled by default. Though, I have no idea why solana_rbpf::memory_region::MemoryMapping::map increased so much. What were the versions (especially of solana_rbpf) when these measurements took place?

jeffwashington commented 3 years ago
   0.53%  [.] solana_runtime::accounts_index::AccountsIndex<T>::latest_slot                                                                                                              

latest_slot is being attacked by an already-submitted pr described in this issue. @ryoqun ran into the assert gathering these metrics today and had to remove the pr that should have greatly reduced this %.

ryoqun commented 3 years ago

What were the versions (especially of solana_rbpf) when these measurements took place?

@Lichtso Hmm, unfortunately I don't have precise build info. I only compared between the tip of master around Jan 21 and one around Apr 16. This is very informal test so I don't want to make you worry too much. :) If you're interested, I can run this test between tip of master and v1.5.18. Also, the on-chain tx composition can change the perf metrics a lot between the time. So, I'm just looking this for fun and to ensure nothing so obviously bad is happening on tip of master. Anyway, I'm hoping your massive work https://github.com/solana-labs/solana/pull/15410 will improve this perf numbers a lot. :)

Lichtso commented 3 years ago

@Lichtso Hmm, unfortunately I don't have precise build info. I only compared between the tip of master around Jan 21 and one around Apr 16. This is very informal test so I don't want to make you worry too much. :) If you're interested, I can run this test between tip of master and v1.5.18. Also, the on-chain tx composition can change the perf metrics a lot between the time. So, I'm just looking this for fun and to ensure nothing so obviously bad is happening on tip of master. Anyway, I'm hoping your massive work #15410 will improve this perf numbers a lot. :)

Thx, but now that I looked at the data again I think solana_rbpf::memory_region::MemoryMapping::map has not changed at all (in absolute terms). Everything else got better, so now it takes more of the cake in relative terms. If you take a look at the ratio of that and the next rank you will see what I mean:

2.40%  solana-validator    [.] solana_rbpf::memory_region::MemoryMapping::map
1.88%  [kernel]            [k] __lock_text_start
Ratio: 1.27659574468

5.88%  solana-validator-master-accounts-cache-race-v2  [.] solana_rbpf::memory_region::MemoryMapping::map
4.58%  [kernel]                                        [k] __lock_text_start
Ratio: 1.28384279476

Also, the refactoring I am working on shouldn't change much about the performance if at all. However, it is a requirement for some of the planned more advanced optimizations.

Lichtso commented 3 years ago

To summarize how this progressed during the past half year:

Lichtso commented 3 years ago

@jackcmay, @jeffwashington, @aeyakovenko and I just had a recap discussion about all the related / connected issues and came up with the following design goals:

About half of these ideas existed already and I linked the tracking IDs. I will try to come up with a concrete proposal for how the new interface should look like by next week and post it here as request for comments.

jackcmay commented 3 years ago

In addition, I think we can also address the following issue as well when we roll out the new loader that doesn't rely on serialization: https://github.com/solana-labs/solana/issues/9711