Open jackcmay opened 3 years ago
Here is the plan:

- Introduce an Account struct (probably somewhere around here) which places all accounts loaded in a TransactionBatch in a consecutive memory range.
- Map a MemoryRegion from a slice of this accounts array into the BPF VM, so that only the accounts belonging to that transaction are inside the slice.
- Stop serializing into the MemoryRegion, but keep the serialization interface.
- Change the Account struct to be repr(C) everywhere.

@Lichtso Thanks for summarizing our discussion, I have broken out some of the action items described in this issue:

- https://github.com/solana-labs/solana/issues/14691
- https://github.com/solana-labs/solana/issues/14692
- https://github.com/solana-labs/solana/issues/14693
- https://github.com/solana-labs/solana/issues/14694
- https://github.com/solana-labs/solana/issues/14695
- https://github.com/solana-labs/solana/issues/14696
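For illustration, the repr(C) part of the plan could look something like the sketch below. AccountRecord and all of its fields are made-up names, not the actual solana-runtime types; the point is only that with a guaranteed C layout, consecutive records occupy one contiguous, predictable byte range that a single MemoryRegion could cover without per-account copies.

```rust
// Hypothetical account record with a stable C layout.
#[repr(C)]
#[derive(Clone, Copy)]
struct AccountRecord {
    lamports: u64,
    data_len: u64,     // length of account data stored elsewhere
    owner: [u8; 32],   // owning program id (a Pubkey)
    executable: u8,    // bool kept as u8 for a C-compatible layout
    _padding: [u8; 7], // explicit padding makes the size predictable
}

fn main() {
    let rec = AccountRecord {
        lamports: 1,
        data_len: 0,
        owner: [0u8; 32],
        executable: 0,
        _padding: [0u8; 7],
    };
    // All accounts of a batch laid out back to back in one allocation.
    let batch = vec![rec; 4];
    let stride = std::mem::size_of::<AccountRecord>();
    // repr(C) fixes field order and padding, so the stride is known: 8+8+32+1+7.
    assert_eq!(stride, 56);
    // Records are contiguous: record i sits exactly i * stride bytes in.
    let base = batch.as_ptr() as usize;
    let third = &batch[3] as *const AccountRecord as usize;
    assert_eq!(third - base, 3 * stride);
    println!("stride = {stride} bytes, batch is contiguous");
}
```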
You might know this already, but mem copies in the transaction execution path are really hurting performance. I found this by casually perf-ing while reviewing #14596.
Currently, large account data consumes a disproportionate share of CPU cycles just for moving/copying, while interpreted BPF execution only consumes ~6%.
Note that _blake3_hash_many_avx512 will be AccountsDb's problem.
From https://github.com/solana-labs/solana/pull/14596#issuecomment-764598898:
$ sudo perf top -t $(echo $(ps -e -T | grep blockstore_proc | awk '{print $2}') | sed 's/ /,/g') --proc-map-timeout 10000
Samples: 165K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 14739498376 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
42.74% libc-2.31.so [.] __memmove_avx_unaligned_erms
8.30% solana-validator [.] _blake3_hash_many_avx512
4.45% solana-validator [.] solana_rbpf::vm::EbpfVm<E,I>::execute_program_interpreted_inner
3.85% libc-2.31.so [.] __sched_yield
2.87% libc-2.31.so [.] _int_malloc
2.40% libc-2.31.so [.] __memset_avx2_erms
2.40% solana-validator [.] solana_rbpf::memory_region::MemoryMapping::map
2.13% solana-validator [.] solana_rbpf::ebpf::get_insn_unchecked
1.88% [kernel] [k] __lock_text_start
1.85% [kernel] [k] __sched_text_start
1.66% [kernel] [k] do_syscall_64
1.58% solana-validator [.] <&mut bincode::de::Deserializer<R,O> as serde::de::Deserializer>::deserialize_tuple
1.09% solana-validator [.] <crossbeam_epoch::sync::list::Iter<T,C> as core::iter::traits::iterator::Iterator>::next
1.05% solana-validator [.] alloc::collections::btree::search::search_node
0.94% [kernel] [k] finish_task_switch
0.86% libc-2.31.so [.] malloc
0.79% solana-validator [.] <<&mut bincode::de::Deserializer<R,O> as serde::de::Deserializer>::deserialize_tuple::Access<R,O> as serde::de::SeqAccess>::next_element_seed
0.71% libc-2.31.so [.] __memcmp_avx2_movbe
0.63% solana-validator [.] _blake3_compress_in_place_avx512
0.61% solana-validator [.] bs58::encode::encode_into
0.58% solana-validator [.] solana_runtime::accounts_index::AccountsIndex<T>::get_account_read_entry
0.58% libpthread-2.31.so [.] __pthread_rwlock_rdlock
0.55% solana-validator [.] bs58::encode::encode_into
0.54% solana-validator [.] <solana_rbpf::elf::EBpfElf<E,I> as solana_rbpf::vm::Executable<E,I>>::lookup_bpf_function
0.51% libc-2.31.so [.] malloc_consolidate
... So, I'm really looking forward to the improvements in this area. :)
I would hope this is largely addressed by the copy-on-write account data change that went in a few weeks ago. @ryoqun, if it is possible for you to run this again, that would be swell. Or better, if you want to point me to how to run it like this: do you have to build differently? I've not yet run perf on our stack.

@ryoqun I think he used this to connect while it is already running:
sudo perf top -t $(echo $(ps -e -T | grep blockstore_proc | awk '{print $2}') | sed 's/ /,/g') --proc-map-timeout 10000
If you want to start recording from the beginning:
sudo perf record -F99 --call-graph dwarf target/release/solana-ledger-tool
or, on Intel:
sudo perf record -F99 --call-graph lbr target/release/solana-ledger-tool
This script can make a flame graph: https://github.com/sakridge/unix-home/blob/master/bin/make-flame-graph.sh
It has a hardcoded path expecting https://github.com/brendangregg/FlameGraph to be checked out in ~/src/FlameGraph.
Here's an updated perf top, showing progress (kudos to @jeffwashington) :) This is a very casual check with recent master, comparing to this: https://github.com/solana-labs/solana/issues/14523#issuecomment-765800757

- _blake3_hash_many_avx512 disappeared
- execute_program_interpreted_inner reduced
- solana_rbpf::memory_region::MemoryMapping::map increased. Maybe some code shifting? (Maybe @Lichtso has some clue?)

Samples: 156K of event 'cpu-clock:pppH', 4000 Hz, Event count (approx.): 5549052704 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
15.75% libc-2.31.so [.] __memmove_avx_unaligned_erms
8.90% solana-validator-master-accounts-cache-race-v2 [.] <&mut bincode::de::Deserializer<R,O> as serde::de::Deserializer>::deserialize_tuple
7.89% libc-2.31.so [.] __memcmp_avx2_movbe
5.88% solana-validator-master-accounts-cache-race-v2 [.] solana_rbpf::memory_region::MemoryMapping::map
4.58% [kernel] [k] __lock_text_start
3.70% libc-2.31.so [.] _int_malloc
3.20% libc-2.31.so [.] __sched_yield
3.11% libc-2.31.so [.] __memset_avx2_erms
3.01% solana-validator-master-accounts-cache-race-v2 [.] <<&mut bincode::de::Deserializer<R,O> as serde::de::Deserializer>::deserialize_tuple::Access<R,O> as serde::de::SeqAccess>::next_element_seed
2.23% solana-validator-master-accounts-cache-race-v2 [.] alloc::collections::btree::search::<impl alloc::collections::btree::node::NodeRef<BorrowType,K,V,alloc::collections::btree::node::marker::LeafOrInternal>>::search_tree
2.05% [kernel] [k] __sched_text_start
1.86% solana-validator-master-accounts-cache-race-v2 [.] <crossbeam_epoch::sync::list::Iter<T,C> as core::iter::traits::iterator::Iterator>::next
1.50% solana-validator-master-accounts-cache-race-v2 [.] solana_rbpf::vm::EbpfVm<E,I>::execute_program_interpreted_inner
1.46% libc-2.31.so [.] malloc
1.32% solana-validator-master-accounts-cache-race-v2 [.] <serde::de::impls::<impl serde::de::Deserialize for alloc::vec::Vec<T>>::deserialize::VecVisitor<T> as serde::de::Visitor>::visit_seq
1.20% solana-validator-master-accounts-cache-race-v2 [.] bs58::encode::encode_into
0.83% solana-validator-master-accounts-cache-race-v2 [.] crossbeam_epoch::default::pin
0.75% solana-validator-master-accounts-cache-race-v2 [.] solana_rbpf::ebpf::get_insn_unchecked
0.74% [kernel] [k] do_syscall_64
0.65% [kernel] [k] finish_task_switch
0.65% solana-validator-master-accounts-cache-race-v2 [.] <dashmap::DashMap<K,V,S> as dashmap::t::Map<K,V,S>>::_get
0.63% libc-2.31.so [.] unlink_chunk.isra.0
0.61% libpthread-2.31.so [.] __pthread_rwlock_rdlock
0.60% solana-validator-master-accounts-cache-race-v2 [.] <&mut bincode::ser::Serializer<W,O> as serde::ser::Serializer>::serialize_newtype_struct
0.59% solana-validator-master-accounts-cache-race-v2 [.] crossbeam_deque::deque::Stealer<T>::steal
0.58% libc-2.31.so [.] _int_free
0.58% libc-2.31.so [.] cfree@GLIBC_2.2.5
0.56% solana-validator-master-accounts-cache-race-v2 [.] <std::collections::hash::map::DefaultHasher as core::hash::Hasher>::write
0.56% [kernel] [k] do_sched_yield
0.53% solana-validator-master-accounts-cache-race-v2 [.] solana_runtime::accounts_index::AccountsIndex<T>::latest_slot
0.48% solana-validator-master-accounts-cache-race-v2 [.] solana_runtime::accounts_index::AccountsIndex<T>::get_account_read_entry
0.42% libc-2.31.so [.] malloc_consolidate
0.35% solana-validator-master-accounts-cache-race-v2 [.] crossbeam_channel::waker::SyncWaker::notify
0.35% solana-validator-master-accounts-cache-race-v2 [.] <core::iter::adapters::chain::Chain<A,B> as core::iter::traits::iterator::Iterator>::try_fold
0.35% solana-validator-master-accounts-cache-race-v2 [.] solana_runtime::accounts_cache::AccountsCache::load
0.34% solana-validator-master-accounts-cache-race-v2 [.] rayon_core::registry::WorkerThread::wait_until_cold
0.34% solana-validator-master-accounts-cache-race-v2 [.] serde::de::SeqAccess::next_element
0.32% solana-validator-master-accounts-cache-race-v2 [.] hashbrown::map::make_hash
0.32% solana-validator-master-accounts-cache-race-v2 [.] core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
0.31% solana-validator-master-accounts-cache-race-v2 [.] solana_vote_program::vote_state::VoteState::process_vote
0.30% solana-validator-master-accounts-cache-race-v2 [.] solana_runtime::accounts_db::LoadedAccount::take_account
I suspect blake3_hash disappearing has at least 2 known causes:
I suspect execute_program_interpreted_inner went down as we switched to JIT being enabled by default. Though, I have no idea why solana_rbpf::memory_region::MemoryMapping::map increased so much.
What were the versions (especially of solana_rbpf) when these measurements took place?
0.53% [.] solana_runtime::accounts_index::AccountsIndex<T>::latest_slot
latest_slot is being addressed by an already-submitted PR described in this issue. @ryoqun ran into the assert while gathering these metrics today and had to remove the PR that should have greatly reduced this %.
What were the versions (especially of solana_rbpf) when these measurements took place?
@Lichtso Hmm, unfortunately I don't have precise build info. I only compared the tip of master from around Jan 21 with one from around Apr 16. This is a very informal test, so I don't want you to worry too much. :) If you're interested, I can run this test between the tip of master and v1.5.18. Also, the on-chain tx composition can change the perf metrics a lot over time. So, I'm just looking at this for fun and to ensure nothing obviously bad is happening on the tip of master. Anyway, I'm hoping your massive work https://github.com/solana-labs/solana/pull/15410 will improve these perf numbers a lot. :)
Thx, but now that I looked at the data again, I think solana_rbpf::memory_region::MemoryMapping::map has not changed at all (in absolute terms). Everything else got better, so now it takes a bigger share of the cake in relative terms. If you take a look at the ratio of that entry and the next rank you will see what I mean:
2.40% solana-validator [.] solana_rbpf::memory_region::MemoryMapping::map
1.88% [kernel] [k] __lock_text_start
Ratio: 1.27659574468
5.88% solana-validator-master-accounts-cache-race-v2 [.] solana_rbpf::memory_region::MemoryMapping::map
4.58% [kernel] [k] __lock_text_start
Ratio: 1.28384279476
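The same conclusion can be cross-checked in absolute terms using the total event counts printed at the top of the two dumps. This is only a back-of-the-envelope sketch and assumes the two runs sampled comparable workloads:

```rust
fn main() {
    // Total event counts and percentages taken from the two perf dumps above.
    let old_total = 14_739_498_376.0_f64; // earlier run
    let new_total = 5_549_052_704.0_f64;  // later run
    let old_map = 0.0240 * old_total; // MemoryMapping::map at 2.40%
    let new_map = 0.0588 * new_total; // MemoryMapping::map at 5.88%
    // The absolute cost is roughly unchanged (slightly lower, in fact), so
    // the higher percentage only reflects everything else getting cheaper.
    let rel_change = (new_map - old_map) / old_map;
    assert!(rel_change.abs() < 0.10);
    println!("old ≈ {old_map:.0}, new ≈ {new_map:.0}, change {:.1}%", rel_change * 100.0);
}
```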
Also, the refactoring I am working on shouldn't change much about the performance if at all. However, it is a requirement for some of the planned more advanced optimizations.
To summarize how this progressed during the past half year:

- Gathered all accounts metadata (KeyedAccount) used by the VM in one place (the InvokeContext).
- Removed the account_deps special cases and restricted the use of pre_accounts to validation only, in turn rendering them obsolete once accounts can be write-protected by memory mapping.
- Load accounts directly from the DB without copying them into a consecutive buffer first, by mapping each account to its own VM MemoryRegion.

@jackcmay, @jeffwashington, @aeyakovenko and I just had a recap discussion about all the related / connected issues and came up with the following design goals:
- Make the account struct repr(C) and use correct mut and const attributes (#14692).
- Remove RefCell, maybe an index (indirection) to account mapping.
- Map accounts as individual MemoryRegions in the VM (#14691):
  - Serialize instruction metadata into its own MemoryRegion, referencing accounts with translated pointers.
  - Give the accounts one MemoryRegion each.
- Get rid of pre_accounts (copy and validation) because MemoryRegions can be marked read-only.

About half of these ideas existed already and I linked the tracking IDs. I will try to come up with a concrete proposal for how the new interface should look by next week and post it here as a request for comments.
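The last goal can be illustrated with a toy region table (illustrative only; the real types live in solana_rbpf::memory_region and look different): a write to a region marked read-only is rejected up front, so no before/after pre_accounts comparison is needed for read-only accounts.

```rust
// Toy "memory region" with a VM address range, host backing, and a write flag.
struct Region {
    vm_addr: u64,
    host: Vec<u8>,
    writable: bool,
}

// Store one byte at a VM address, enforcing per-region write protection.
fn store(regions: &mut [Region], vm_addr: u64, byte: u8) -> Result<(), String> {
    for r in regions.iter_mut() {
        let end = r.vm_addr + r.host.len() as u64;
        if vm_addr >= r.vm_addr && vm_addr < end {
            if !r.writable {
                // Read-only regions reject writes at access time; the runtime
                // never needs to snapshot and diff the account afterwards.
                return Err(format!("write to read-only region at {vm_addr:#x}"));
            }
            r.host[(vm_addr - r.vm_addr) as usize] = byte;
            return Ok(());
        }
    }
    Err(format!("access violation at {vm_addr:#x}"))
}

fn main() {
    let mut regions = vec![
        Region { vm_addr: 0x1000, host: vec![0; 64], writable: true },  // writable account
        Region { vm_addr: 0x2000, host: vec![0; 64], writable: false }, // read-only account
    ];
    assert!(store(&mut regions, 0x1000, 42).is_ok());
    assert!(store(&mut regions, 0x2000, 42).is_err());
    assert_eq!(regions[0].host[0], 42);
    println!("write protection enforced by the mapping itself");
}
```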
In addition, I think we can also address the following issue as well when we roll out the new loader that doesn't rely on serialization: https://github.com/solana-labs/solana/issues/9711
Now that programs no longer support returning errors "gracefully" (https://github.com/solana-labs/solana/pull/14500), the way data flows to and through a cross-program invocation call chain also changes and may no longer require serialization or extra copies.
There is no longer any roll-back or intermediate discarding of data. Account data now flows through the call chain linearly from one program to the next, either to the end of the chain upon success or up until an error where the data is discarded. Because of this, rather than copying data in and out of each program's environment, the data can be passed (and translated) via pointers. Each program in the call chain incrementally operates on the data, and the runtime verifies and snapshots in-between and at the end.
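A toy sketch of that linear flow (no real Solana APIs; all names here are made up): each program in the chain mutates the shared data in place through a reference, analogous to passing translated pointers between VMs, and the first error aborts the whole chain so the caller simply discards the data instead of rolling anything back.

```rust
// A "program" operates on shared account data in place.
type Program = fn(&mut Vec<u8>) -> Result<(), String>;

fn run_chain(data: &mut Vec<u8>, chain: &[Program]) -> Result<(), String> {
    for program in chain {
        // No serialization or per-step copy: pass by mutable reference.
        program(data)?; // first error aborts; the caller discards `data`
    }
    // A real runtime would also verify/snapshot in-between and at the end.
    Ok(())
}

fn append_one(data: &mut Vec<u8>) -> Result<(), String> {
    data.push(1);
    Ok(())
}

fn fail(_data: &mut Vec<u8>) -> Result<(), String> {
    Err("program error".to_string())
}

fn main() {
    // Success: data flows linearly through every program in the chain.
    let mut data = vec![0u8];
    assert!(run_chain(&mut data, &[append_one, append_one]).is_ok());
    assert_eq!(data, vec![0, 1, 1]);

    // Failure: the chain stops at the error and the buffer is discarded.
    let mut data2 = vec![0u8];
    assert!(run_chain(&mut data2, &[append_one, fail, append_one]).is_err());
}
```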
Doing so provides the following advantages:
But comes with the following challenges:
- Data passed into and between programs must have a stable layout (repr(C)). We already have this issue when passing Instruction and AccountInfo via invoke. New repr(C) types should be used for all data passed into and out of the programs. This means the current native entrypoint (using KeyedAccounts) is not well suited.
- If we move to repr(C), we should probably make that change all the way down to the accounts so that we don't incur any more copies of the account data than are absolutely necessary.

Overall I think this proposal does a lot for performance, stability, and clarity of the data being passed to and between programs.