**cylewitruk** opened this issue 8 months ago
The current hotpath for the `BTreeMap`s in block execution seems to be `TupleData::from_data()`. This performs a lot of random reads, inserts, and reallocs.
I think before considering replacing the `BTreeMap` structs (which are not pure wins: serialization would be slower!), we should be benchmarking improvements to that hotpath, e.g.:
```rust
pub fn from_data(data: Vec<(ClarityName, Value)>) -> Result<TupleData> {
    let type_map = data
        .iter()
        .map(|(name, value)| Ok((name.clone(), TypeSignature::type_of(value)?)))
        .collect::<Result<BTreeMap<_, _>>>()?;
    if type_map.len() != data.len() {
        // Some name was already used, but we don't know which...
        // We can iterate over `data` to find it if necessary.
        return Err(CheckErrors::NameAlreadyUsed("Unclear!".into()).into());
    }
    let data_map = data.into_iter().collect();
    Self::new(TupleTypeSignature::try_from(type_map)?, data_map)
}
```
> I think before considering replacing the `BTreeMap` structs (which are not pure wins: serialization would be slower!), we should be benchmarking improvements to that hotpath
Good input with the example! :+1: I personally feel that both of these are small enough to pursue in parallel and bench -- both shouldn't take more than a couple hours tbh, and I'd love to see the results of both.
Then we can re-awaken the discussion regarding risk of potential hash collisions here as well before anything gets merged :pray:
@kantai I have an open PR on that, #4458
Also, I've tried using the `collect::<Result<_, _>>()?` pattern a few times. It doesn't optimize well, and is noticeably slower than a `for` loop. Rust iterators don't always optimize as well as you'd expect; `flat_map()` is another example. Maybe because those can't take advantage of `size_hint()`?
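For reference, a minimal self-contained sketch of the two patterns being contrasted here; the item type, validation, and error type are placeholders rather than the actual Clarity types:

```rust
use std::collections::BTreeMap;

// Placeholder types, just so the sketch compiles on its own.
type Name = String;
type Val = i64;

// Pattern A: fallible iterator chain, collected into Result<BTreeMap<_, _>, _>.
fn build_with_collect(data: &[(Name, Val)]) -> Result<BTreeMap<Name, Val>, String> {
    data.iter()
        .map(|(name, val)| {
            if *val < 0 {
                return Err(format!("negative value for {name}"));
            }
            Ok((name.clone(), *val))
        })
        .collect::<Result<BTreeMap<_, _>, _>>()
}

// Pattern B: explicit `for` loop with an early return on error.
fn build_with_for_loop(data: &[(Name, Val)]) -> Result<BTreeMap<Name, Val>, String> {
    let mut map = BTreeMap::new();
    for (name, val) in data {
        if *val < 0 {
            return Err(format!("negative value for {name}"));
        }
        map.insert(name.clone(), *val);
    }
    Ok(map)
}
```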
Did you try `collect()` here? It seems like `BTreeMap`'s `FromIterator` implementation should be much more efficient than repeated `insert`s: it presorts the insertions and does a bulk insert, whereas even `.entry()` -> `.insert()` incurs the cost of self-balancing on each entry.
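As an illustration of the comparison (separate from the replay-block benchmark further down), a quick throwaway micro-benchmark of `FromIterator` vs. repeated `insert` might look like this; the key type and sizes are arbitrary:

```rust
use std::collections::BTreeMap;
use std::time::Instant;

fn main() {
    // Keys in reverse order, so the bulk-build path has real sorting work to do.
    let data: Vec<(u64, u64)> = (0..100_000u64).rev().map(|k| (k, k)).collect();

    let start = Instant::now();
    let via_collect: BTreeMap<u64, u64> = data.iter().copied().collect();
    println!("FromIterator (collect): {:?}", start.elapsed());

    let start = Instant::now();
    let mut via_insert = BTreeMap::new();
    for (k, v) in data.iter().copied() {
        via_insert.insert(k, v);
    }
    println!("repeated insert:        {:?}", start.elapsed());

    assert_eq!(via_collect, via_insert);
}
```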
> Did you try `collect()` here?
I'm not sure if I did, but I can check again, and I can check the source of `BTreeMap`'s `FromIterator` impl to see what it's actually doing.
@kantai it's slower using `.collect()`:
```console
❯ hyperfine -w 3 -r 20 "target/release/stacks-inspect.opt-tuple-data replay-block /home/jbencin/data/next/ range 99990 100000" "target/release/stacks-inspect.opt-tuple-data-collect replay-block /home/jbencin/data/next/ range 99990 100000"
Benchmark 1: target/release/stacks-inspect.opt-tuple-data replay-block /home/jbencin/data/next/ range 99990 100000
  Time (mean ± σ):      8.874 s ±  0.020 s    [User: 8.372 s, System: 0.457 s]
  Range (min … max):    8.836 s …  8.912 s    20 runs

Benchmark 2: target/release/stacks-inspect.opt-tuple-data-collect replay-block /home/jbencin/data/next/ range 99990 100000
  Time (mean ± σ):      9.160 s ±  0.026 s    [User: 8.663 s, System: 0.454 s]
  Range (min … max):    9.104 s …  9.210 s    20 runs

Summary
  target/release/stacks-inspect.opt-tuple-data replay-block /home/jbencin/data/next/ range 99990 100000 ran
    1.03 ± 0.00 times faster than target/release/stacks-inspect.opt-tuple-data-collect replay-block /home/jbencin/data/next/ range 99990 100000
```
I don't know what's going on here; maybe the data is already sorted, and the extra `Vec` allocation and sort in `.collect()` just wastes time. It's been pretty disappointing benchmarking iterator performance, as I much prefer coding in this style over `for` loops and `if` statements.
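One way to test the "already sorted" hypothesis would be to instrument the hot path and count how often the incoming pairs are already in key order. A minimal helper for that check (the `String` key type stands in for `ClarityName`):

```rust
/// Returns true if the (name, value) pairs are already sorted by name.
/// Counting how often this holds on real block data would show whether
/// the pre-sort done by `collect()` is mostly redundant work.
fn is_sorted_by_name<V>(data: &[(String, V)]) -> bool {
    data.windows(2).all(|pair| pair[0].0 <= pair[1].0)
}
```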
Creating a ticket for further discussion of the following from @ASuciuX on Slack:
Tagging @jcnelson and @kantai who also chimed in on the above.
The following types make use of `BTreeMap`s in the `clarity` crate:

- `ContractAnalysis`
- `ParsedContract`
- `ContractContext`
- `TraitContext`
- `TupleData`
- `TupleTypeSignature`

The usage of `BTreeMap` is primarily to preserve ordering of keys. However, ordering of keys is only really required when serializing consensus-critical types to their SIP-5 consensus binary format, which in turn is hashed and stored as a part of the chainstate. Given that, it should be possible to replace most, if not all, usages of `BTreeMap` with `HashMap`s and perform a sort-on-consensus-serialization.

| | `btree` | `hashmap (std)` | `hashmap (hashbrown)` | `indexmap` |
|---|---|---|---|---|
| **insert** | `653.46 us` (✅ **1.00x**) | `611.74 us` (✅ **1.07x faster**) | `597.16 us` (✅ **1.09x faster**) | `581.33 us` (✅ **1.12x faster**) |
| **to sorted vec** | `43.13 us` (✅ **1.00x**) | `78.96 us` (❌ **1.83x slower**) | `71.72 us` (❌ **1.66x slower**) | `79.61 us` (❌ **1.85x slower**) |
| **random lookups** | `81.21 us` (✅ **1.00x**) | `29.87 us` (🚀 **2.72x faster**) | `16.08 us` (🚀 **5.05x faster**) | `30.70 us` (🚀 **2.65x faster**) |

While `BTreeMap` excels at retrieving ordered entries, the other implementations offer a marginal improvement for inserts and marked improvements for random reads. Given the assumption that inserts and random reads largely outweigh sorted enumeration, choosing another implementation may offer measurable timing improvements.

I put together a little playground showing some potential alternatives, in addition to the benchmarks.
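To make the sort-on-consensus-serialization idea concrete, here is a minimal sketch that keeps a `HashMap` for inserts and lookups and sorts keys only when producing the consensus byte stream. The struct, field names, and byte layout are illustrative and do not reflect the actual SIP-5 encoding:

```rust
use std::collections::HashMap;

/// Illustrative only: stands in for a consensus-critical map such as TupleData's fields.
struct Fields {
    /// Values are pre-serialized bytes, purely for the sake of the sketch.
    map: HashMap<String, Vec<u8>>,
}

impl Fields {
    fn new() -> Self {
        Fields { map: HashMap::new() }
    }

    /// Hot path: a plain HashMap insert; no ordering is maintained here.
    fn insert(&mut self, name: String, value: Vec<u8>) {
        self.map.insert(name, value);
    }

    /// Consensus serialization: sort the keys only at this point, so the
    /// byte output stays deterministic even though HashMap iteration order is not.
    fn consensus_serialize(&self, out: &mut Vec<u8>) {
        let mut keys: Vec<&String> = self.map.keys().collect();
        keys.sort();
        for key in keys {
            out.extend_from_slice(key.as_bytes());
            out.extend_from_slice(&self.map[key]);
        }
    }
}

fn main() {
    let mut fields = Fields::new();
    fields.insert("b".into(), vec![2]);
    fields.insert("a".into(), vec![1]);

    let mut out = Vec::new();
    fields.consensus_serialize(&mut out);
    // Output is the same regardless of insertion order: "a", 1, "b", 2.
    assert_eq!(out, vec![b'a', 1, b'b', 2]);
}
```

Whether this is actually safe to adopt also depends on the hash-collision discussion mentioned earlier in the thread; sorting at serialization time only addresses iteration order.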