tweag / nickel

Better configuration for less
https://nickel-lang.org/
MIT License
2.43k stars 93 forks source link

Use a persistent vector instead of an `Rc<[..]>` #2057

Closed jneem closed 1 month ago

jneem commented 1 month ago

This is an experiment regarding array performance. Our current array representation (as essentially a Rc<[RichTerm]>) is problematic because it makes many common operations unnecessarily quadratic. This PR replaces it with a rpds::Vector<RichTerm> in reverse order, and the initial benchmarks look promising.

In more detail, the current arrays have a few performance characteristics that we might like to keep:

The Vector implementation in the rpds crate is a "persistent vector" aka "bitmapped vector trie", which offers persistence/sharing, fast random access, and fast appends. We can do the same slicing trick that we're current using for Rc<[RichTerm]> to also add fast slicing. Thanks to fast appends, we can do concatenation xs @ ys in O(ys.len()) time (provided that there are no contracts that need to be applied to xs; I'm also ignoring logarithmic terms). This is backwards from the more common concatenation pattern in functional languages, so we store arrays backwards in order to get time O(xs.len()) instead. (We could achieve the minimum of the two by storing an array as two Vectors, a backwards one followed by a forwards one.)

There are a few ways in which rpds::Vector isn't a perfect fit:

Despite these, this PR gives a 35% improvement in random normal, and a few other improvements between 10 and 20%. I'd like to also try benchmarking im and/or imbl.

github-actions[bot] commented 1 month ago

🐰 Bencher Report

Branch2057/merge
Testbedubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!

Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds CLI flag.

Click to view all benchmark results
BenchmarkLatencynanoseconds (ns)
fibonacci 10📈 view plot
⚠️ NO THRESHOLD
492,610.00
foldl arrays 50📈 view plot
⚠️ NO THRESHOLD
1,742,300.00
foldl arrays 500📈 view plot
⚠️ NO THRESHOLD
6,623,400.00
foldr strings 50📈 view plot
⚠️ NO THRESHOLD
7,128,100.00
foldr strings 500📈 view plot
⚠️ NO THRESHOLD
61,214,000.00
generate normal 250📈 view plot
⚠️ NO THRESHOLD
45,508,000.00
generate normal 50📈 view plot
⚠️ NO THRESHOLD
2,025,400.00
generate normal unchecked 1000📈 view plot
⚠️ NO THRESHOLD
3,432,500.00
generate normal unchecked 200📈 view plot
⚠️ NO THRESHOLD
759,960.00
pidigits 100📈 view plot
⚠️ NO THRESHOLD
3,170,700.00
pipe normal 20📈 view plot
⚠️ NO THRESHOLD
1,514,300.00
pipe normal 200📈 view plot
⚠️ NO THRESHOLD
9,980,000.00
product 30📈 view plot
⚠️ NO THRESHOLD
834,630.00
scalar 10📈 view plot
⚠️ NO THRESHOLD
1,545,100.00
sum 30📈 view plot
⚠️ NO THRESHOLD
826,770.00
🐰 View full continuous benchmarking report in Bencher
yannham commented 1 month ago

I'll see what it gives on the private benchmark

jneem commented 1 month ago

I tried out imbl, but the performance is worse than rpds.

jneem commented 1 month ago

The current version uses a custom re-implementation of persistent vectors, and it seems to be a performance win across the board.

jneem commented 1 month ago

Github CI agrees with my local benchmarking: this gives modest gains in general, and big gains whenever quadratic array behavior is the bottleneck.

I don't see a nice UI for comparing results to master, but here is the report for this PR and here is the report for master.

yannham commented 1 month ago

@jneem Have you tried on the private bench?

jneem commented 1 month ago

Yes, I forgot to mention that. Performance is basically identical on all three sizes.

yannham commented 1 month ago

By the way, I had another side question: could this representation take advantage of the in-place modification when a value is 1-RC ? I think the answer is yes, from what I remember reading Clojure's persistent array blog post, but just to make sure.

jneem commented 1 month ago

could this representation take advantage of the in-place modification when a value is 1-RC ?

Yep, that should be the case already. We use Rc::make_mut for all the modifications, so it should have the in-place behavior whenever possible (including for subtrees -- if the root tree is shared but some subtree is uniquely owned, the root block will be copied but the uniquely owned part will be mutated). I'll add it to the module docs.

yannham commented 1 month ago

The last commit fails some test on Windows only it seems (but I think there is property-based testing, so the Windows part might be a red herring and it's just that some random path leading to the panic):

thread 'array_mutations' panicked at vector\tests\arbtest.rs:85:27:
attempt to calculate the remainder with a divisor of zero