Make files copy-on-write

tweag / nickel

Better configuration for less

https://nickel-lang.org/

MIT License

2.43k stars 93 forks source link

Make files copy-on-write #2084

Closed jneem closed 3 weeks ago

jneem commented 3 weeks ago

This introduces our own Files database, and uses it instead of the one in codespan. The main point of this change is that our Files is cheaply clonable and copy-on-write (using Vector under the hood). Also, the loading of the stdlib text has been moved from Cache to Files. Together, this means that errors needing files for reporting can take a cloned Files instead of a Program. (I haven't actually made this change to the errors yet.)

There are a lot of changes caused by the fact that we're moving from codespan::{Files, FileId} to nickel_lang_core::files::{Files, FileId} everywhere, but most of the interesting changes are to core/src/cache.rs.

github-actions[bot] commented 3 weeks ago

Bencher Report

Branch	2084/merge
Testbed	ubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!
Latency
Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds CLI flag.

Click to view all benchmark results

Benchmark	Latency	nanoseconds (ns)
fibonacci 10	📈 view plot ⚠️ NO THRESHOLD	486,010.00
foldl arrays 50	📈 view plot ⚠️ NO THRESHOLD	1,669,500.00
foldl arrays 500	📈 view plot ⚠️ NO THRESHOLD	6,699,600.00
foldr strings 50	📈 view plot ⚠️ NO THRESHOLD	7,245,300.00
foldr strings 500	📈 view plot ⚠️ NO THRESHOLD	63,115,000.00
generate normal 250	📈 view plot ⚠️ NO THRESHOLD	43,528,000.00
generate normal 50	📈 view plot ⚠️ NO THRESHOLD	2,031,400.00
generate normal unchecked 1000	📈 view plot ⚠️ NO THRESHOLD	3,364,800.00
generate normal unchecked 200	📈 view plot ⚠️ NO THRESHOLD	771,960.00
pidigits 100	📈 view plot ⚠️ NO THRESHOLD	3,166,800.00
pipe normal 20	📈 view plot ⚠️ NO THRESHOLD	1,510,400.00
pipe normal 200	📈 view plot ⚠️ NO THRESHOLD	10,138,000.00
product 30	📈 view plot ⚠️ NO THRESHOLD	815,880.00
scalar 10	📈 view plot ⚠️ NO THRESHOLD	1,497,000.00
sum 30	📈 view plot ⚠️ NO THRESHOLD	812,820.00

🐰 View full continuous benchmarking report in Bencher

yannham commented 3 weeks ago

I haven't took a deep look yet, but the copy-on-write part made it scary at first: it would mean that we might have to copy the whole file database upon addition of a temporary snippet, which sounds costly. But in fact it's not copy-on-write, it's using persistent vectors, which is different. Is that correct?

jneem commented 3 weeks ago

It is behind a persistent vector, yes, but the data in a vector can still be cloned when the vector is modified so I still think of it as "copy-on-write." The file data is in an Rc<str>, though, so these clones are cheap.

yannham commented 3 weeks ago

I see. To me copy-on-write is the good old one, implemented with Cow, where you would copy the whole structure upon write. I think persistent vectors are sharing much more data, but I can see where you come from. Anyway, if the files are behind Rc<str>, we're fine either way.