Closed nikomatsakis closed 2 months ago
Name | Link |
---|---|
Latest commit | 02036ff44660bfad0130d53b0db74567bdb24f91 |
Latest deploy log | https://app.netlify.com/sites/salsa-rs/deploys/66c4a3bf75d33e00083d33fb |
Comparing nikomatsakis:tables
(02036ff) with master
(08820ea)
⚡ 1
improvements
✅ 7
untouched benchmarks
Benchmark | master |
nikomatsakis:tables |
Change | |
---|---|---|---|---|
⚡ | many_tracked_structs |
130.9 µs | 48.8 µs | ×2.7 |
Wow, very impressive (I haven't read through the code yet). One understand question:
This lays the foundation for serialization/deserialization as well as speculative execution by using copy-on-write pages, but it also avoids a ton of centralized hashing. Tracked functions that take a single salsa struct as argument now avoid hashtable lookups altogether.
By salsa struct. Does this apply to both inputs and tracked structs or only tracked structs?
It's interesting that some of the other benchmarks regress.
You might already be aware of it: Constant queries no-longer compile:
/// Salsa query to get the builtins scope.
///
/// Can return None if a custom typeshed is used that is missing `builtins.pyi`.
#[salsa::tracked]
pub(crate) fn builtins_scope(db: &dyn Db) -> Option<ScopeId<'_>> {
let builtins_name =
ModuleName::new_static("builtins").expect("Expected 'builtins' to be a valid module name");
let builtins_file = resolve_module(db, builtins_name)?.file();
Some(global_scope(db, builtins_file))
}
error[E0658]: cannot cast `dyn db::Db` to `dyn Database`, trait upcasting coercion is experimental
--> crates/red_knot_python_semantic/src/builtins.rs:10:1
|
10 | #[salsa::tracked]
| ^^^^^^^^^^^^^^^^^
|
= note: see issue #65991 <https://github.com/rust-lang/rust/issues/65991> for more information
= note: required when coercing `&'db (dyn db::Db + 'static)` into `&(dyn Database + 'static)`
= note: this error originates in the macro `salsa::plumbing::setup_tracked_fn` which comes from the expansion of the attribute macro `salsa::tracked` (in Nightly builds, run with -Z macro-backtrace for more info)
By salsa struct. Does this apply to both inputs and tracked structs or only tracked structs?
Salsa struct = interned | input | tracked
It's interesting that some of the other benchmarks regress.
What regresses?
You might already be aware of it: Constant queries no-longer compile:
Hmm. Seems like we are missing a test. I did change how constant queries are implemented (it's a bit less efficient now than it was, as it winds up with a silly hashmap; could be fixed, just didn't).
It's interesting that some of the other benchmarks regress.
What regresses?
Here: https://codspeed.io/salsa-rs/salsa/branches/nikomatsakis:tables. For whatever reason, the comment doesn't show that change; you have to click into the report.
amortized[Input]
(8%, from 3.9µs to 4.2µs)amortized[InternedInput]
(8%, from 3.2µs to 3.5µs)mutating[20]
(4%, from 19.1µs to 20µs)It looks like CodSpeed considers those "regressions" to be within the margin of error/noise. I don't know how sophisticated the statistics are that CodSpeed uses to make those decisions, but I'm also not sure that we should read too much into results that CodSpeed has decided are not significant.
Gotcha! The thing I'm unsure of is whether criterion considers these to be noise. Does codespeed delegate to criterion on that front?
Yeah, good q, I don't know...
Gotcha! The thing I'm unsure of is whether criterion considers these to be noise. Does codespeed delegate to criterion on that front?
There's a settings page and it defaults to a 10 margin. It also supports commenting on improvements/regressions only
My hunch is that this branch is slightly worse on some microbenchmarks (among other things, it is using array lookups instead of pure pointers), but much better in more complex scenarios by avoiding centralized hashing.
That makes sense. A single-digit regression on those benchmarks is feels like it's fine and nothing to be concerned about.
Tracked functions that take a single salsa struct as argument now avoid hashtable lookups altogether.
Is my understanding correct that this is achieved by using the salsa struct ID as ID into the query cache and is based on the assumption that a query is "dense" (likely to be called for all arguments)?
Is my understanding correct that this is achieved by using the salsa struct ID as ID into the query cache and is based on the assumption that a query is "dense" (likely to be called for all arguments)?
Not exactly. Each tracked function is assigned a MemoIngredientIndex
(starting from 0). Each salsa struct has an "id" which corresponds to a slot in one of the tables. The slots then carry an array for memos. This array is grown from 0 to the length of the max memo-ingredient-index with which it is used. The entries are arcs. So I guess you could say that we assume if you invoke any tracked functions on a given struct, you'll invoke many. I'm not sure how this will scale, we could do more sophisticated things.
Rebased at @davidbarsky's request. Once the CI stuff re-runs I'll merge this, presuming it looks good.
@davidbarsky
This is what I see on codspeed:
thanks for rebasing! those numbers look good for now; i think this branch is fine to land for now? things can be improved after.
Hmm, I think there's still the issue with const queries not compiling.
Hmm, I think there's still the issue with const queries not compiling.
@MichaReiser made an issue: https://github.com/salsa-rs/salsa/issues/565
This PR modifies the way we store salsa struct to use a central table/page system. The results of tracked functions are now stored in memos/syncs attached to a particular slot.
This lays the foundation for serialization/deserialization as well as speculative execution by using copy-on-write pages, but it also avoids a ton of centralized hashing. Tracked functions that take a single salsa struct as argument now avoid hashtable lookups altogether.
In the future I'd like to improve the way that tracked functions with >1 argument work by extending the memo table to optionally store a per-struct hashmap, but that can be a separate PR.