Closed: @ChrisPenner closed this 1 month ago
Okay, so, maybe it's as simple as this:

- Separate the `Lam` constructor of `GComb` into its own type. This is the type that should occur in `PAp`.
- Make `GComb` a sum referring to this new type. The new type should be unpacked into `GComb`, and pattern aliases could make using it look essentially like the current `GComb`.
- Occurrences of the new, separate type don't need to be lazy, because they aren't from knot tying; they're the concrete entry info for an actual combinator.
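A minimal sketch of that shape, assuming hypothetical names and fields (`GLam`, `LamComb`, `CachedVal`, and the `Int` fields are stand-ins; the real `GComb` has more constructors and richer contents):

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- The extracted type: strict fields are fine here, since these
-- values aren't produced by knot tying.
data GLam = GLam !Int !Int  -- e.g. arity and frame size (hypothetical)

-- The original sum now refers to the new type, unpacked into it.
data GComb
  = LamComb {-# UNPACK #-} !GLam
  | CachedVal !Int

-- Pattern alias so existing call sites still read like the old GComb.
pattern Lam :: Int -> Int -> GComb
pattern Lam a f = LamComb (GLam a f)

-- Existing code can keep matching on Lam as before.
arity :: GComb -> Maybe Int
arity (Lam a _) = Just a
arity _         = Nothing
```

Since the pattern synonym is bidirectional, `Lam a f` works both for construction and for matching, so most call sites wouldn't need to change.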
@ChrisPenner I'm curious about when this kicks in and what "on startup" means. Could you help me understand? Some of my questions:

- You've specified that this applies to `ucm run`. Does it apply to `run.compiled` and `run.native` as well?
- For `run`, `run.compiled`, and `run.native`, does it only evaluate/cache the dependencies of the function being run?
- It sounds like this also applies to interactive ucm sessions. Does it cache the result of every pure top-level term in the codebase at startup? Or when you switch to a project does it compute only the ones with names in the current branch of the project? Or something else entirely?
- Is this relevant for both the interpreter and the JIT runtime?

Here's one case that I'm wondering about:
Historically it hasn't been too bad to have a (pure) test in a project that checks 100k samples and takes 10 seconds to evaluate, because the result gets cached. But are we going to pay that 10 second penalty just to cache `[Ok "passed"]` in more places now (like when I run my `main` method or when I switch into that project)?
I should have mentioned in my last comment that I'm excited about this change, @ChrisPenner! I've seen repeated evaluation bite newcomers, and I've been curious about the extent to which it might be hitting us with certain `Pattern` values, decoders, etc. Thank you!
@ceedubs

> For run, run.compiled, and run.native, does it only evaluate/cache the dependencies of the function being run?
It will run and cache each pure top-level definition as part of adding it to the runtime's code cache. We only load the dependencies of whatever we're trying to run, so it'll evaluate the dependencies of a `main` the first time it's added to the runtime's code cache (e.g. once per ucm session, or once at the start of a `run.compiled` when we load the code from a `uc` file).
> You've specified that this applies to ucm run. Does it apply to run.compiled and run.native as well?
See above; for `ucm run` and `run.compiled` the pre-evaluation triggers for each thing the first time it's added to the code cache. `run.native` doesn't benefit from this change at all, unfortunately.
> It sounds like this also applies to interactive ucm sessions. Does it cache the result of every pure top-level term in the codebase at startup? Or when you switch to a project does it compute only the ones with names in the current branch of the project? Or something else entirely?
It only pre-evaluates the dependencies of things you try to `run`, and will cache evaluation results for the lifespan of that runtime, which corresponds to the lifetime of the UCM process for the most part. E.g. if you run `main`, it'll evaluate all cacheable definitions which `main` depends on (transitively). If you run `main` again, all the code is already loaded, so it won't do it again. If you change `main`, it'll load any new hashes and evaluate anything cacheable in ONLY the new hashes; old ones are still cached.
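The hash-keyed caching behaviour described above can be sketched roughly like this (a simplification under assumed types; the real runtime's cache and loading machinery are more involved):

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map

-- Stand-in for a definition hash (the runtime uses real content hashes).
type Hash = String

-- Memoize evaluation results per definition hash for the lifetime of
-- the cache. Re-running only evaluates hashes not already present,
-- so changing `main` re-evaluates just the new hashes.
cachedEval
  :: IORef (Map.Map Hash v)  -- cache living as long as the runtime
  -> (Hash -> IO v)          -- evaluate a cacheable definition
  -> Hash
  -> IO v
cachedEval ref eval h = do
  cache <- readIORef ref
  case Map.lookup h cache of
    Just v  -> pure v                      -- already evaluated: reuse
    Nothing -> do
      v <- eval h                          -- first time this hash is seen
      modifyIORef' ref (Map.insert h v)
      pure v
```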
> Is this relevant for both the interpreter and the JIT runtime?

Only the interpreter for now.
> Historically it hasn't been too bad to have a (pure) test in a project that checks 100k samples and takes 10 seconds to evaluate, because the result gets cached. But are we going to pay that 10 second penalty just to cache [Ok "passed"] in more places now (like when I run my main method or when I switch into that project)?
This is only an issue if you reference the test in your `main`.
We may want to look into optimizations for docs and termLinks where we don't necessarily depend on the actual result of a referenced definition 🤔
One downside of this change is if you have code like this:

```unison
main = do
  if something
    then exit 0
    else Map.lookup k hugeExpensiveMap
```

Previously `hugeExpensiveMap` would only be evaluated when it was actually hit by the interpreter, so this program would exit immediately if `something` is true. Now we'll pre-evaluate `hugeExpensiveMap` once before even starting to run the program, which could lead to increased startup time in some cases.
So this change will be good for longer-running processes like cloud services and web servers, but potentially worse for short-lived processes like CLI apps.
If the problem proves noticeable enough, we can probably fix this delay by continuing the Closure serialization work I mention in the overview. I think we still want to do that work; it was just starting to hit diminishing returns for how tricky it was, when we have bigger possible gains from other work 😄.
I noticed this broke the serialized-form round-trip golden tests. Which might be fine, since I'd expect this could change the serialized form for some definitions, but it's worth checking.
@SystemFw do we have Unison Cloud client tests to make sure that no hashes change for schema types and functions? The way these could work is to just `crypto.hash` various values and save the result as a hex string literal; then the test does a comparison and fails if the hashes differ.
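The golden-hash idea can be sketched like this; a toy FNV-1a hash stands in for `crypto.hash` (a real test would use a cryptographic hash of the value's serialized bytes, and `checkGolden` is a hypothetical helper, not Cloud client API):

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.Word (Word64)
import Numeric (showHex)

-- Toy stand-in for crypto.hash: 64-bit FNV-1a over a String.
toyHash :: String -> Word64
toyHash = foldl step 0xcbf29ce484222325
  where step h c = (h `xor` fromIntegral (ord c)) * 0x100000001b3

-- Render as a zero-padded hex string, like the frozen literal.
hex :: Word64 -> String
hex w = let s = showHex w "" in replicate (16 - length s) '0' ++ s

-- Compare against the golden hex literal checked into the test suite;
-- fail if the hash (and hence the serialized form) ever changes.
checkGolden :: String -> String -> Either String ()
checkGolden golden serialized
  | actual == golden = Right ()
  | otherwise = Left ("hash changed: expected " ++ golden ++ ", got " ++ actual)
  where actual = hex (toyHash serialized)
```

The point of freezing a hex literal rather than the full serialized bytes is that the test stays small while still detecting any change to the serialized form.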
We now have a test in the Cloud client to test for the relevant hash changes. It passes on this branch.
> serializing the pre-evaluated pure results is possible with more work
FWIW, contrary to the majority opinion in the meeting, I don't think this would be a good idea. AFAIK other languages don't do it, apart maybe from some trivial constant folding; if someone really wants to precompute a pure result, they can always use `add.run`.
@SystemFw cool, I'm good with merging this then, whenever @dolio and/or @ChrisPenner are ready.
Overview
Before this change, all pure top-level definitions were re-evaluated at every call site, which is obviously unnecessary. Now we detect these using type info from the codebase and inline the evaluated result into usage sites.
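The kind of type-driven check this implies can be sketched as follows (a toy `Ty` representation and a hypothetical `cacheable` predicate, not the runtime's actual types): a definition is safe to pre-evaluate when its type has no arrows (it isn't a function) and no abilities (it's pure).

```haskell
-- Toy type representation for illustration only.
data Ty
  = Arrow Ty Ty        -- function type
  | Effects [String] Ty -- type carrying an ability/effect row
  | Base String        -- e.g. Nat, Text, Map

-- A top-level definition is cacheable when it takes no arguments
-- and its type mentions no abilities.
cacheable :: Ty -> Bool
cacheable (Arrow _ _)    = False  -- functions are evaluated per call
cacheable (Effects es t) = null es && cacheable t
cacheable (Base _)       = True
```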
We also discussed whether to bake the evaluated results into the compiled artifact. The consensus was yes, serialize the results; I'd still like to do that, but after a day of fiddling it proved tougher than I expected. The issues were:

- `Foreign`s, which are arbitrary Haskell values. Most of these won't appear in a pure value, but some things like stdin/stdout `Handle`s do, patterns as well, and things like text/bytes literals. These are all serializable in their own way, but it takes some care to ensure we do it correctly.
- `Seg 'BX`: this is fixed in the Stack implementation to include inlined Sections, which we can't serialize. We can most likely either parameterize this type, or replace usage sites with a more generic version, but it affects a big chunk of things and is going to be a bit of a pain to figure out and propagate.

So serializing the pre-evaluated pure results is possible with more work, but it'd be nice to release the non-serializing version for now. The currently implemented version pre-evaluates cacheable values on executable startup.
When running something from a cold runtime (e.g. via `ucm run` or a fresh `ucm` session) there'll be a short delay while ucm evaluates everything, but it'll be snappy after that.

Implementation notes
- Removed the `RComb` indirection by including the `CombIx` directly in the Section/Instr, avoiding an additional redirection. This will also help with serializing closures, but as mentioned above that part isn't done yet.
- Added `getTypeOfTerm` to the Runtime `CodeLookup`.

It's a bit annoying, because in order to evaluate the pure defns I need the rest of the combs to be in the right format for the runtime, so here's how it goes:
Benchmarks
Made a test which references a 1k element map definition:
Loose ends
See overview about serializing closures.