Closed snowleopard closed 6 years ago
I'm a bit fuzzy on what the oracles actually are, but I initially would have hoped that they would cache things better than that. Maybe it's possible and we're just misusing oracles or missing another layer to prevent those calls from happening, caching things earlier in the "pipeline"?
To answer your last question: in theory, I'm pretty sure we really just need to parse them once and from that point we just share the in-memory (or on-disk?) representation. They don't change during the build, and some of them don't even exist before ./configure.
I'm a bit fuzzy on what the oracles actually are, but I initially would have hoped that they would cache things better than that.
@alpmestan It's not the fault of oracles: they are given different queries -- different contexts (with varying stage and way for the very same package). They do cache things when called with the same query.
To answer your last question: in theory, I'm pretty sure we really just need to parse them once and from that point we just share the in-memory (or on-disk?) representation.
Yes, I agree. I'll see if I can optimise this by switching to having a Package
as a key to these oracles instead of a Context
. This should be possible.
Yes, that does sound a whooole lot better. Let me know how much of an improvement this is :-)
Have you made any progress on this? A patch that I can pick up perhaps? I'm really beginning to be bothehred by this, as hadrian takes 30+ seconds to boot and 5 to run the test I'm looking into at a given point.
@alpmestan I've started working on this, but hit a few non-obvious questions -- sometimes we do seem to need the information about the current stage (e.g. to pass the right compiler to Cabal configure
) and way (to figure out the right configuration flags). My code is in a big mess at the moment, but there is nothing too valuable that can't be thrown away. I won't have much time to work on this in the next couple of days, so feel free to just start from scratch if you like.
Well, we also have to keep in mind that we have two ways to represent parsed .cabal files. One is just a straight Haskell representation of the content, and the other "instantiates" all the variable things (conditionals about OS, ghc, etc, but also cabal flags) in a given context (OS, compiler, choice of flags, ...).
Yes! I'm also tempted to get rid of this duplication. My current plan is:
GeneralPackageDescription
. From this we can extract things like package synopsis.GeneralPackageDescription
once per context, producing cached ContextData
/CabalData
.Does this sound reasonable?
Yes. And we might want to have --trace
emit messages about oracles/caching/parsing that are precise and let us see exactly when we're doing work that we're not supposed to do anymore. At the moment we have some hints but not all the information we would need, I think. It might even be worth taking care of this first, so that we can easily see the improvement there and confirm that we get the desired behaviour?
I actually think I have a pretty good idea what's going on (the amount of diagnostic information is sufficient for me), but sure, it would be nice to improve on this aspect!
May contextDependencies
be to blame here? It even has a -- TODO: Cache the computation.
comment.
@alpmestan I believe the comment refers to this issue: #550.
Apologies for taking so long with this, I've been swamped by other commitments, but I hope to send a PR for your review today/tomorrow!
OK. Because while investigating an issue for an upcoming PR (that fixes dynamic
-enabled ways for good on my linux system) I realized that this is called a looooooot of times, and we seem to repeatedly be looking things up. Anyway, I'm looking forward to your patch!
Fixed in #692.
When I do zero build with Hadrian in verbose mode, I see the following:
Many Cabal files are parsed multiple times:
CabalData
andPackageData
-- do we need both?For example,
rts.cabal
ends up being parsed 13 times! No wonder zero build now takes around 30s.Can we parse a Cabal file only once?