Closed snowleopard closed 5 years ago
@snowleopard I think both of us were hoping there was caching going on here. Maybe it happens within a build but not accross builds? I'm really not sure -- the oracle system comes with caching no? So if we ask twice for the Cabal data of a package, hadrian will only hit the .cabal
file once, the first time, right?
(I'm talking about these oracles: https://github.com/snowleopard/hadrian/blob/master/src/Hadrian/Oracles/TextFile.hs#L101 -- "instantiated" below in textFileOracle
.)
@alpmestan I think the newCache
combinator only caches computations within a single build.
I remember that Shake now provides some new/nicer infrastructure for storing arbitrary information in the Shake database. @ndmitchell Could you give a pointer?
@snowleopard Ooooooh, so that's the catch. If Neil points us to something that persists accross builds, then all we have to do is change newCache
to that to perform the parsing just once, when we first run hadrian. (or once per build-root maybe? where does shake's db live?)
(A side note after thinking about such across-build caching: I think the ability to support 'virtual' cached build artefacts is what distinguishes traditional build systems from self-adjusting computations.)
Why not persist it in a real file? Storing it in the Shake DB means Shake has to cache it in memory (not necessarily desirable) and means that it's harder to debug. Shake could easily implement something that cached persistently, but it hasn't yet.
@ndmitchell It's not too difficult to store it by ourselves, but it seems that coming up with a filename, parsing and serialisation is unnecessary complexity that can be delegated to a build system.
Coming up with a unique name (whether that is a type name or a filename) is still required. Serialising/parsing is still required. You really don't eliminate much complexity...
I thought we could have a combination of newCache
and askOracle
:
newPersistentCache :: (PersistentCacheResult q ~ a, ShakeValue q, ShakeValue a)
=> (q -> Action a) -> Rules (q -> Action a)
Now we could instantiate this with q = Package
and a = PackageMetadata
, and cache the existing function parseCabal :: Package -> Action PackageMetadata
in Shake's database.
With this approach we don't need to invent a name for an intermediate file, and we don't need to parse or serialise PackageMetadata
, because this can be done in a generic way by Shake.
P.S.: It appears that we do need to keep all PackageMetadata
in memory anyway, so there is no overhead in asking Shake to keep it in memory for every build. Of course, the Action
should be rebuilt if its dependencies have changed.
Package
is the name for the intermediate keys that you just invented... Certainly no reason this couldn't be implemented in Shake, and moreover you can implement it yourself on the outside of Shake, but it isn't baked in yet.
moreover you can implement it yourself on the outside of Shake
Yep, we should give this a try. Is this the right starting point?
https://hackage.haskell.org/package/shake-0.16.3/docs/Development-Shake-Rule.html
Yep, addBuiltinRule
is the place to start from. Feedback on the docs etc most welcome!
@snowleopard I decided to do this myself in Shake, so just wait for the next release...
@ndmitchell Cool, thank you!
Just pushed, see addOracleCache
. I guess you need a release to actually use it?
That was fast! :) Yes, I think a release would be more convenient. But we could start playing with the new feature locally pointing Stack to Shake's HEAD, so it's not too urgent.
0.16.4 released with addOracleCache
.
Awesome, that makes this issue easy :)
Fixed in #692.
@ndmitchell Switching to addOracleCache
was very easy -- thanks for adding this feature!
However, I suspect that I can make some further simplifications. Here is an example:
packageData <- newCache $ \package -> do
let file = pkgCabalFile package
need [file]
putLoud $ "| PackageData oracle: parsing " ++ quote file ++ "..."
parsePackageData package
void $ addOracleCache $ \(PackageDataKey package) -> packageData package
We used to have addOracle
here, and now we switched to addOracleCache
. But do we still need newCache
? Presumably addOracleCache
already does the caching, so I can rewrite the above simply to:
void $ addOracleCache $ \(PackageDataKey package) -> do
let file = pkgCabalFile package
need [file]
putLoud $ "| PackageData oracle: parsing " ++ quote file ++ "..."
parsePackageData package
Should I?
Yes, simplify! I suspect the newCache there never helped anyway - addOracleCache is caching between runs (persistent), but normal addOracle caches within a single run, as does addOracle, so it probably didn't help.
@ndmitchell Thanks! Done in #694.
What about cases like this?
kv <- newCache $ \file -> do
need [file]
putLoud $ "| KeyValue oracle: reading " ++ quote file ++ "..."
liftIO $ readConfigFile file
void $ addOracleCache $ \(KeyValue (file, key)) -> Map.lookup key <$> kv file
Here newCache
is presumably useful, since we don't want to parse the key-value file multiple times when answering queries for individual keys. Is there a way we could simplify this?
That's the right way of writing it in Shake. You can of course abstract it away if it's a common pattern with function abstraction, like normal.
When I run Hadrian on my machine, the first minute or so is spent on parsing Cabal files:
This used to be cached in
package-data.mk
files substantially speeding up subsequent builds, but now this seems to be a constant cost for any build, unless I'm mistaken.(The build soon fails on my machine due to an unrelated issue, so I can't try a zero build, but I assume there is no caching at the moment.)
@angerman @alpmestan Do you have any plans on how to cache this?