jaspervdj / hakyll

A static website compiler library in Haskell
jaspervdj.be/hakyll

New versions of hakyll always rebuild everything #967

Closed orlitzky closed 1 year ago

orlitzky commented 1 year ago

This is a regression, but I'm not sure when it was introduced because our distro package wasn't updated for a while. In any case, hakyll-4.15.1.1 built with ghc-9.0.2 always rebuilds my entire site.

First we build...

$ ./site build -v | head
Initialising...
  Creating store...
  Creating provider...
  Running rules...
Checking for out-of-date items
  [DEBUG] articles/a_non-proof_of_the_lagrange_multiplier_theorem.xhtml is out-of-date because it is new
  [DEBUG] articles/advice_from_the_trenches.xhtml is out-of-date because it is new
  [DEBUG] articles/against_ca-signed_certificates.xhtml is out-of-date because it is new
  [DEBUG] articles/avoid_the_link_target_attribute.xhtml is out-of-date because it is new
  [DEBUG] articles/avoiding_rewritebase_with_mod_rewrite.xhtml is out-of-date because it is new
   ...

Then we build again:

$ ./site build -v | head
Initialising...
  Creating store...
  Creating provider...
  Running rules...
Checking for out-of-date items
  [DEBUG] articles/a_non-proof_of_the_lagrange_multiplier_theorem.xhtml is out-of-date because it is new
  [DEBUG] articles/advice_from_the_trenches.xhtml is out-of-date because it is new
  [DEBUG] articles/against_ca-signed_certificates.xhtml is out-of-date because it is new
  [DEBUG] articles/avoid_the_link_target_attribute.xhtml is out-of-date because it is new
  [DEBUG] articles/avoiding_rewritebase_with_mod_rewrite.xhtml is out-of-date because it is new
   ...

The files are indeed generated:

$ ls ../public/articles/a_non-proof_of_the_lagrange_multiplier_theorem.xhtml
-rw-r--r-- 1 mjo mjo 48K 2023-02-01 17:43 ../public/articles/a_non-proof_of_the_lagrange_multiplier_theorem.xhtml

I promise this didn't use to happen :)

Minoru commented 1 year ago

I tried to reproduce this with my site using Hakyll 4.15.1.1 built with GHC 9.0.2, and couldn't. For me, the second build doesn't rebuild anything.

A couple shots in the dark:

orlitzky commented 1 year ago

  1. ./site clean doesn't help.
  2. My site isn't public, but I can reproduce the problem with your blog. I built it with runghc Setup.hs configure; runghc Setup.hs build, and then...

$ dist/build/debiania/debiania build -v
Initialising...
  Creating store...
  Creating provider...
  Running rules...
Checking for out-of-date items
  [DEBUG] 404.markdown is out-of-date because it is new
  [DEBUG] about.markdown is out-of-date because it is new
  [DEBUG] css/debiania.css is out-of-date because it is new
  [DEBUG] css/debiania.css.gz is out-of-date because it is new
  [DEBUG] css/default.css is out-of-date because it is new
  ...

$ dist/build/debiania/debiania build -v
Initialising...
  Creating store...
  Creating provider...
  Running rules...
Checking for out-of-date items
  [DEBUG] 404.markdown is out-of-date because it is new
  [DEBUG] about.markdown is out-of-date because it is new
  [DEBUG] css/debiania.css is out-of-date because it is new
  [DEBUG] css/debiania.css.gz is out-of-date because it is new
  [DEBUG] css/default.css is out-of-date because it is new
  ...

So I guess it's something on my machine. I'm on Gentoo and build everything with -O2, but we don't have any weird patches to hakyll or anything like that. I'll see if I can't get the old version installed somehow.

orlitzky commented 1 year ago

"I'll see if I can't get the old version installed somehow."

It's a mess. I'll waste more time trying to get the old version installed than I do rebuilding my own site :P

I looked through the history, however, and the last version available in Gentoo was v4.14.0.0 back in April of 2021. So there's a very good chance that's the version I had installed and working.

orlitzky commented 1 year ago

Spent a while adding trace calls tonight.

I don't think this has anything to do with marking items out-of-date. For example, I can turn markOod into a no-op and everything still gets rebuilt. Working backwards, I see that modified contains... everything, within the scheduleOutOfDate function. So, outOfDate is being told that literally everything was modified and needs to be rebuilt.

Working backwards...resourceModified is always returning True. Within that function, M.lookup normal (providerFiles p) is always a Just, and M.lookup normal (providerOldFiles p) is always a Nothing.

Working backwards... oldFiles is empty in newProvider, because Store.get store oldKey is returning NotFound, even on repeated runs. This happens both with and without the in-memory cache.
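
The chain described in the last two paragraphs can be modeled in a few lines. This is only a simplified sketch, not Hakyll's source: the type aliases, the two-constructor Result type, and the loadOldFiles helper are assumptions made for illustration, and the timestamp comparison is a guess at what "modified" means. It shows why an empty old-file list makes every existing file look new.

```haskell
import qualified Data.Map.Strict as M

-- Simplified stand-ins for Hakyll's internal types (assumptions, not the
-- real definitions).
type Identifier = String
type ModTime    = Integer  -- e.g. seconds since the epoch

-- What a cache lookup can return; only the two cases relevant here.
data Result a = Found a | NotFound

-- Hypothetical helper: the file list remembered from the previous run,
-- or an empty map when the store has no entry for it.
loadOldFiles :: Result (M.Map Identifier ModTime) -> M.Map Identifier ModTime
loadOldFiles (Found files) = files
loadOldFiles NotFound      = M.empty

-- A resource counts as modified if it exists now and either was unknown
-- on the previous run or has a newer modification time.
resourceModified
    :: M.Map Identifier ModTime  -- files seen on this run
    -> M.Map Identifier ModTime  -- files remembered from the last run
    -> Identifier
    -> Bool
resourceModified files oldFiles ident =
    case (M.lookup ident files, M.lookup ident oldFiles) of
        (Nothing, _)         -> False
        (Just _, Nothing)    -> True
        (Just new, Just old) -> new > old
```

In this model, resourceModified files (loadOldFiles NotFound) returns True for every key of files, which matches the "is out-of-date because it is new" lines in the logs above.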

Is that any closer to the root cause? I don't really know what to expect, but my guess is that oldFiles should contain my old files, but it's coming back empty from the store, which is why everything always looks "new."

Minoru commented 1 year ago

Thanks! From the log it appears that Hakyll can't remember which files it has seen during the last build. That info is stored in the cache, which you cleared and rebuilt on my request -- so either the cache entries are not written at all, are written incorrectly, or are not read back properly. I think the only way to figure all this out is for you to dive into Hakyll.Core.Store and find out why it doesn't know about any of the files.

orlitzky commented 1 year ago

Ok! This was "caused" by https://github.com/jaspervdj/hakyll/commit/122dd424891f6c9be15ff5225886484386dd0956, although the true blame may lie elsewhere. The hash provided by Data.Hashable is not stable, as the docs say:

Note: the hash is not guaranteed to be stable across library versions, operating systems or architectures. For stable hashing use named hashes: SHA256, CRC32 etc.

To really drive that point home, there's a build flag for it that randomizes the hash between runs of a program, and our version of the hashable package enables that flag -- presumably to turn unpredictable bugs into predictable ones.
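
To illustrate why a per-run seed defeats a persistent cache, here is a minimal sketch. It assumes the cache locates entries by a hash of their key, which is an assumption about the store and not something confirmed in this thread. seededHash is a toy FNV-1a-style function standing in for any seed-dependent hash; it is not hashable's algorithm, and Cache, writeEntry, and readEntry are hypothetical names.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word64)
import qualified Data.Map.Strict as M

-- A toy seeded FNV-1a-style string hash. It stands in for any hash whose
-- result depends on a per-process seed; it is not hashable's algorithm.
seededHash :: Word64 -> String -> Word64
seededHash seed = foldl' step seed
  where
    -- Mix in one character, then multiply by the 64-bit FNV prime
    -- (Word64 arithmetic wraps around on overflow).
    step h c = (h `xor` fromIntegral (ord c)) * 1099511628211

-- A hypothetical on-disk cache, modeled as a map from hashed key to value.
type Cache = M.Map Word64 String

-- Store a value under the hash of its key, using this run's seed.
writeEntry :: Word64 -> String -> String -> Cache -> Cache
writeEntry seed key value = M.insert (seededHash seed key) value

-- Look a value up again, hashing the key with the given seed.
readEntry :: Word64 -> String -> Cache -> Maybe String
readEntry seed key = M.lookup (seededHash seed key)
```

If an entry is written with one seed, readEntry finds it again only when called with the same seed. A run that starts with a different seed hashes the same key to a different slot and gets Nothing, so from its point of view the previous run never stored anything.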

However, it's not clear from the documentation just how unstable we should assume the hash to be. I've asked for clarification and will report back. It's entirely possible that short-term stability is fair to assume, and that we should not enable the hash randomization by default. (While long-term stability would be nice to have, it's not the end of the world if you have to endure one pointless rebuild every few months.)

orlitzky commented 1 year ago

The verdict is: the distro package is to blame. The hashable maintainer says the seed randomization is only for testing. So I'm off to fix our hashable package :)