jaspervdj / hakyll

A static website compiler library in Haskell
jaspervdj.be/hakyll
Other
2.71k stars 409 forks source link

Observable change of execution order was introduced in #946 if two matches write same file #1000

Open BinderDavid opened 1 year ago

BinderDavid commented 1 year ago

It is a bit difficult to summarize the problem in the description, but I have provided a reproducer below. The problem is that hakyll does not produce the correct build results upon the first invocation, but if we force the rebuild of the problematic files (by using site watch and changing something in the file), then hakyll produces the correct result.

The bug seems to have been introduced between 4.15.1.1 which does not exhibit the bug, and 4.16.1.0 which exhibits the bug. I will try to do a git bisect to identify the commit which is responsible.

Related Issue: https://github.com/haskellfoundation/error-message-index/issues/459

How to reproduce

When comparing two versions it is important to clean the _cache directory between runs. It is otherwise possible that the faulty version reuses the cache from a run of the correct version, and you can no longer observe the error.

BinderDavid commented 1 year ago

Result of git bisect:

david@anaxes ~/GitRepos/hakyll (master)$ git bisect start
david@anaxes ~/GitRepos/hakyll (master)$ git bisect bad
david@anaxes ~/GitRepos/hakyll (master)$ git bisect good v4.15.1.1
Bisecting: 30 revisions left to test after this (roughly 5 steps)
[5f00056269a636b26b2199d319f913a10169b07c] Examples: add blog.galowicz.de (#963)
david@anaxes ~/GitRepos/hakyll ((no branch, bisect started on master))$ git bisect good
Bisecting: 15 revisions left to test after this (roughly 4 steps)
[13b4697b7aa4e7715a88d7912799703de094ac5b] fix: Twitter Card use `name` instead `property` (#971)
david@anaxes ~/GitRepos/hakyll ((no branch, bisect started on master))$ git bisect good
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[388565c6aa9a5c5241e94304f559290a784a0b65] Add example website and tutorials (#990)
david@anaxes ~/GitRepos/hakyll ((no branch, bisect started on master))$ git bisect good
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[9367a88b5be69e16048240893890282b8e70b759] CI: re-enable GHC 9.6.1 on Windows (#997)
david@anaxes ~/GitRepos/hakyll ((no branch, bisect started on master))$ git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[ec3365aac9ab6d6c6d5e1bb8f90ae6e4bb84ee4a] Allow aeson 2.2 (#999)
david@anaxes ~/GitRepos/hakyll ((no branch, bisect started on master))$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[9696a859da5d5227edb79f08a85739a6f2d4d92a] Improve async runtime scaling (#946)
david@anaxes ~/GitRepos/hakyll ((no branch, bisect started on master))$ git bisect bad
9696a859da5d5227edb79f08a85739a6f2d4d92a is the first bad commit
commit 9696a859da5d5227edb79f08a85739a6f2d4d92a
Author: Jasper Van der Jeugt <m@jaspervdj.be>
Date:   Wed Aug 23 19:24:20 2023 +0200

    Improve async runtime scaling (#946)

 hakyll.cabal                       |   1 -
 lib/Hakyll/Core/Logger.hs          |  78 +++---
 lib/Hakyll/Core/Runtime.hs         | 512 ++++++++++++++++++++++++-------------
 stack.yaml.lock                    |   2 +-
 tests/Hakyll/Core/Runtime/Tests.hs |  38 ++-
 tests/TestSuite/Util.hs            |   2 +-
 6 files changed, 398 insertions(+), 235 deletions(-)
david@anaxes ~/GitRepos/hakyll ((no branch, bisect started on master))$ 

The bug was therefore introduced in #946

BinderDavid commented 1 year ago

The bug is independent on whether the threaded runtime is used or not. If I remove the ghc-options: -threaded -rtsopts -with-rtsopts=-N line in message-index.cabal, then the problem persists.

jaspervdj commented 1 year ago

I was able to reproduce this easily, thanks for the excellent instructions.

The issue seems in this snippet in message-index/site.hs:

  match "messages/*/*/index.md" $
    version "nav" $ do
      route $ setExtension "html"
      compile getResourceBody

  match "messages/*/*/index.md" $ do
    route $ setExtension "html"
    compile $ do
      files <- getExampleFiles
      thisMessage <-

For both versions, we have a route set that writes the final item to messages/*/*/index.html. So depending on which one triggers last, we get either a raw getResourceBody or an HTML file.

Deleting it from the first one fixed the issue for me -- we don't actually want the nav version written anywhere since that's just for navigation purposes. I think there are possibly other places in this file where we should fix this.

--- a/message-index/site.hs
+++ b/message-index/site.hs
@@ -57,7 +57,6 @@ main = hakyll $ do

   match "messages/*/*/index.md" $
     version "nav" $ do
-      route $ setExtension "html"
       compile getResourceBody

   match "messages/*/*/index.md" $ do

I think it's confusing behaviour and I would prefer Hakyll to throw an error if you write to the same file twice; so I will leave this open to track that.

BinderDavid commented 1 year ago

Thanks, I have https://github.com/haskellfoundation/error-message-index/pull/461 up. So the error is on our side, and we just could not observe the erroneous result before since we relied on a specific execution order that we shouldn't have depended on.

Edit: And I changed the title to reflect the fact that it is not a bug in hakyll :)

frasertweedale commented 1 year ago

In my sites I create a "recent" version of the files, which I iterate to produce "recent posts" lists.

match "posts/*" $ version "recent" $ do
  route $ setExtension "html"          
  compile ...

I use the $url$ field for hyperlinks in the "recent posts" template. urlField reads from the route. Removing the route declaration makes this empty, and the links are broken.

Is there a way to set the nominal route without writing the file and triggering this new warning? Suggestions or alternative approaches welcome!

jaspervdj commented 1 year ago

@frasertweedale It depends on what you're trying to do.

If it's about selecting a subset of the posts, you shouldn't need a different version for this. You can do something like recentFirst and then take the N first items from the list. Or you can filter on some metadata field, e.g.: https://github.com/jaspervdj/jaspervdj/blob/4d519d5d21c38c8896d3ddf6067aad006d82a0ec/src/Main.hs#L316

If the content of the file needs to be different (e.g because it's included in a list), you can do this by saving a snapshot at that time: https://github.com/jaspervdj/jaspervdj/blob/4d519d5d21c38c8896d3ddf6067aad006d82a0ec/src/Main.hs#L66

Otherwise... maybe I would try adding a new Context field to the "recent" version, which loads the "normal" version and retrieves the route for that?

Sorry to have broken your website!

frasertweedale commented 1 year ago

@jaspervdj thanks for the pointers.

I resolved it by creating a new Context for retrieving the route of the associated "no version" identifier:

urlFieldNoVersion :: String -> Context a
urlFieldNoVersion key = field key $ \i -> do
  let ident = setVersion Nothing (itemIdentifier i)
      empty' = fail $ "No route url found for item " <> show ident
  fmap (maybe empty' toUrl) $ getRoute ident

I add it to the context:

  <> urlFieldNoVersion "url0"

and updated the template:

     $for(posts)$
         <li>
-            <a href="$url$">$fancyTitle$</a>
+            <a href="$url0$">$fancyTitle$</a>
         </li>
     $endfor$

edit: ah, but the atom feed links are still broken, because the shipped templates use $url$

edit 2: this too I have now resolved, by re-snapshotting the content from the "recent" item in the compiler for the no-version, then updating the atom Rules to load those snapshots, which have routes and therefore the $url$ in the template works again.

edit 3: here's the full diff, in case it is useful to someone: https://github.com/frasertweedale/blog-fp/commit/8eb6866010b4065f2e883df36cbfb990d7b99c5a