inukshuk / jekyll-scholar

jekyll extensions for the blogging scholar
MIT License

configuration yaml changes during build process #262

Status: Closed (cardi closed this issue 2 years ago)

cardi commented 5 years ago

I'm working on a prototype to fix https://github.com/inukshuk/jekyll-scholar/issues/256 using the Cache API that will be in Jekyll 4 (which is currently alpha).

(There are some nice optimizations that will be enabled by default in Jekyll 4, like rendering Markdown once and reusing it from the cache.)

problem

jekyll-scholar always makes changes to the configuration yaml at build time, which results in cache deletion each time the site is built.

details

Jekyll deletes all the caches if the configuration loaded from the configuration file (_config.yml by default) differs from the configuration that is stored in the cache.
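To make the failure mode concrete, here is a minimal sketch of a cache that is keyed on a fingerprint of the configuration. It is a simplified model, not Jekyll's actual code: the ToyCache class and its clear_if_config_changed method are hypothetical names chosen for illustration, and comparing config.inspect strings is an assumption about how such a check could work.

```ruby
# Simplified, hypothetical model of a config-keyed cache; not Jekyll's code.
class ToyCache
  attr_reader :store, :clears

  def initialize
    @store = {}   # cached entries, including the config fingerprint
    @clears = 0   # how many times the cache has been wiped
  end

  # Wipe everything unless the stored fingerprint matches the given config.
  def clear_if_config_changed(config)
    fingerprint = config.inspect
    return if @store["config"] == fingerprint

    @store.clear
    @clears += 1
    @store["config"] = fingerprint
  end
end

cache = ToyCache.new
on_disk = { "scholar" => { "locale" => "en" } }

# First build: the cache is empty, so it is cleared and seeded.
cache.clear_if_config_changed(on_disk)

# Same config again: the fingerprint matches and nothing is cleared.
cache.clear_if_config_changed(on_disk)

# Defaults get added in memory and that version becomes the fingerprint.
mutated = { "scholar" => { "locale" => "en", "group_by" => "none" } }
cache.clear_if_config_changed(mutated)

# The next build starts from the on-disk config: mismatch, cleared again.
cache.clear_if_config_changed(on_disk)

puts cache.clears # prints 3
```

Only the second call leaves the cache intact. Once the stored fingerprint comes from a config that differs from the one on disk, every later build starts with a mismatch.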

It looks like jekyll-scholar is modifying the configuration directly at runtime, perhaps to store some global variables?

The following is the output of diff -y config-before config-after. On the left is the internal configuration seen at the start of the build process, and on the right is the modified configuration, presumably after some hooks that jekyll-scholar or its dependencies have triggered:

{"source"=>"/local/jekyll-dev",                 {"source"=>"/local/jekyll-dev",
 "destination"=>"/local/jekyll-dev/_site",           "destination"=>"/local/jekyll-dev/_site",
 "collections_dir"=>"",                      "collections_dir"=>"",
 "cache_dir"=>".jekyll-cache",                   "cache_dir"=>".jekyll-cache",
 "plugins_dir"=>"_plugins",                  "plugins_dir"=>"_plugins",
 "layouts_dir"=>"_layouts",                  "layouts_dir"=>"_layouts",
 "data_dir"=>"_data",                        "data_dir"=>"_data",
 "includes_dir"=>"_includes",                    "includes_dir"=>"_includes",
 "collections"=>                         "collections"=>
  {"posts"=>                              {"posts"=>
    {"output"=>true,                            {"output"=>true,
     "permalink"=>"/:categories/:year/:month/:day/:title:outp        "permalink"=>"/:categories/:year/:month/:day/:title:outp
 "safe"=>false,                          "safe"=>false,
 "include"=>[".htaccess"],                   "include"=>[".htaccess"],
 "exclude"=>                             "exclude"=>
  [".sass-cache",                         [".sass-cache",
   ".jekyll-cache",                        ".jekyll-cache",
   "gemfiles",                             "gemfiles",
   "Gemfile",                              "Gemfile",
   "Gemfile.lock",                         "Gemfile.lock",
   "node_modules",                         "node_modules",
   "vendor/bundle/",                           "vendor/bundle/",
   "vendor/cache/",                        "vendor/cache/",
   "vendor/gems/",                         "vendor/gems/",
   "vendor/ruby/"],                        "vendor/ruby/"],
 "keep_files"=>[],                       "keep_files"=>[],
 "encoding"=>"utf-8",                        "encoding"=>"utf-8",
 "markdown_ext"=>"markdown,mkdown,mkdn,mkd,md",          "markdown_ext"=>"markdown,mkdown,mkdn,mkd,md",
 "strict_front_matter"=>false,                   "strict_front_matter"=>false,
 "show_drafts"=>nil,                         "show_drafts"=>nil,
 "limit_posts"=>0,                       "limit_posts"=>0,
 "future"=>false,                        "future"=>false,
 "unpublished"=>false,                       "unpublished"=>false,
 "whitelist"=>[],                        "whitelist"=>[],
 "plugins"=>["jekyll/scholar"],                  "plugins"=>["jekyll/scholar"],
 "markdown"=>"kramdown",                     "markdown"=>"kramdown",
 "highlighter"=>"rouge",                     "highlighter"=>"rouge",
 "lsi"=>false,                           "lsi"=>false,
 "excerpt_separator"=>"\n" + "\n",               "excerpt_separator"=>"\n" + "\n",
 "incremental"=>false,                       "incremental"=>false,
 "detach"=>false,                        "detach"=>false,
 "port"=>"4000",                         "port"=>"4000",
 "host"=>"127.0.0.1",                        "host"=>"127.0.0.1",
 "baseurl"=>nil,                         "baseurl"=>nil,
 "show_dir_listing"=>false,                  "show_dir_listing"=>false,
 "permalink"=>"date",                        "permalink"=>"date",
 "paginate_path"=>"/page:num",                   "paginate_path"=>"/page:num",
 "timezone"=>nil,                        "timezone"=>nil,
 "quiet"=>false,                         "quiet"=>false,
 "verbose"=>false,                       "verbose"=>false,
 "defaults"=>[],                         "defaults"=>[],
 "liquid"=>                          "liquid"=>
  {"error_mode"=>"warn", "strict_filters"=>false, "strict_var     {"error_mode"=>"warn", "strict_filters"=>false, "strict_var
 "kramdown"=>                            "kramdown"=>
  {"auto_ids"=>true,                          {"auto_ids"=>true,
   "toc_levels"=>"1..6",                       "toc_levels"=>"1..6",
   "entity_output"=>"as_char",                     "entity_output"=>"as_char",
   "smart_quotes"=>"lsquo,rsquo,ldquo,rdquo",              "smart_quotes"=>"lsquo,rsquo,ldquo,rdquo",
   "input"=>"GFM",                         "input"=>"GFM",
   "hard_wrap"=>false,                         "hard_wrap"=>false,
   "guess_lang"=>true,                         "guess_lang"=>true,
   "footnote_nr"=>1,                           "footnote_nr"=>1,
   "show_warnings"=>false},                    "show_warnings"=>false},
 "name"=>"Test",                         "name"=>"Test",
 "url"=>".",                             "url"=>".",
 "description"=>"Test",                      "description"=>"Test",
 "email"=>"test@example.com",                    "email"=>"test@example.com",
 "scholar"=>                             "scholar"=>
  {"style"=>"_bib/acm-sig-proceedings-long-author-list.csl",      {"style"=>"_bib/acm-sig-proceedings-long-author-list.csl",
   "locale"=>"en",                         "locale"=>"en",
   "sort_by"=>"sortdate",                      "sort_by"=>"sortdate",
   "order"=>"descending",                      "order"=>"descending",
                                  >    "group_by"=>"none",
                                  >    "group_order"=>"ascending",
                                  >    "bibliography_group_tag"=>"h2,h3,h4,h5",
                                  >    "bibliography_list_tag"=>"ul",
                                  >    "bibliography_item_tag"=>"li",
                                  >    "bibliography_list_attributes"=>{},
                                  >    "bibliography_item_attributes"=>{},
   "source"=>"_bib",                           "source"=>"_bib",
   "bibliography"=>"references.bib.orig",              "bibliography"=>"references.bib.orig",
   "bibliography_template"=>"bib_reference",              |    "repository"=>nil,
   "bibliography_list_tag"=>"ul",                 |    "repository_file_delimiter"=>".",
                                  >    "bibtex_options"=>{:strip=>false, :parse_months=>true},
                                  >    "bibtex_filters"=>[:smallcaps, :superscript, :italics, :la
                                  >    "bibtex_skip_fields"=>[:abstract, :month_numeric],
                                  >    "bibtex_quotes"=>["{", "}"],
   "replace_strings"=>true,                    "replace_strings"=>true,
   "join_strings"=>true,                       "join_strings"=>true,
   "details_dir"=>"bib",                       "details_dir"=>"bib",
   "details_layout"=>"bib.html",                   "details_layout"=>"bib.html",
   "details_link"=>"Details",                      "details_link"=>"Details",
   "query"=>"@*[year >= 2019]"},                  |    "use_raw_bibtex_entry"=>true,
                                  >    "bibliography_class"=>"bibliography",
                                  >    "bibliography_template"=>"bib_reference",
                                  >    "reference_tagname"=>"span",
                                  >    "missing_reference"=>"(missing reference)",
                                  >    "details_link_class"=>"details",
                                  >    "query"=>"@*[year >= 2019]",
                                  >    "cite_class"=>"citation",
                                  >    "type_names"=>
                                  >     {"article"=>"Journal Articles",
                                  >      "book"=>"Books",
                                  >      "incollection"=>"Book Chapters",
                                  >      "inproceedings"=>"Conference Articles",
                                  >      "thesis"=>"Theses",
                                  >      "mastersthesis"=>"Master's Theses",
                                  >      "phdthesis"=>"PhD Theses",
                                  >      "manual"=>"Manuals",
                                  >      "techreport"=>"Technical Reports",
                                  >      "misc"=>"Miscellaneous",
                                  >      "unpublished"=>"Unpublished"},
                                  >    "type_aliases"=>{"phdthesis"=>"thesis", "mastersthesis"=>"
                                  >    "type_order"=>[],
                                  >    "month_names"=>nil},
 "profile"=>true,                        "profile"=>true,
 "config"=>["_config.yml"],                  "config"=>["_config.yml"],
 "serving"=>false}                       "serving"=>false}

The issue is, during the build process, the configuration on the left gets changed, and the resulting changed configuration on the right is the one that ultimately gets stored in the cache.

Then, on every subsequent jekyll build (regardless of whether --incremental is passed), the initial configuration loaded from file never matches the cached configuration, so the caches are deleted and everything is rebuilt.
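For anyone who wants to reproduce the comparison without diff -y, a small helper like the one below could list the keys that appear only in the later snapshot. This is a sketch: added_keys is a hypothetical helper, and the two hashes are a trimmed-down stand-in for the real before and after configurations shown above.

```ruby
# Hypothetical helper: list keys present in `after` but missing from `before`,
# descending into nested hashes and reporting dotted paths.
def added_keys(before, after, path = [])
  after.flat_map do |key, value|
    here = path + [key]
    if !before.key?(key)
      [here.join(".")]
    elsif value.is_a?(Hash) && before[key].is_a?(Hash)
      added_keys(before[key], value, here)
    else
      []
    end
  end
end

# Trimmed-down stand-ins for the two snapshots in the diff above.
before = { "scholar" => { "locale" => "en", "order" => "descending" } }
after  = { "scholar" => { "locale" => "en", "order" => "descending",
                          "group_by" => "none", "group_order" => "ascending" } }

puts added_keys(before, after)
# prints:
# scholar.group_by
# scholar.group_order
```

The helper only reports additions, which matches what the diff shows: the right-hand side gains keys under "scholar" while the rest of the configuration stays the same.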

resolution?

I'm not quite sure how to solve this issue, given my unfamiliarity with Ruby.

Could the defaults for jekyll-scholar or bibtex be kept somewhere other than site.config? (jekyll-scholar/bibtex could then check whether the key exists in site.config, and otherwise fall back to the default held in a class variable.)
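Roughly what I have in mind, as a sketch only: the ScholarDefaults module and its fetch method are made-up names, not anything that exists in jekyll-scholar, and the three default values are copied from the right-hand side of the diff above.

```ruby
# Hypothetical sketch; names are illustrative, not jekyll-scholar's API.
module ScholarDefaults
  DEFAULTS = {
    "group_by" => "none",
    "bibliography_list_tag" => "ul",
    "missing_reference" => "(missing reference)"
  }.freeze

  # Prefer the user's value when the key is present, else use the default.
  # This only reads; site_config is never written to.
  def self.fetch(site_config, key)
    user = site_config["scholar"] || {}
    user.key?(key) ? user[key] : DEFAULTS[key]
  end
end

site_config = { "scholar" => { "bibliography_list_tag" => "ol" } }
snapshot = site_config.inspect

list_tag = ScholarDefaults.fetch(site_config, "bibliography_list_tag") # user value
group_by = ScholarDefaults.fetch(site_config, "group_by")              # default

puts list_tag, group_by
puts site_config.inspect == snapshot # the site config was not modified
```

Because the defaults live in a frozen constant and lookups never write, the configuration that ends up in the cache would stay identical to the one loaded from disk.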

inukshuk commented 5 years ago

I'm not aware that jekyll scholar changes the _config.yaml file? We certainly use the configuration a lot and we add certain defaults, but I don't think we're writing them back to site.config in memory (certainly not to the file).

Each of the scholar 'plugins' defines a local config based on our own defaults (for example, here is the bibliography tag) then calls set_context_to on each render which merges in the site.config. Since we use our local config for everything it may just be a bug somewhere that the defaults end up in site.config.
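To illustrate the kind of bug I mean, here is a generic Ruby sketch of how defaults can leak into a shared hash. It is not taken from the jekyll-scholar code base, and whether anything like it happens there is not established in this thread; the variable names are purely illustrative.

```ruby
defaults = { "group_by" => "none" }.freeze
site_config = { "scholar" => { "locale" => "en" } }

# Non-mutating: merge builds a new hash and leaves the site's hash alone.
local = defaults.merge(site_config["scholar"])
keys_after_merge = site_config["scholar"].keys.dup

# Mutating: merge! writes the defaults into the very hash site_config holds.
# The block keeps the user's value whenever a key exists on both sides.
shared = site_config["scholar"]
shared.merge!(defaults) { |_key, user_value, _default| user_value }
keys_after_merge_bang = site_config["scholar"].keys

p local                 # contains both "group_by" and "locale"
p keys_after_merge      # ["locale"]
p keys_after_merge_bang # ["locale", "group_by"]
```

The second variant is easy to write by accident, since the only visible difference is the exclamation mark, yet it changes what any later reader of the site configuration sees.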

cardi commented 5 years ago

jekyll-scholar doesn't modify _config.yaml on disk, but does seem to modify site.config in memory, which is ultimately what gets stored in the cache.

When jekyll is re-run, jekyll will compare the config on disk (_config.yaml) with the cached config: since the two aren't the same, all caches are deleted.

This is what a typical config might look like: https://github.com/cardi/website/blob/master/_config.yml

This is the config that was pulled from memory and copied on to disk: https://github.com/cardi/website/blob/master/_config-bibtex.yml. When using this _config-bibtex.yml, caches work as expected. (Many thanks to @zenkalia for helping me with this).

I think that explicitly defining all the key/values in the config is an OK workaround for now, given that I'd like to prototype caching the details/references, but it might be worth looking into whether and how some of the plugins modify site.config directly.
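A sketch of that workaround, under stated assumptions: the defaults hash below is a small subset copied from the diff earlier in this thread, the user hash stands in for a real _config.yml, and the variable names are illustrative. It merges the defaults under the user's values and dumps the result as YAML, which could then be saved as the "hydrated" config.

```ruby
require "yaml"

# Assumed subset of the defaults seen on the right-hand side of the diff.
defaults = { "group_by" => "none", "group_order" => "ascending" }

# Stand-in for what the user wrote in _config.yml.
user = { "scholar" => { "locale" => "en", "sort_by" => "sortdate" } }

# User values win; defaults fill in whatever the user left out.
hydrated = user.merge("scholar" => defaults.merge(user["scholar"]))

yaml = YAML.dump(hydrated)
puts yaml
```

With every key spelled out on disk, there is nothing left for the build to add, so the loaded and cached configurations should agree. The cost is that the file has to be regenerated whenever the plugin's defaults change.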

(This could also be a bug in jekyll or the new Cache API -- I haven't explored it completely, but perhaps there's some hook that is aggressively checking and caching site.config.)


I created a debugging repository here: https://github.com/cardi/website/ -- running bundle install will use some of my debugging branches of jekyll and jekyll-scholar.

Running make build will use _config.yml, and you'll be able to see the config mismatches and caches being deleted each run.

Running make hydrated will use _config-bibtex.yml, and caches now work as expected because the config on disk is equal to the config in cache.

inukshuk commented 5 years ago

Thanks for your work on this!

I'm a bit removed from all of this, but I don't think we would need to alter site.config; grepping through the repo code quickly, it does not look as if we do that explicitly. With the Cache API coming, you're absolutely right that we should make sure this does not happen.