Open thrau opened 5 years ago
Is it possible that the detail pages are being generated? If I remember correctly, all the generator plugins will run regardless of which sites need to be built. The long time it takes is probably caused by loading the style and processing the references; you're right that this should not be necessary unless building pages containing the actual references, so it would be great to improve that!
Detail pages are not being generated in my configuration.
I've had a similar experience with lengthy build times, which likely stem from generating multiple bibliographies from citations across different pages, in addition to a monolithic page that iterates through all references.
Using Jekyll 3.8.5 with jekyll-scholar 5.14.1.
references.bib contains 203 entries, 197KB, with an ACM SIG proceedings style.
Some (rough) benchmarks:
Given that most of the detail pages won't change too often from the underlying BibTeX or style used, a one-time expensive cost in the initial generation is manageable.
Generating references and detail pages might benefit from the upcoming Jekyll 4.0 Cache API.
I'm not familiar with Ruby or the internal workings of jekyll/jekyll-scholar, but if you can point to where I or someone else might start, that would be helpful.
To speed up the generation of detail pages, you could add some conditions around here. After generating the detail pages, we could write some kind of manifest or save a timestamp which we could compare to the modification date of the bib file. That way, we'd generate details only if the bib file has changed since the last time the detail pages were generated. (A more granular approach, at the entry level, is probably not worth the effort.)
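The timestamp idea could be sketched roughly as follows. This is only an illustration using the Ruby standard library: the helper names `details_stale?` and `mark_details_built` and the stamp-file path are hypothetical and are not part of jekyll-scholar.

```ruby
require "fileutils"

# Hypothetical helper (not jekyll-scholar API): detail pages need to be
# regenerated when no stamp file exists yet, or when the bib file has been
# modified after the stamp was last written.
def details_stale?(bib_path, stamp_path)
  return true unless File.exist?(stamp_path)

  File.mtime(bib_path) > File.mtime(stamp_path)
end

# Hypothetical helper: record that the detail pages were generated just now
# by creating the stamp file or updating its modification time.
def mark_details_built(stamp_path)
  FileUtils.mkdir_p(File.dirname(stamp_path))
  FileUtils.touch(stamp_path)
end
```

A generator would call `details_stale?` before building the detail pages and `mark_details_built` afterwards. Modification times can be unreliable after a fresh checkout or a file copy, which is one reason a checksum may be the safer signal.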
Thanks. That seems like a reasonable approach; I'll see if I can't make a first pass over the next few days.
The other aspect of build times is, I think, building bibliographies from citations (e.g., `{% bibliography --cited %}`).
If all the entries in the bibliography are parsed, and references for each cited entry are then rebuilt each time the `bibliography` command is called, caching each entry in some way could plausibly save a lot of time.
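As a rough illustration of caching each rendered reference, here is a minimal in-memory sketch. The `ReferenceMemo` class and the toy formatter are hypothetical stand-ins, not how jekyll-scholar renders references; Jekyll 4's `Jekyll::Cache` provides a similar get-or-compute call (`getset`) that also persists to disk.

```ruby
# Hypothetical wrapper (not jekyll-scholar API): formats each entry at most
# once, no matter how many bibliographies ask for it during a build.
class ReferenceMemo
  attr_reader :renders

  def initialize(&formatter)
    @formatter = formatter # the expensive step, e.g. applying a citation style
    @cache = {}
    @renders = 0           # counts how often the formatter actually ran
  end

  # Return the cached reference for +key+, rendering it on the first request.
  def fetch(key, entry)
    @cache.fetch(key) do
      @renders += 1
      @cache[key] = @formatter.call(entry)
    end
  end
end

# Toy formatter standing in for the real (slow) reference rendering.
memo = ReferenceMemo.new do |entry|
  "#{entry[:author]} (#{entry[:year]}). #{entry[:title]}."
end

entry = { author: "Doe, J.", year: 2019, title: "An Example" }
first = memo.fetch("doe2019", entry)  # rendered here
second = memo.fetch("doe2019", entry) # served from the in-memory cache
```

With this shape, a page that calls the `bibliography` tag several times pays the formatting cost only once per cited key.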
I've prototyped something quickly to use the Cache API when generating details pages. The results are looking very promising:
| | files | total (sec) | average (sec) | median (sec) | min (sec) | max (sec) |
|---|---|---|---|---|---|---|
| first run (all cache misses) | 206 | 205.353 | 0.996859 | 1.007 | 0.333644 | 2.166210 |
| second run (all cache hits) | 206 | 0.040451 | 0.000196364 | 0.000186 | 0.000141 | 0.000586 |
Results may vary, since the underlying cache loads each entry from disk the first time it's called. (Perhaps as the Cache API evolves, Jekyll could warm up the cache by loading all of it from disk into memory, or use a different backing store, but I don't anticipate working on that anytime soon.)
This should work well, especially for incremental builds: jekyll+jekyll-scholar will only build new BibTeX entries.
Some edge cases I haven't quite thought about yet, that won't trigger a rebuild of the details pages:
I'd expect the above operations to happen rarely, so I think incurring the expensive cost is OK, but at the moment there are two ways to trigger a complete rebuild:
- remove the `.jekyll-cache` directory
- change `_config.yml`
Perhaps there will be a flag that one can pass to `jekyll build` that clears the cache when 4.0 is released.
Looks great! Did you figure out where scholar was modifying `site.config`?
Regarding the cache invalidation, perhaps we could create some kind of manifest file for the details pages with a checksum of the BibTeX file? That way we could detect when a rebuild is required.
> Looks great! Did you figure out where scholar was modifying `site.config`?
I haven't, but I plan on taking a closer look after I've polished the caching code.
> Regarding the cache invalidation, perhaps we could create some kind of manifest file for the details pages with a checksum of the BibTeX file? That way we could detect when a rebuild is required.
That seems like a good approach that will take care of most of the issues, even if it is a bit heavy-handed. I suppose we could do the same with the layout for the details page.
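A checksum manifest covering both the BibTeX file and the details layout might look like the sketch below. The function names, the JSON format, and the manifest location are assumptions made for illustration; none of this is existing jekyll-scholar behaviour.

```ruby
require "digest"
require "json"

# Hypothetical helper: map each tracked file (e.g. the bib file and the
# details layout) to the SHA-256 digest of its contents.
def checksums(paths)
  paths.sort.map { |path| [path, Digest::SHA256.file(path).hexdigest] }.to_h
end

# A rebuild is required when no manifest exists yet, or when the stored
# checksums differ from the current ones for any tracked file.
def rebuild_required?(paths, manifest_path)
  return true unless File.exist?(manifest_path)

  JSON.parse(File.read(manifest_path)) != checksums(paths)
end

# Store the current checksums after the detail pages have been built.
def write_manifest(paths, manifest_path)
  File.write(manifest_path, JSON.pretty_generate(checksums(paths)))
end
```

Unlike a timestamp, the checksum only changes when the file contents change, so touching or re-copying the bib file would not force a rebuild.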
Another, perhaps easier, approach that I've just thought of is to cache a hash of each BibTeX entry: if the cached hash doesn't match or doesn't exist, rebuild that particular entry. I think this would only work if the BibTeX object (dictionary?) in Ruby is ordered deterministically.
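One way to sidestep the ordering concern is to sort the fields before hashing, so that the digest does not depend on the order in which they were parsed. The sketch below assumes each entry is available as a plain Ruby Hash of field names to values; that representation and both function names are assumptions, not the bibtex-ruby or jekyll-scholar API.

```ruby
require "digest"
require "json"

# Hypothetical helper: digest an entry given as a Hash of field => value.
# Ruby Hashes keep insertion order, so the fields are sorted first to make
# the digest independent of the order in which they were added.
def entry_digest(fields)
  canonical = fields.map { |name, value| [name.to_s, value.to_s] }.sort
  Digest::SHA256.hexdigest(JSON.generate(canonical))
end

# Return the keys of entries whose digest is missing from, or differs from,
# the cached digests, updating +cached_digests+ in place.
def changed_entries(entries, cached_digests)
  entries.each_with_object([]) do |(key, fields), changed|
    digest = entry_digest(fields)
    next if cached_digests[key] == digest

    cached_digests[key] = digest
    changed << key
  end
end
```

Only the keys returned by `changed_entries` would need their detail pages rebuilt; everything else could be served from the cache.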
@cardi still working on this? want to join forces on a PR with what we're talking about here? https://github.com/inukshuk/jekyll-scholar/issues/335
It's been a while since I've looked at this, and I'm still interested in having this feature implemented.
I made a first pass at using Jekyll's Cache API here: https://github.com/cardi/jekyll-scholar/tree/cached-details, but a critical blocker (that may or may not have been resolved since) is that any change to `site.config` internally will invalidate the cache and rebuild everything.
While I documented the issue and my findings in https://github.com/inukshuk/jekyll-scholar/issues/262, I don't have a proposed fix for it. (Maybe storing some of jekyll-scholar's settings in a different variable outside of `site.config`?)
I think https://github.com/inukshuk/jekyll-scholar/issues/262 has to be resolved before caching can be implemented and used.
@cardi I took a quick look at this and I think that the BibTeX converter merged in the default scholar config during initialization. Give it another go, to see if this fixes the issue you'd been seeing.
I have a 700 KB BibTeX file with about a thousand entries, and one file to render it. So building the entire source is naturally a little slow (22 seconds).
However, I found that `jekyll --watch --incremental` takes the same amount of time when building files that have no dependencies on the BibTeX file.
My `publications.md` file is below. Interestingly, when I add a query to filter, e.g., only publications from 2018 (50-100 or so), the build speeds up drastically (22s -> 2s).
Any idea what the problem could be? In particular, it seems odd to me that the incremental build of unrelated files is affected by the number of BibTeX entries rendered in other files. I'm not familiar enough with Jekyll to tell whether this is a Jekyll-related problem or has something to do with the plugin.
`_config.yml`:
`/usr/bin/ruby2.3 /usr/local/bin/jekyll b -d /var/www/html/ -s /home/webmaster/source --watch --incremental`