Open sneakers-the-rat opened 2 years ago
Huh, that sounds bad. Thanks for reporting!
I'm relatively certain that this hasn't been the case in the past, though, in general we could speed up incremental builds by using the Cache API. But what you describe definitely sounds like an additional bug. If the builds take incrementally longer, we should first figure out which code paths keep repeating and why.
I'd try to simplify the page as much as possible. That is, disable all other plugins and use the @bibliography
tag only once. Then either install byebug
and use breakpoints or add some puts
statements to jekyll-scholar to trace what's being run during the incremental builds. As a start, I'd trace these calls:
These should tell us if the bibliography tag gets initialized/rendered repeatedly. If the builds take incrementally longer, I'd assume that either the rendering happens repeatedly or, if it does not, then the next step would be to time the individual steps of the render method; especially here:
I'd time the individual steps to see what's consuming incrementally more time then dig down further.
Before rendering the bibliography, styles and locale must be loaded. Currently, both are loaded into static memory, so on incremental builds the styles and locale files should not have to be reloaded. This should make successive builds quicker, but since that's not the case it's also worth testing this assumption. Styles are loaded into this static hash:
I'd also put breakpoints or debug output there to check whether or not the styles are being loaded for every build; I doubt this is directly related to the issue, but it would be good to know whether the plugin is completely reloaded during incremental builds or whether the styles stay in memory.
OK I was able to profile the build like this (my first ruby!)
require "jekyll"
require "ruby-prof"
result = RubyProf.profile do
options = {
"source" => './',
"destination" => './_site',
"livereload" => true,
'watch' => true,
'verbose' => true
}
Jekyll::Commands::Build.process(options)
end
stack_printer = RubyProf::CallStackPrinter.new(result)
flat_printer = RubyProf::FlatPrinter.new(result)
graph_printer = RubyProf::GraphHtmlPrinter.new(result)
File.open("_profile_stack.html", 'w') { |file| stack_printer.print(file) }
File.open("_profile_flat.txt", 'w') { |file| flat_printer.print(file) }
File.open("_profile_graph.html", 'w') { |file| graph_printer.print(file) }
and so I profiled it doing a single build (144s) and a run where I edited the document so there were two builds (145s and 376s)
I've attached the profiling results -- three formats, detailed in ruby-prof docs
It looks like BibTeX
is calling the children of BibTeX::Entry#parse_month
3x as much in the double build than the single build:
Single build:
%self total self wait child calls name
18.14 97.151 27.229 0.000 69.922 27506072 BibTeX::Value#eql?
17.09 65.541 25.660 0.000 39.881 27624376 BibTeX::Value#to_s
11.18 25.221 16.790 0.000 8.431 27624376 BibTeX::Value#value
6.50 107.606 9.766 0.000 97.840 24472 BibTeX::Entry#parse_month
5.62 8.432 8.432 0.000 0.000 27627632 BibTeX::Value#atomic?
5.42 8.137 8.137 0.000 0.000 55423220 Hash#key?
4.71 7.064 7.064 0.000 0.000 29489151 Symbol#to_s
3.14 4.709 4.709 0.000 0.000 27506343 String#eql?
1.27 1.909 1.909 0.000 0.000 6036068 StringScanner#scan
1.17 2.265 1.752 0.000 0.513 2004900 BibTeX::Lexer#push
1.06 3.933 1.593 0.000 2.341 1065795 BibTeX::Lexer#parse_bibtex
0.89 119.409 1.330 0.000 118.079 23124 *Racc::Parser#_racc_do_parse_c
Double build:
%self total self wait child calls name
19.82 307.347 85.552 0.002 221.793 82537056 BibTeX::Value#eql?
18.99 206.892 81.963 0.003 124.926 82655352 BibTeX::Value#to_s
11.97 78.859 51.656 0.007 27.196 82655352 BibTeX::Value#value
6.95 338.150 29.981 0.002 308.167 24472 BibTeX::Entry#parse_month
6.36 27.479 27.467 0.012 0.000 165484755 Hash#key?
6.30 27.197 27.197 0.000 0.000 82658608 BibTeX::Value#atomic?
4.41 19.031 19.027 0.004 0.000 83907880 Symbol#to_s
3.53 15.255 15.251 0.004 0.000 82537201 String#eql?
0.46 1.979 1.979 0.000 0.000 6034576 StringScanner#scan
0.44 2.462 1.916 0.000 0.546 2004900 BibTeX::Lexer#push
0.39 4.164 1.699 0.000 2.465 1065795 BibTeX::Lexer#parse_bibtex
0.33 348.408 1.412 0.000 346.996 23124 *Racc::Parser#_racc_do_parse_c
I don't have more time now but thank you for your explanation, will dig into the code and try the more explicit breakpoints later if that would be helpful :) Not sure why the children would be getting called 2x as often as they should when parse_month
is called the same amount, but i also haven't looked at why it's happening! Scholar is the thing that is taking by far the longest out of anything in the build (but i really appreciate your work on it!), and so it would be nice to figure out some way to speed that up and i'll set aside some time to help on that :)
Thanks! I'd run this a few more times. The incremental build-up you describe in the OP would suggest some conceptual bug that causes incremental builds to take successively longer and should become more apparent with each extra build.
So far this points looks like the time is being spent parsing the bibliography. This is understandable and something that has not been optimized: for each run the bibliography is parsed again, because the BibTeX file may have changed; during builds with multiple sources the bibliography may also be parsed multiple times because the parse results are discarded in between. Both could be addressed by caching the parse results and utilizing the Cache API to detect changes on the source files. Still, just parsing the bibliography over and over again should not lead to increasingly longer build times, so there must be something else going on in addition to this.
sorry mebs i wasn't clear! there are two separate runs,
so things in the second run (with two builds) that take 3x as long instead of 2x as long are the things that cause the time to increase. I've run it again and every time those same calls take longer :) will be tied up for another few weeks but want to help make this tool work!
This seems to work? https://github.com/sneakers-the-rat/jekyll-scholar/commit/4d63f88c909b136fd0b0b5533e7f7a0d3dc1b3f2
went from
Configuration file: _config.yml
Source:
Destination: /_site
Incremental build: enabled
Generating...
done in 48.744 seconds.
Auto-regeneration: enabled for ''
LiveReload address: http://127.0.0.1:35729
Server address: http://127.0.0.1:4000/
Server running... press ctrl-c to stop.
LiveReload: Browser connected
[2021-10-08 18:02:14] ERROR `/favicon.ico' not found.
Regenerating: 1 file(s) changed at 2021-10-08 18:02:25
_sections/2_2_federation.md
...done in 111.003382 seconds.
Regenerating: 1 file(s) changed at 2021-10-08 18:04:22
_sections/2_2_federation.md
...done in 191.76235 seconds.
to
Incremental build: enabled
Generating...
done in 0.043 seconds.
Auto-regeneration: enabled for '
LiveReload address: http://127.0.0.1:35729
Server address: http://127.0.0.1:4000/
Server running... press ctrl-c to stop.
LiveReload: Browser connected
[2021-10-08 18:14:38] ERROR `/favicon.ico' not found.
Regenerating: 1 file(s) changed at 2021-10-08 18:14:49
_sections/2_2_federation.md
...done in 6.407052 seconds.
Regenerating: 1 file(s) changed at 2021-10-08 18:14:58
_sections/2_2_federation.md
...done in 6.221516 seconds.
[2021-10-08 18:15:12] ERROR `/favicon.ico' not found.
Regenerating: 1 file(s) changed at 2021-10-08 18:15:29
_sections/2_2_federation.md
...done in 5.949184 seconds.
not bad! I really truly do not know how to program in Ruby so this probably breaks other stuff, but it seems to at least indicate that the problem is somewhere in the way the BibTeX library is parsing things, or how often it's called, etc.
should I PR?
Yes, it would be great to use caching for the bibliography. Though we should use the bibliography paths as the cache key in order to support parsing multiple bibliographies. We'd also have to look into how we can clear the path if the bibliography file changes -- though we could probably just document this (i.e. you need to restart if you change the bibliography file).
aha makes sense, still not really sure how ruby works, specifically how instances vs classes work exactly, but i can take a crack at it.
I tried to mirror your cache clearing code, which seems like it checks if the paths change, but yes I don't know how to invalidate if contents of each file change, will investigate. For now it seems to be working, it does have missing references, presumably from not clearing the cache, but if i do a restart then it catches all the changes after that, so not quite sure how that's working.
circling back around, i'm going to work on some other extensions, and wanted to know if you wanted me to work this into a PR? It still suffers from not adding new citatations that are added after jekyll serve
is called, which i imagine is from not properly clear the cache. I recognise you're probably pretty strapped for time, but if you have any pointers on what cache needs to be invalidated i'd be happy to PR :)
@sneakers-the-rat sorry, I haven't been following this recently, but having looked at it quickly, I think the blocking issue in the other thread is resolved now. Would be more than happy to merge a PR opting into the cache API.
I tried the OP's code too. Before:
done in 330.149 seconds.
After
done in 23.193 seconds.
I would be very happy if this could be integrated :+1:
Hello! Really appreciate the plugin. I've noticed it might be that the version of Jekyll i'm using isn't supported (4.2.1) but i've noticed when i have the jekyll-scholar gem loaded and am rendering a bibliography on a page, every time that I change the source and the pages rebuild they take n times longer -- eg. if original page takes 20 seconds to build, second build takes 40, third takes 60, and so on. I've tried
--verbose
,--profile
and a bunch of other profiling methods to figure out what's taking longer but can't quite crack it -- though with--trace
it's always in the middle of some jekyll-scholar method when i call it.Any idea what might be causing the time doubling? some cache somewhere maybe? If you point me in the right direction I can try and PR since I've seen elsewhere you're mostly in maintenance mode <3