Closed stevecheckoway closed 4 months ago
Thanks for opening this issue up, this has been mildly annoying me, too.
It's a side effect of how I ended up implementing the libgumbo build, as a separate library and not part of the extension -- using the packaged_tarball
strategy.
Maybe that was a mistake, especially given your notes about link-time optimization in #2331, and I should have picked packaged_source
instead. I'll play with it a bit and see if I can convince myself to do it this way.
This is definitely not a priority. I just noticed it and thought I'd make a note of it.
(Edit: I cleaned up this comment a bit since I posted it as I was rushing out the door without checking that it actually made any sense.)
I'm running into this again so I'd like to spend a little bit of time looking at how difficult it would be to work around this. The build process is a little mysterious to me currently.
When I run rake compile
from a clean repo, the Rake::ExtensionTask
will run extconf.rb
which will build libgumbo.
If I touch one of the gumbo source files and rerun rake compile
, something will have noticed the modified file and copy it into the tmp
directory the extension is built from but extconf.rb
is not run again.
If I change the source_pattern
to include the gumbo-parser
directories,
ext.source_pattern = "{.,../../gumbo-parser/{src,test}}/*.{c,cc,cpp,h}"
and rerun rake compile
, then extconf.rb
will be run again but libgumbo won't be built. Rerunning rake compile
again causes extconf.rb
to run and again no libgumbo will be built.
So I think we need to make two changes here:
extconf.rb
to rebuild libgumbo if any of the source files have changed.I was surprised to see that touching the files causes them to be copied to tmp
but the extension task isn't run. Which piece of code is performing the file copying? Could that be modified to remove whatever causes miniportile to think libgumbo is installed and thus force a rebuild?
I'm investigating adding a minimal amount of autoconf/automake to get gumbo to build normally. That should remove the bespoke Makefile system I constructed but more importantly, will hopefully allow miniportile to compile this correctly using source_directory
. (It currently fails because it expects configure
to have produced a Makefile
in the work directory but that hasn't actually happened.)
@stevecheckoway This is an unfortunate situation and I'm sorry I haven't fixed it before now. The easiest situation would be to refactor the extconf.rb to allow us to re-run the libgumbo recipe on demand. If you want to inject automake I won't stop you, but that feels like a bigger change to me.
Edit: I'm going to spike on the extconf.rb refactor now.
Hmm, refactoring the extconf didn't fix this the way I'd hoped. Going to park it until tomorrow and try again with fresh eyes.
As a workaround, you can run rake clean compile test
which will recompile libgumbo and nokogiri's extension files (but not libxml2/libxslt). There is a better workaround I found last year deleting specific files in tmp
but I need to rediscover it.
@stevecheckoway OK - pushed a branch that I think does what you want
Branch is flavorjones-fix-gumbo-dev-builds
which basically gives you a flag to go back to how nokogumbo used to do it (as "packaged source").
How to use it:
bundle exec rake clean clobber # clean slate
bundle exec rake compile test -- --gumbo-dev
# change file
bundle exec rake compile test -- --gumbo-dev
If you change a source file, rake compile
will do what you want without re-running the extconf.rb.
LMK how it works for you.
Getting gumbo to build using autoconf/automake was surprisingly easy. I was able to integrate it with mini_portile in a very simple manner:
libgumbo_recipe = process_recipe("libgumbo", "1.0.0-nokogiri", static_p, cross_build_p, false) do |recipe|
recipe.source_directory = File.join(PACKAGE_ROOT_DIR, "gumbo-parser")
end
I made rake gumbo:test
build and run the gumbo tests.
The only hitch: It didn't solve the underlying problem in that modifying a file in gumbo-parser/src
does not lead to a rebuild even though extconf.rb
was definitely running. And I think I figured out why. In retrospect, it's a little obvious.
I believe that the intended result of extconf.rb
is to produce a Makefile
that it can use to build/rebuild the extension. So of course the usual course of things is for extconf.rb
to not be run again unless it changes. The part that was confusing me was that extconf.rb
is definitely being run again. But that's because it gets called by the clean-ports
target that gets added to the Makefile
:
all: clean-ports
clean-ports: $(DLLIB)
-$(Q)$(RUBY) $(srcdir)/extconf.rb --clean --enable-static
One way to deal with this is to modify that to something like this.
File.open("Makefile", "at") do |mk|
mk.print(<<~EOF)
.PHONY: rebuild-libgumbo
$(TARGET_SO): rebuild-libgumbo
rebuild-libgumbo:
\t-$(Q)$(MAKE) -C tmp/#{libgumbo_recipe.host}/ports/libgumbo/1.0.0-nokogiri/libgumbo-1.0.0-nokogiri install
EOF
end
This approach works. Presumably, it would mostly work for the non-autotools build. The problem it would have is the current Makefile
does not have any dependency information.
I see two paths forward
Makefile
approach outlined above and make it compile using the current gumbo-parser/src/Makefile
Makefile
modification approach.Path 1's advantage is smallest change required to get 90% of the functionality. I see no disadvantages relative to the situation today.
Path 2's advantages are it would unify how rake compile
and rake gumbo:test
compile libgumbo (although they'd still be built twice) and dependency tracking for rebuilds would work correctly. The disadvantage is that this introduces autotools to Nokogiri and we'd have to decide what gets checked into the repo. Normally, configure
, aclocal.m4
, and the various Makefile.in
s would not get checked in. But this raises the barrier to entry to compiling Nokogiri from the GitHub repo.
Having written this all out, I'll try out path 1 tomorrow and we can decide later if autotools are worth it.
@stevecheckoway OK - pushed a branch that I think does what you want
Branch is
flavorjones-fix-gumbo-dev-builds
which basically gives you a flag to go back to how nokogumbo used to do it (as "packaged source").How to use it:
bundle exec rake clean clobber # clean slate bundle exec rake compile test -- --gumbo-dev # change file bundle exec rake compile test -- --gumbo-dev
If you change a source file,
rake compile
will do what you want without re-running the extconf.rb.LMK how it works for you.
Ah great. I'll test that out tomorrow.
@flavorjones, that branch works great!
Since I've already gone through the exercise of autotoolizing libgumbo, I figured I'd push the branch in case you want to take a look. https://github.com/sparklemotion/nokogiri/compare/main...stevecheckoway:nokogiri:autotools-gumbo
I added one improvement: rake gumbo:test
no longer builds a second version of libgumbo. This would speed up CI a bit.
OK - let me create a PR for my hack first, if that works and unblocks you. I'll take a look at the autotools branch as soon as I can, but a first glance looks great!
Aaaaand I pushed a slightly-tweaked version of your autotools branch at https://github.com/sparklemotion/nokogiri/pull/3221
Please describe the bug
Modifying the gumbo source files doesn't cause
rake compile:nokogiri
to rebuild the extension.Help us reproduce what you're seeing
rake compile:nokogiri
touch gumbo-parser/src/util.h
rake compile:nokogiri
Note that the extension was already compiled for the first invocation so nothing needed to be done. In the second compilation, it also thinks there's nothing to do. (The same holds true if you actually modify the file, not merely touch it, but touching is traditionally sufficient for build systems.)
Expected behavior
I expect the gumbo library to be recompiled (or at least the parts of it that were changed) and the
nokogiri.bundle
to be rebuilt.Environment
main
branch.