sparklemotion / nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.
https://nokogiri.org/
MIT License
6.15k stars 898 forks source link

[build bug] Modifying gumbo source doesn't cause rebuilds #2718

Closed stevecheckoway closed 4 months ago

stevecheckoway commented 1 year ago

Please describe the bug

Modifying the gumbo source files doesn't cause rake compile:nokogiri to rebuild the extension.

Help us reproduce what you're seeing

  1. Compile the extension with rake compile:nokogiri
  2. Touch a gumbo source file: touch gumbo-parser/src/util.h
  3. Recompile the extension with rake compile:nokogiri
$ bundle exec rake compile:nokogiri
/usr/bin/make install target_prefix=
install -c -p -m 755 nokogiri.bundle /Users/steve/programming/nokogiri/lib/nokogiri
cp tmp/arm64-darwin21/nokogiri/3.1.2/nokogiri.bundle tmp/arm64-darwin21/stage/lib/nokogiri/nokogiri.bundle
$ touch gumbo-parser/src/util.h
$ bundle exec rake compile:nokogiri
cp gumbo-parser/src/util.h tmp/arm64-darwin21/stage/gumbo-parser/src/util.h
/usr/bin/make install target_prefix=
install -c -p -m 755 nokogiri.bundle /Users/steve/programming/nokogiri/lib/nokogiri
cp tmp/arm64-darwin21/nokogiri/3.1.2/nokogiri.bundle tmp/arm64-darwin21/stage/lib/nokogiri/nokogiri.bundle

Note that the extension was already compiled for the first invocation so nothing needed to be done. In the second compilation, it also thinks there's nothing to do. (The same holds true if you actually modify the file, not merely touch it, but touching is traditionally sufficient for build systems.)

Expected behavior

I expect the gumbo library to be recompiled (or at least the parts of it that were changed) and the nokogiri.bundle to be rebuilt.

Environment

flavorjones commented 1 year ago

Thanks for opening this issue up, this has been mildly annoying me, too.

It's a side effect of how I ended up implementing the libgumbo build, as a separate library and not part of the extension -- using the packaged_tarball strategy.

Maybe that was a mistake, especially given your notes about link-time optimization in #2331, and I should have picked packaged_source instead. I'll play with it a bit and see if I can convince myself to do it this way.

stevecheckoway commented 1 year ago

This is definitely not a priority. I just noticed it and thought I'd make a note of it.

stevecheckoway commented 4 months ago

(Edit: I cleaned up this comment a bit since I posted it as I was rushing out the door without checking that it actually made any sense.)

I'm running into this again so I'd like to spend a little bit of time looking at how difficult it would be to work around this. The build process is a little mysterious to me currently.

When I run rake compile from a clean repo, the Rake::ExtensionTask will run extconf.rb which will build libgumbo.

If I touch one of the gumbo source files and rerun rake compile, something will have noticed the modified file and copy it into the tmp directory the extension is built from but extconf.rb is not run again.

If I change the source_pattern to include the gumbo-parser directories,

    ext.source_pattern = "{.,../../gumbo-parser/{src,test}}/*.{c,cc,cpp,h}"

and rerun rake compile, then extconf.rb will be run again but libgumbo won't be built. Rerunning rake compile again causes extconf.rb to run and again no libgumbo will be built.

So I think we need to make two changes here:

  1. Change the source pattern to consider the gumbo source directories.
  2. Change extconf.rb to rebuild libgumbo if any of the source files have changed.

I was surprised to see that touching the files causes them to be copied to tmp but the extension task isn't run. Which piece of code is performing the file copying? Could that be modified to remove whatever causes miniportile to think libgumbo is installed and thus force a rebuild?

stevecheckoway commented 4 months ago

I'm investigating adding a minimal amount of autoconf/automake to get gumbo to build normally. That should remove the bespoke Makefile system I constructed but more importantly, will hopefully allow miniportile to compile this correctly using source_directory. (It currently fails because it expects configure to have produced a Makefile in the work directory but that hasn't actually happened.)

flavorjones commented 4 months ago

@stevecheckoway This is an unfortunate situation and I'm sorry I haven't fixed it before now. The easiest situation would be to refactor the extconf.rb to allow us to re-run the libgumbo recipe on demand. If you want to inject automake I won't stop you, but that feels like a bigger change to me.

Edit: I'm going to spike on the extconf.rb refactor now.

flavorjones commented 4 months ago

Hmm, refactoring the extconf didn't fix this the way I'd hoped. Going to park it until tomorrow and try again with fresh eyes.

flavorjones commented 4 months ago

As a workaround, you can run rake clean compile test which will recompile libgumbo and nokogiri's extension files (but not libxml2/libxslt). There is a better workaround I found last year deleting specific files in tmp but I need to rediscover it.

flavorjones commented 4 months ago

@stevecheckoway OK - pushed a branch that I think does what you want

Branch is flavorjones-fix-gumbo-dev-builds which basically gives you a flag to go back to how nokogumbo used to do it (as "packaged source").

How to use it:

bundle exec rake clean clobber # clean slate
bundle exec rake compile test -- --gumbo-dev
# change file
bundle exec rake compile test -- --gumbo-dev

If you change a source file, rake compile will do what you want without re-running the extconf.rb.

LMK how it works for you.

stevecheckoway commented 4 months ago

Getting gumbo to build using autoconf/automake was surprisingly easy. I was able to integrate it with mini_portile in a very simple manner:

libgumbo_recipe = process_recipe("libgumbo", "1.0.0-nokogiri", static_p, cross_build_p, false) do |recipe|
  recipe.source_directory = File.join(PACKAGE_ROOT_DIR, "gumbo-parser")
end

I made rake gumbo:test build and run the gumbo tests.

The only hitch: It didn't solve the underlying problem in that modifying a file in gumbo-parser/src does not lead to a rebuild even though extconf.rb was definitely running. And I think I figured out why. In retrospect, it's a little obvious.

I believe that the intended result of extconf.rb is to produce a Makefile that it can use to build/rebuild the extension. So of course the usual course of things is for extconf.rb to not be run again unless it changes. The part that was confusing me was that extconf.rb is definitely being run again. But that's because it gets called by the clean-ports target that gets added to the Makefile:

all: clean-ports
clean-ports: $(DLLIB)
    -$(Q)$(RUBY) $(srcdir)/extconf.rb --clean --enable-static

One way to deal with this is to modify that to something like this.

  File.open("Makefile", "at") do |mk|
    mk.print(<<~EOF)

      .PHONY: rebuild-libgumbo

      $(TARGET_SO): rebuild-libgumbo
      rebuild-libgumbo:
      \t-$(Q)$(MAKE) -C tmp/#{libgumbo_recipe.host}/ports/libgumbo/1.0.0-nokogiri/libgumbo-1.0.0-nokogiri install
    EOF
  end

This approach works. Presumably, it would mostly work for the non-autotools build. The problem it would have is the current Makefile does not have any dependency information.

I see two paths forward

  1. Take the modify the Makefile approach outlined above and make it compile using the current gumbo-parser/src/Makefile
  2. Build gumbo with the autotools and also use the Makefile modification approach.

Path 1's advantage is smallest change required to get 90% of the functionality. I see no disadvantages relative to the situation today.

Path 2's advantages are it would unify how rake compile and rake gumbo:test compile libgumbo (although they'd still be built twice) and dependency tracking for rebuilds would work correctly. The disadvantage is that this introduces autotools to Nokogiri and we'd have to decide what gets checked into the repo. Normally, configure, aclocal.m4, and the various Makefile.ins would not get checked in. But this raises the barrier to entry to compiling Nokogiri from the GitHub repo.

Having written this all out, I'll try out path 1 tomorrow and we can decide later if autotools are worth it.

stevecheckoway commented 4 months ago

@stevecheckoway OK - pushed a branch that I think does what you want

Branch is flavorjones-fix-gumbo-dev-builds which basically gives you a flag to go back to how nokogumbo used to do it (as "packaged source").

How to use it:

bundle exec rake clean clobber # clean slate
bundle exec rake compile test -- --gumbo-dev
# change file
bundle exec rake compile test -- --gumbo-dev

If you change a source file, rake compile will do what you want without re-running the extconf.rb.

LMK how it works for you.

Ah great. I'll test that out tomorrow.

stevecheckoway commented 4 months ago

@flavorjones, that branch works great!

Since I've already gone through the exercise of autotoolizing libgumbo, I figured I'd push the branch in case you want to take a look. https://github.com/sparklemotion/nokogiri/compare/main...stevecheckoway:nokogiri:autotools-gumbo

I added one improvement: rake gumbo:test no longer builds a second version of libgumbo. This would speed up CI a bit.

flavorjones commented 4 months ago

OK - let me create a PR for my hack first, if that works and unblocks you. I'll take a look at the autotools branch as soon as I can, but a first glance looks great!

flavorjones commented 4 months ago

See https://github.com/sparklemotion/nokogiri/pull/3220

flavorjones commented 4 months ago

Aaaaand I pushed a slightly-tweaked version of your autotools branch at https://github.com/sparklemotion/nokogiri/pull/3221