rubys / nokogumbo

A Nokogiri interface to the Gumbo HTML5 parser.
Apache License 2.0
186 stars 114 forks source link

RFC: How to prepare for a future merge of nokogumbo into the nokogiri gem #170

Closed flavorjones closed 3 years ago

flavorjones commented 3 years ago

As discussed elsewhere, Nokogumbo and Nokogiri will be merging. The net result of this will be a single gem that provides both Nokogiri::HTML (nokogiri 1.11 and earlier) and Nokogiri::HTML5 (nokogumbo) in the same gem.

My current plan is simply to lift-and-shift the Nokogumbo code into the Nokogiri repository and gem in Nokogiri v1.12. Unless we do something, this will mean that users of Nokogumbo will see warning messages like this when they upgrade Nokogiri:

/home/flavorjones/code/oss/nokogumbo/lib/nokogumbo/html5.rb:7: warning: method redefined; discarding old HTML5
/home/flavorjones/code/oss/nokogiri/lib/nokogiri/html.rb:42: warning: previous definition of HTML5 was here
/home/flavorjones/code/oss/nokogumbo/lib/nokogumbo/html5.rb:13: warning: already initialized constant Nokogiri::HTML5::HTML_NAMESPACE
/home/flavorjones/code/oss/nokogiri/lib/nokogiri/html.rb:48: warning: previous definition of HTML_NAMESPACE was here

These warnings aren't just annoying; they tell us that Nokogumbo's implementation is clobbering the Nokogiri implementation, meaning that until the issue is resolved, updates to Nokogiri's code won't have any effect.

flavorjones commented 3 years ago

One idea is to have the next release of Nokogumbo check to see if Nokogiri has already defined these modules, methods, and constants, and to avoid loading itself. We could do this with a Nokogiri version check (< 1.12) or by checking defined?(Nokogiri::HTML5) (or some other module or constant).

Another idea is for the next release of Nokogumbo require nokogiri < 1.12, and the following release of Nokogumbo be an empty gem that depends on nokogiri >= 1.12.

I'd love feedback on these ideas, and to hear other folks' ideas.

stevecheckoway commented 3 years ago

I think having an empty gem as a final nokogumbo release is a great idea along with some release notes that it's a purely transitional release.

For the penultimate release, I don't have an informed opinion on whether it is better to check if things are defined or do a version check. In many cases, functionality checks are better than version checks (I'm thinking of things like feature tests in compilers), but I'm not sure that applies here. Requiring nokogiri < 1.12 might be sufficient.

Does Nokogiri follow semantic versioning? It might be best for the combined Nokogiri to have version 2. The situation I'm worried about is existing versions of Nokogumbo being used with Nokogiri that already defines the symbols. I believe that current Nokogumbo requires Nokogiri to be less than 2.0

  s.add_runtime_dependency 'nokogiri', '~> 1.8', '>= 1.8.4'

thus this situation would be avoided.

Separately from this, have you thought about importing the tests? I wrote a lot of tests for gumbo, but there are also a lot of ruby tests, including tests for the html tree-construction. I had a bunch of pull requests for those tests fixing the number of errors that should be emitted, but they never got merged. (See this in particular.) So I've just been using my own fork of the tests which isn't ideal.

rubys commented 3 years ago

defined? seems cleanest to me. An empty gem would pose problems to people that are still using what would then be a backlevel of nokogiri.

flavorjones commented 3 years ago

Thanks for the thoughtful responses. @stevecheckoway I agree that I like having a (basically) empty gem at the end of Nokogumbo's lifecycle. @rubys I also agree that using defined? is the cleanest option. I think we can do both by using both defined? and by setting gem requirements.

For the purposes of this analysis, let's assume

If we use the defined? strategy starting in Nokogumbo next, there's no need for a two-stage rollout, but the long-term state of the gem will contain all the current code:

Nokogiri Current Nokogiri Next
Nokogumbo current ✔ Uses Nokogumbo code ✔ Uses Nokogiri code
Emits "redefined" warnings
Nokogumbo next ✔ Uses Nokogumbo code ✔ Uses Nokogiri code
Prompts to remove Nokogumbo

If we use the "requirements constraints" strategy starting in Nokogumbo next:

Nokogiri Current Nokogiri Next
Nokogumbo current ✔ Uses Nokogumbo code ✔ Uses Nokogiri code
Emits "redefined" warnings
Nokogumbo next ✔ Uses Nokogumbo code
Nokogumbo final ✔ Uses Nokogiri code
Prompts to remove Nokogumbo

If we use both strategies, though, we could still end up in the desired final state regardless of the upgrade path followed by users. Imagine "next" using the defined? check, and then a "final" that sets an dependency constraint:

Nokogiri Current Nokogiri Next
Nokogumbo current ✔ Uses Nokogumbo code ✔ Uses Nokogiri code
Emits "redefined" warnings
Nokogumbo next ✔ Uses Nokogumbo code ✔ Uses Nokogiri code
Prompts to remove Nokogumbo
Nokogumbo final ✔ Uses Nokogiri code
Prompts to remove Nokogumbo

I think I like this last option best, because it gives us flexibility: the change between Nokogumbo "current" and "next" is small, and we can roll out the final release at our leisure. WDYT?


have you thought about importing the tests?

I don't have a strong opinion, but if we're depending on the tests and upstream isn't responsive, then we may as well keep them in-repo. If upstream merges and/or becomes more active, we can always periodically update the local files from upstream. WDYT?

rubys commented 3 years ago

I may need an ELI5 :-)

Here is an example application that currently uses nokogumbo: https://github.com/apache/whimsy/blob/5273b3067f939a98f7e40805e5fcf1b7b5c6e020/www/board/agenda/Gemfile#L20

If Nokogumbo were changed to add an if defined? now, it would continue to work as Nokogiri current doesn't define Nokogiri::HTML5

Once Nokogiri ships Nokogiri::HTML5, then things would continue to work with no changes and no prompting.

At some future date, perhaps a warning could be added to Nokogumbo indicating that this gem is now unnecessary and obsolete, recommending that Gemfiles be changed to reference nokogiri instead.

flavorjones commented 3 years ago

@rubys Sorry for using more words than necessary, writing helps me think but has the unfortunate side effect of being a wall of words. We're in complete agreement, that's exactly what I'm suggesting.