Nokogiri usage to process HTML adds whitespace nodes, causing unneeded morphdom updates

shepmaster commented 3 years ago

Stimulus Reflex uses Nokogiri::HTML to parse the rendered output of the Rails controller and selects a fragment of the HTML using a CSS selector:

https://github.com/hopsoft/stimulus_reflex/blob/ae318edc1e96d531daf420db6e715d76bf24e90c/lib/stimulus_reflex/broadcasters/page_broadcaster.rb#L11-L15

These fragments are then sent across a WebSocket to the browser where they are handed to morphdom and merged into the browser's DOM. Morphdom has some issues where changes in text nodes, even if they are just whitespace, will cause unneeded element removal and addition.

This manifests in our app as the autocomplete on a form field closing unexpectedly. You can see the same behavior in the Stimulus Reflex demo validation application:

Example of morphdom needlessly changing the input element

Here we:

Load the page
Click on the field (seeing the autocomplete)
Type a letter (triggering a Reflex action)
Morphdom deletes and recreates the input tag (closing the autocomplete)
Any future Reflex actions will have the same kinds of extra whitespace, so the autocomplete field will stay open.

In our application, the HTML has very few newlines as Slim strips almost all of them out. This means that something in the Stimulus Reflex pipeline is adding newlines. Through debugging, this is caused by the usage of Nokogiri:

input = "<!DOCTYPE html><html><head><title>Example</title></head></html>"
nokogiri = Nokogiri::HTML(input)

# This now has newlines inserted!
nokogiri.to_html
# => "<!DOCTYPE html>\n<html><head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n<title>Example</title>\n</head></html>\n"

These differences in whitespace cause the morphdom issue.

Expected behavior

The HTML sent by Stimulus Reflex should have the same amount of whitespace as what Rails created.

Versions

StimulusReflex

Gem: 3.4.1
Node package: 3.4.1

External tools

Ruby: 2.7.2p137
Rails: 6.1.3.1
Node: 14.16.0

Browser

Chrome Canary 91.0.4471.0
Safari 14.0.3

shepmaster commented 3 years ago

A Nokogiri maintainer said:

Unfortunately, we're at the mercy of libxml2 when it comes to formatting like spaces in an emitted doc.

Some experimentation shows that libxml2 should be capable of avoiding the whitespace:

#include <string.h>
#include <assert.h>
#include <libxml/HTMLparser.h>
#include <libxml/HTMLtree.h>

int main(void) {
  char *input = "<!DOCTYPE html><html><head><title>Example</title></head></html>";

  htmlDocPtr doc = htmlReadMemory(input,
                                  strlen(input),
                                  NULL,/* "demo://example.com" */
                                  NULL,
                                  HTML_PARSE_RECOVER);

  xmlNodePtr root = xmlDocGetRootElement(doc);

  int n_bytes = htmlNodeDumpFileFormat(stdout,
                                       doc,
                                       root,
                                       NULL,
                                       0 /* No formatting */);

  assert(n_bytes != -1);

  return EXIT_SUCCESS;
}

% cc -Wall -pedantic -ansi demo.c -lxml2

% ./a.out
<html><head><title>Example</title></head></html>

(Using libxml2 2.9.10)

shepmaster commented 3 years ago

One possibility would be to use some additional options when outputting the HTML

.inner_html(save_with: Nokogiri::XML::Node::SaveOptions::NO_DECLARATION | Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS | Nokogiri::XML::Node::SaveOptions::AS_HTML)

Namely, I've removed FORMAT from DEFAULT_HTML.

leastbad commented 3 years ago

If you're able to give us a PR that tweaks our Nokogiri calls in such a way that this issue melts away, you'd be a hero!

shepmaster commented 3 years ago

Yep, I hope to try it out in our real app then see that or case is fixed soon.

I don’t quite know how to write a test for this in the SR code base, per se; any suggestions?

lmatiolis commented 3 years ago

Hi @leastbad, I was working with @shepmaster when this issue appeared, so I'll take a stab on finishing the fix he provided here.

I hope to try it out in our real app then see that or case is fixed soon.

I was able to confirm in our app that being explicit about not formatting works! It doesn't add any whitespace nodes with his change here:

https://github.com/hopsoft/stimulus_reflex/blob/f13a9a88b105296b56d2c0b464c5a1f241b2fa02/lib/stimulus_reflex/broadcasters/page_broadcaster.rb#L15

To this:

html = document.css(selector).inner_html(save_with: Nokogiri::XML::Node::SaveOptions::NO_DECLARATION | Nokogiri::XML::Node::SaveOptions::NO_EMPTY_TAGS | Nokogiri::XML::Node::SaveOptions::AS_HTML)

I really wish there was a way to just say don't include Nokogiri::XML::Node::SaveOptions::FORMAT, as @shepmaster said, but I didn't figure it out yet.

I don’t quite know how to write a test for this in the SR code base, per se; any suggestions?

I was checking the PageBroadcasterTest and confirmed that if I change:

https://github.com/hopsoft/stimulus_reflex/blob/f13a9a88b105296b56d2c0b464c5a1f241b2fa02/test/broadcasters/page_broadcaster_test.rb#L15

to:

"<html><head></head><body><div>New Content</div><div>Another Content</div></body></html>"

The spec fails if I only expect "<div>New Content</div><div>Another Content</div>". It only passes if I add the newline to the expectation:

"html" => "<div>New Content</div>\n<div>Another Content</div>",

So that one seems to prove the whitespace issue and after we do our fix as mentioned above, we are able to just remove that newline from the expected value and it passes.

Would that be enough to test the issue and the fix? I know we also need to apply this fix to the SelectorBroadcaster but its spec seems to be commented out?

Let me know how we want to proceed on this.

Thanks for this amazing project! <3

leastbad commented 3 years ago

I really appreciate the additional detail, and I'll try (hard) to give this some attention over the weekend.

lmatiolis commented 3 years ago

@leastbad No worries, I'll only get back to it on Monday. Just let me know the details about the tests and I should be able to get it on a PR next week. Thanks!

leastbad commented 3 years ago

I'm at a disadvantage there because I'm not a testing expert. I'm more of a "try not to break the tests" expert.

lmatiolis commented 3 years ago

Haha, that's also a good thing!

FWIW, the change that @shepmaster suggested doesn't break any test when I do bundle exec rake test_ruby, so there's that to begin with.

The changes I suggested to the test are just to prove first that we were indeed adding whitespace before but not anymore.

lmatiolis commented 3 years ago

Hi, just wanted to mention that I've pushed the discussed changes to a branch to make it more easy to follow: https://github.com/hopsoft/stimulus_reflex/compare/master...lmatiolis:avoid-adding-whitespace

Thanks!

stimulusreflex / stimulus_reflex