Open saschanaz opened 6 years ago
Definitely. I've hit this on Dreamhost too.
The problem in the past (years ago!) was that JSDOM was super-duper slow for whatever reason - specially on big specs - so it wasn't feasible to use it. That's probably been addressed since then, but someone would need to try it out.
I thought I'd report here because I've been investigating how to make Respec run in JSDOM (for Reffy). Good news is it's mostly doable. Bad news is it's not exactly straightforward...
The tweaks that I needed to make to JSDOM and Respec in Reffy are parked in a drop-respec
branch in Reffy for now. See code and comments in util.js
. Here is a summary of problems I encountered and workarounds I found so far:
JSDOM does not support URL.createObjectURL
:
Respec uses that to create a separate worker for the core/highlight
module. Dropping that module solves the problem, but obviously means that the module does not run.
JSDOM does not support document.createRange
:
Respec uses that in the core/list-sorter
module. Dropping that module solves the problem, with the obvious same consequence as above.
Also note #1665 related to that module: JSDOM throws an error because the generated CSS selector is invalid.
JSDOM's CSS parser is not necessarily up-to-date:
JSDOM does not like the @keyframes
and @supports
directives in particular and throws an error when parsing the CSS Respec injects. Not sure this really has any real impact, but since I don't need the CSS in Reffy, replacing them with @media
directives does the trick.
JSDOM does not support innerText
:
Given the use in Respec, replacing occurrences of innerText
in Respec by textContent
solves the problem.
marked behaves strangely in the presence of <pre>
blocks when there are no blank lines after the block.
For instance, if you look at this markdown demo, you'll note that the generated HTML for the <pre>
tag includes unexpected (and broken) <p>
tags.
Now, the problem is that Respec drops blank lines before it calls that library. An example of spec where this happens is Netinfo. Interestingly, in "real" browsers, the weird tags magically disappear afterwards and the generated HTML looks clean (but the output of markdownToHtml
is wrong). I do not know why. In JSDOM, the generated HTML stays ugly, and extracting the IDL afterwards is problematic.
Replacing \n
with \n\n
in normalizePadding
to preserve a blank line fixes the issue.
JSDOM does not support cloning of attributes:
This is needed by HyperHTML. In theory, providing a polyfill is easy but constructors and prototypes are shared in JSDOM, so there is no good way to get back to the underlying document when given an attribute to clone that does not belong to one, or at least I could not find a good solution here. This requires tweaking the code of HyperHTML to call createAttributeNS
itself when the attribute it wants to clone does not belong to a document.
JSDOM does not support inserAdjacentElement
:
It's easy to come up with a polyfill.
JSDOM does not support closest
:
Same thing, it's easy to come up with a polyfill.
JSDOM does not support matchMedia
:
Given that Reffy does not need that logic, it's easy to mock the interface
JSDOM does not support fetch
:
Actually, that's fine for Reffy, because I want to control network requests as much as possible and force usage of a local cache when possible.
JSDOM does not support a few visual methods on window
:
In particular, blur
, focus
, moveBy
, moveTo
, resizeBy
, resizeTo
, scroll
, scrollBy
, scrollTo
are not supported (and a couple of specs call scrollBy
). Reffy does not need that logic, so it's easy to mock these methods.
JSDOM does not support IndexedDB:
Respec degrades gracefully when IndexexDB
is not supported. It still outputs an obscure error message to say that it cannot call open
on undefined
(where undefined
should have been IndexedDB
). Detecting whether IndexedDB
exists before trying to access it in Respec would allow to report a more useful error message.
Also worth noting here: the readyPromise
gets rejected but, depending on the spec, the promise may be checked in another tick. When that happens, Node.js reports an UnhandledPromiseRejectionWarning
, which is, as far I can tell, a false positive.
JSDOM does not seem to support the whatToShow
filter in TreeWalker
:
As a result, HyperHTML fails to remove the <!-- _hyper: xxxx -->
comments it adds all over the place when it runs. That does not have any consequence.
As far as I can tell, there is no easy way to run code in JSDOM once Respec has started to run, and in particular when the respecIsReady
promise is set. As a result, I need to do a bit of polling to wait for the promise to appear so that I can attach to it and detect the end of the generation. Or is there a better way?
With the workarounds and limitations mentioned above, I manage to run Reffy on all W3C specs that Reffy knows about, and the outcome seems good!
Thanks for sharing your experience!
Yeah, I also tried JSDOM and faced those limits. Maybe we should change the whole logic. Some personal thoughts:
@tidoust, I thought Reffy ran against "static" documents? If we had a Reffy web service, ReSpec could simply post a static snapshot directly from the browser without needing to run any transformations on the server.
To rephrase: what is Reffy's input?
To rephrase: what is Reffy's input?
Reffy's goal is to generate a knowledge graph of specs and an analysis of that graph to create a list of potential anomalies. To generate the knowledge graph, Reffy:
The input of the Reffy web service that we talked about is slightly different. It would take as input:
From that, it would simply have to parse the provided spec, and analyses the result against the knowledge graph. No transformation is needed on the server for that (unless the input is the raw source, obviously).
I started to develop such a service (see reffy-service). I haven't had time to test it with POST
content, so it's most probably broken for now. It also lacks a README for now. Note the service needs the HTML and the URL of the spec to test. The URL is needed when the spec already exists in the knowledge graph, so that Reffy can remove it from there first before, otherwise the resulting graph would contain two specs that e.g. define the same IDL terms, and analysis would report that as anomalies.
I'm now looking at ways to make the service more robust before we deploy it on one of our servers. The investigation reported here is part of that, as an attempt to reduce the number of dependencies that would have to run on the server.
@tidoust thanks for clarifying. @saschanaz, do you know if puppeteer runs without needing a virtual screen? If yes, then that addresses GPU-less CLI environments problem in one sense... still a fat dependency, but at least we don't need to refactor code significantly to back port ReSpec to JSDOM.
do you know if puppeteer runs without needing a virtual screen?
puppeteer runs in headless mode by default, so yes.
@tidoust, given that Puppeteer runs without needing the virtual display, is it a viable alternative to JSDOM for Reffy?
@tidoust, given that Puppeteer runs without needing the virtual display, is it a viable alternative to JSDOM for Reffy?
Sure! I was not reporting these issues as requests to change ReSpec. The problem here is not how ReSpec is written, it is JSDOM not yet supporting enough features for ReSpec. I just wanted to document the current limitations somewhere.
Actually, we still have a virtual screen for now on the server, but maybe that's no longer needed with Puppeteer (it was for nightmare). We just haven't tried without. But that was not the reason why I wanted to get rid of RespecDocWriter, it was mostly to simplify dependencies and avoid having to run a full chromeless browser on the server.
I just wanted to document the current limitations somewhere.
Ok cool. Sorry I was confused. I appreciate the documentation above also, so thanks for that!
We just haven't tried without. But that was not the reason why I wanted to get rid of RespecDocWriter, it was mostly to simplify dependencies and avoid having to run a full chromeless browser on the server.
Understood. Ok, if there is anything else we can do, happy to try to help. Reffy looks super cool and if we can integrate somehow directly (via POST or whatever), happy to help with that however we can!
@marcoscaceres This has always been my ultimate purpose, for binary-less lighter dependencies, a scriptable API, easier test without karma, and easier debug. Could we bake an experimental build with minimal features to check what should be done, although I'm currently not sure which feature to start?
Edit: My vague thought:
Document
object, user configuration, definition maps, biblio maps, etc.respecConfig
value.document
and the conf
argument. Normalizing configuration shouldn't modify the input configuration to prevent unexpected side effect.Excited to see where this ends up!
I got a blocker from hyperHTML, which cannot be imported on node.js. Maybe we have to fork viperhtml
(which is a lot lighter than hyperhtml
but only for server) to make it work.
Pinged WebReflection for guidance.
I got a blocker from hyperHTML, which cannot be imported on node.js
hyperHTML is 100% code covered via node, all you need to do is to write require('basichtml').init()
before requiring hyperhtml
.
I've used basichtml to serve output even in cloudflare so everything you need should be there.
However, viperHTML is always an alternative if you need just plain SSR π
So it looks like require('basichtml').init()
defines document
globally? That somehow clashes with jsdom
... But thanks, TIL basichtml
.
@saschanaz if you have jsdom
, and it pollutes the global scope, hyperHTML
should work already.
If you don't you can use an empty .init()
or specify various things, including the object to pollute.
i.e.
const window = {};
require('basichtml').init({window});
const {customElements, document, HTMLElement} = window;
customElements.define('hello-world', class extends HTMLElement {
connectedCallback() {
this.textContent = 'Hello world!';
}
});
// workaround to execute with a temporary global document
const $ = fn => {
const oldDocument = global.document;
global.document = document;
const result = fn();
if (oldDocument)
global.document = oldDocument;
else
delete global.document;
return result;
};
const {bind} = $(() => require('hyperhtml'));
$(() => {
bind(document.documentElement)`
<head>
<title>This is basicHTML</title>
</head>
<body><hello-world/></body>`;
});
console.log(document.toString());
/*
<!DOCTYPE html><html><head>
<title>This is basicHTML</title>
</head><body><hello-world>Hello world!</hello-world></body></html>
*/
but then again, if you have already a document
everything should just work, you just need to require hyperhtml
after π
I mean require("basichtml").init(); require("jsdom");
fails. Maybe jsdom detects browser environment by checking global existence of document
or something...
Currently I decided to use viperhtml for template string and parse the result with jsdom again, as I'm currently trying to remove any global or module level states. It's still a draft, so maybe I could migrate to basichtml
later if it looks great enough.
I mean require("basichtml").init(); require("jsdom"); fails.
that is not a real-world use case, but even if you need to do that, you can pass an object instead, so nothing gets polluted, and jsdom will work.
Currently I decided to use viperhtml for template string and parse the result with jsdom again, as I'm currently trying to remove any global or module level states
You are contradicting yourself, since jsdom pollute the global scope, right?
It's still a draft, so maybe I could migrate to basichtml later if it looks great enough.
Sure, whatever works best. I don't even know what is your actual goal (had no time to read this whole thread, I just jumped in at "I got stuck due node" when all I do is using node for everything I develop on the web π)
since jsdom pollute the global scope
AFAIK it doesn't, does it?
actually my bad, it doesn't. I've seen tests written with global.window = new JSDOM
but apparently it doesn't indeed pollute anything by its own.
Then again, neither does basichtml
, it's just the handy default that never, personally, hurts.
In regards of hyperHTML
it needs to do some feature detection requiring a document
in a way or another.
@tidoust Hi! Unfortunately we found that #2187 introduces too many changes and creates additional maintenance burden to us. (Even when it might give some hypothetical wins π)
I want to hear your current need again to decide the future of #2187. Could you use respec2html
now that we depend on headless Chromium? If you can't, what would be the blockers?
Hello @saschanaz,
I want to hear your current need again to decide the future of #2187. Could you use respec2html now that we depend on headless Chromium? If you can't, what would be the blockers?
I don't think that there is anything blocking us from using respec2html
in practice. It was more a matter of reducing dependencies to large binaries (also not to have to run a virtual screen on the server, but that's no longer needed with a headless Chromium).
The one thing that I do now, and which I don't know if I can easily do if we switch to respec2html
, is to hook my own implementation of "fetch" to force the use of cached versions of resources so that I can run crawls offline (for testing purpose). That is not a requirement though, and I'm sure I'll find a way to do that.
I totally understand the maintenance burden argument. I actually considered using a forked version of Respec with #2187 baked in... and decided against it for the very same reason.
+@dontcallmedom who was the one advocating for the switch to JSDOM in the first place.
I agree it'd a nice to have; if it's too costly to maintain, then we'll absorb the cost in running with headless chromium.
Okay, that means this is now officially low-priority unless we find an easier way.
@marcoscaceres I still want to keep #2187 open to 1) extract some good thing from there 2) maybe just for me to experiment things. Sounds good?
Sounds good. Yes, letβs grab whatever goodies we can from #2187.
A new possibility: someone made a JS-to-binary compiler that also supports WASM.
Not sure it's any valid, though.
ReSpec shouldn't require installing real browsers to use
respec2html.js
as jsdom supports browser-like DOM manipulation in Node.js. Dropping Nightmare (and so Electron) dependency will allow easier build environment settings on GPU-less CLI environments.