Open cawa-93 opened 7 years ago
It's not possible at the moment. You can suppress errors using your own VirtualConsole though.
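A minimal sketch of the VirtualConsole approach, assuming stylesheet failures arrive as "jsdomError" events whose message contains "Could not parse CSS" (the text quoted later in this thread). The helper names isCssParseError and dropCssParseErrors are hypothetical; the console argument only needs an on(eventName, handler) method, which jsdom's VirtualConsole has because it is an EventEmitter.

```javascript
// Assumption: CSS failures surface as errors whose message contains this text.
const CSS_ERROR_TEXT = "Could not parse CSS";

// Returns true for errors that look like jsdom stylesheet parse failures.
function isCssParseError(error) {
  return Boolean(error) && typeof error.message === "string" &&
    error.message.includes(CSS_ERROR_TEXT);
}

// Registers a "jsdomError" listener that drops CSS parse errors and passes
// every other error to `report` (console.error by default).
function dropCssParseErrors(virtualConsole, report = console.error) {
  virtualConsole.on("jsdomError", (error) => {
    if (!isCssParseError(error)) {
      report(error);
    }
  });
  return virtualConsole;
}

// Intended wiring, with jsdom installed:
//   const { JSDOM, VirtualConsole } = require("jsdom");
//   const virtualConsole = dropCssParseErrors(new VirtualConsole());
//   const dom = new JSDOM(html, { virtualConsole });
```

A VirtualConsole that is not forwarded to the Node.js console prints nothing by itself, so only the errors passed to report become visible. The CSS is still parsed; this only hides the noise.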
The correct thing to do is indeed to set a base URL.
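A small sketch of setting the base URL through jsdom's url option. The helper name jsdomOptionsWithBaseUrl and the example.org address are made up for illustration; the helper only checks that the value is an absolute URL before it is handed to the constructor.

```javascript
// Builds a jsdom options object with a base URL.
// `new URL(...)` throws a TypeError for relative or malformed input,
// so a bad value fails here rather than inside jsdom.
function jsdomOptionsWithBaseUrl(pageUrl) {
  const parsed = new URL(pageUrl);
  return { url: parsed.href };
}

// Intended use, with jsdom installed (hypothetical address):
//   const { JSDOM } = require("jsdom");
//   const dom = new JSDOM(html, jsdomOptionsWithBaseUrl("https://example.org/page"));
```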
The ability to disable certain jsdom features could help optimize performance. And when scraping content, CSS processing is certainly not needed.
@pawel-dubiel, right. In my case, I'm only interested in the HTML and manipulating it. I do not need CSS or JavaScript processing. I would like to disable all of this to improve speed.
@cawa-93 I'm just thinking: if you don't need JS/CSS and you don't need the DOM API, you may want to consider cheerio (which should be a few times faster, or at least it was a few years ago).
Do you have any evidence of a performance increase? With my knowledge of jsdom's architecture, there shouldn't be much, if any, since it's all done lazily.
@domenic 8x is a claim from https://github.com/cheeriojs/cheerio
"Blazingly fast: Cheerio works with a very simple, consistent DOM model. As a result parsing, manipulating, and rendering are incredibly efficient. Preliminary end-to-end benchmarks suggest that cheerio is about 8x faster than JSDOM."
And this old screencast https://vimeo.com/31950192 which at the end compares both jsdom and cheerio performance, but it's 6 years old.
It would be really interesting to see results from 2017.
But both projects have a different scope.
Yes, those Cheerio results are really dishonest, as we've mentioned to their author before. And they certainly have nothing to do with CSS.
I'm in the camp where cssom fails to parse my CSS but I still need to parse the js.
I too would like to disable inline CSS parsing - it is mostly a waste of CPU and memory in my various usages of jsdom.
For me, what bothers me is having to write more code to gracefully handle CSS parsing "errors", even when my CSS is valid (it is just not supported by JSDOM's parser).
EDIT: maybe I should try again with the newer versions
Here is a script which, when run along with a downloaded copy of https://www.w3.org/TR/html52/single-page.html, takes around 6 seconds less on my machine when evaluateStylesheet is commented out in HTMLStyleElement-impl.js:
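const {JSDOM} = require("jsdom");
// Time how long jsdom takes to load and parse the downloaded page.
console.time("DOM parsing");
JSDOM.fromFile("single-page.html").then(dom => {
  // Prints the elapsed time once the document has been parsed.
  console.timeEnd("DOM parsing");
});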
That's great evidence; thanks!
Has anything changed since January 22nd?
We have the same issue. We parse hundreds of pages per minute; when I manually removed the CSS from most of them, parsing time decreased drastically. We also get a bunch of Error: Could not parse CSS. Please make CSS parsing optional. What's the best way to manually fork and disable CSS parsing?
Fork with stylesheet parsing disabled: https://github.com/dfblue/jsdom
Quite surprised this wasn't thought of when implementing it! Processing just raw HTML seems to be a very common case when CSS is just a burden to consider.
I would like to see an option to disable CSS processing as well..
Just hit the wall.
@domenic I'm not a communist, but let me ask you one thing. How much do you need to implement this feature?
I can write off a Cheque directly to you.
Is disabling CSS parsing really that complicated? CSS processing is really not necessary, and is actually wasteful, for a lot of use cases. It's been more than 3 years now since this issue was raised! I wish it had been resolved by now ☹
Same boat, I just need the HTML and something CSS-related is failing.
Same – And in my case, jsdom is used as a dependency from other libs, so the option to switch to another dom provider just to disable css parsing is not terribly viable.
For anyone else stumbling here, I created yet another fork with the no-CSS patches on top of the latest JSDOM version: https://github.com/phgn0/jsdom-no-css.
Just use it as "jsdom": "phgn0/jsdom-no-css#master", in your package.json (or fork it).
I currently use this to get around it:
new JSDOM(html.replace(/<style(\s|>).*?<\/style>/gi, ''))
Since there are already at least two forks with a fix, I assume it's relatively easy to implement, so what's stopping this from progressing, @domenic?
Disclaimer: I've also just encountered this problem and in my use-case, I do not need CSS parsing.
Here's a workaround that uses monkey-patching. I think this is easier than keeping forks around:
/**
 * Workaround for https://github.com/jsdom/jsdom/issues/2005
 */
export function disableCssProcessing() {
  const HTMLStyleElementImpl =
    require("jsdom/lib/jsdom/living/nodes/HTMLStyleElement-impl").implementation;
  HTMLStyleElementImpl.prototype._updateAStyleBlock = () => {};
}
ES module import version of the previous patch:
import { implementation } from 'jsdom/lib/jsdom/living/nodes/HTMLStyleElement-impl.js';
implementation.prototype._updateAStyleBlock = () => {};
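If patching the prototype for the whole process feels too broad, the same idea can be scoped to one call. This is only a sketch: withStyleProcessingDisabled is a hypothetical helper, and it assumes the implementation class and its _updateAStyleBlock method look the way the patches above describe. Both are jsdom internals and may change between versions.

```javascript
// Temporarily replaces _updateAStyleBlock with a no-op, runs `task`,
// then restores the original method even if `task` throws.
// `implementation` is expected to be the class exported by
// jsdom/lib/jsdom/living/nodes/HTMLStyleElement-impl.js (internal path).
function withStyleProcessingDisabled(implementation, task) {
  const proto = implementation.prototype;
  const original = proto._updateAStyleBlock;
  if (typeof original !== "function") {
    throw new Error("_updateAStyleBlock not found; jsdom internals may have changed");
  }
  proto._updateAStyleBlock = () => {};
  try {
    return task();
  } finally {
    proto._updateAStyleBlock = original;
  }
}

// Intended use, with jsdom installed:
//   const { JSDOM } = require("jsdom");
//   const { implementation } =
//     require("jsdom/lib/jsdom/living/nodes/HTMLStyleElement-impl.js");
//   const dom = withStyleProcessingDisabled(implementation, () => new JSDOM(html));
```

The restore happens as soon as task returns, so this only covers synchronous work such as the JSDOM constructor. A task that returns a promise would resolve after the original method is already back in place.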
I currently use this to get around it:
new JSDOM(html.replace(/<style(\s|>).*?<\/style>/gi, ''))
It didn't work for me. Instead, I found another regex that works for both inline styles and scripts.
I now use the following code to remove inline CSS and scripts:
const sanitizeHtml = html => {
  return html
    ?.replace(/<style([\S\s]*?)>([\S\s]*?)<\/style>/gim, '')
    ?.replace(/<script([\S\s]*?)>([\S\s]*?)<\/script>/gim, '')
}
let doc = new JSDOM(sanitizeHtml(html), { url })
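One likely reason the first regex fails on real pages: in JavaScript, "." does not match line breaks unless the s (dotAll) flag is set, so a style block that spans several lines is never matched. The sketch below compares the patterns on a made-up input; the sample HTML and variable names are illustrative only.

```javascript
// Hypothetical sample input: a style block spanning several lines.
const html = "<p>before</p><style>\nbody { color: red; }\n</style><p>after</p>";

// Earlier pattern: "." stops at line breaks, so the multi-line block survives.
const dotPattern = /<style(\s|>).*?<\/style>/gi;
// Same pattern with the "s" (dotAll) flag, so "." also matches line breaks.
const dotAllPattern = /<style(\s|>).*?<\/style>/gis;
// Pattern from the comment above: [\S\s] matches any character, newlines included.
const classPattern = /<style([\S\s]*?)>([\S\s]*?)<\/style>/gim;

const withDot = html.replace(dotPattern, "");
const withDotAll = html.replace(dotAllPattern, "");
const withClass = html.replace(classPattern, "");

console.log(withDot === html);  // the style block was not removed
console.log(withDotAll);        // style block removed
console.log(withClass);         // style block removed
```

Regex stripping stays a heuristic: it can also hit text that merely looks like a style tag, for example inside a script string or an HTML comment.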
Another reason to disable it is for cases where JSDom is unable to parse the CSS. For example, it doesn't seem to be able to handle CSS nesting.
Even if that support is added there will always be future changes to CSS that JSDOM will be catching up to. Providing the ability to disable that parsing when not needed allows that issue to be avoided.
Just ran into the same issue as above regarding CSS nesting support, but was able to apply the HTMLStyleElement patch override.
Basic info:
Minimal reproduction case
If I try to parse HTML containing
I get the following error:
The problem can be solved by specifying the parameter url.
However, I'm wondering if it's possible to completely disable the processing of CSS?