Nashorn opened this 4 years ago
The solution in this PR performs a slightly different kind of stripping, but maintains the DOM tree:
Preserving the script tags to maintain the DOM tree is important.
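To make "different stripping that keeps the tree" concrete, here is a minimal sketch using Python's standard `html.parser`. It is an illustration under assumptions, not the PR's actual code: the class `ScriptSrcStripper` and the helper `strip_script_src` are hypothetical names. It drops only the `src` attribute of `<script>` elements and re-emits everything else, so every element (including each `<script>`) keeps its position among its siblings. Note that `HTMLParser` lowercases tag and attribute names, so the output may differ in case from the input.

```python
from html import escape
from html.parser import HTMLParser


class ScriptSrcStripper(HTMLParser):
    """Re-serializes HTML, dropping only the src attribute of <script> tags.

    The <script> elements themselves are kept, so sibling positions
    (and therefore nth-child selectors) are unchanged.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _attrs(self, tag, attrs):
        parts = []
        for name, value in attrs:
            if tag == "script" and name == "src":
                continue  # drop the remote script reference only
            if value is None:
                parts.append(f" {name}")  # boolean attribute such as async
            else:
                # HTMLParser unescapes attribute values, so re-escape them.
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(tag, attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(tag, attrs)}/>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_script_src(html: str) -> str:
    """Return html with src removed from every <script>, tags left in place."""
    parser = ScriptSrcStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `<div><script src="a.js"></script><p>x</p></div>` becomes `<div><script></script><p>x</p></div>`: the browser no longer fetches the script, but the `<p>` is still the second child of the `<div>`.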
Hi @Nashorn, thanks for making this change! I'll take a look at it in a couple days.
Remove 'src' attributes from 'script' tags only. Removing a script tag from
alters the DOM tree and breaks CSS nth-child selectors, because the tree no longer matches the original HTML document. For example, in screen scraping, the captured HTML document must match the DOM structure of the live site, so that the same CSS rules using nth-child selectors (e.g. main > div:nth-child(2) > div > div > div:nth-child(3) > ...), or even JavaScript querySelector calls, work on both the live DOM and the scraped copy.
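Why deleting a `<script>` breaks such selectors can be checked with a small, self-contained model. This is a sketch, not a real CSS engine: the helpers `parse` and `select` are hypothetical, support only `tag` and `tag:nth-child(n)` steps joined by `>`, and ignore text nodes. The key point it reproduces is that `:nth-child(n)` counts all element siblings, whatever their tag, so a `<script>` sitting between two `<div>`s occupies a position.

```python
import re
from html.parser import HTMLParser

# HTML void elements never get a closing tag, so they never become "open".
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# One selector step: a tag name with an optional :nth-child(n).
STEP = re.compile(r"^([a-z][a-z0-9]*)(?::nth-child\((\d+)\))?$")


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # element children only; text nodes are ignored


class TreeBuilder(HTMLParser):
    """Builds a bare element tree from (reasonably well-formed) HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_startendtag(self, tag, attrs):
        self.cur.children.append(Node(tag, self.cur))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if any.
        node = self.cur
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.cur = node.parent


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def _descendants(node):
    for child in node.children:
        yield child
        yield from _descendants(child)


def _matches(node, tag, nth):
    # :nth-child counts ALL element siblings, not just those with this tag.
    position = node.parent.children.index(node) + 1
    return node.tag == tag and (nth is None or position == int(nth))


def select(root, selector):
    """Evaluate a chain like 'main > div:nth-child(2) > p'.

    Only 'tag' and 'tag:nth-child(n)' steps joined by '>' are supported.
    Returns the first match found, or None (like querySelector's null).
    """
    steps = []
    for raw in selector.split(">"):
        m = STEP.match(raw.strip())
        if m is None:
            raise ValueError(f"unsupported selector step: {raw.strip()!r}")
        steps.append((m.group(1), m.group(2)))
    candidates = [n for n in _descendants(root) if _matches(n, *steps[0])]
    for tag, nth in steps[1:]:
        candidates = [c for p in candidates for c in p.children
                      if _matches(c, tag, nth)]
    return candidates[0] if candidates else None
```

With `<main><div><p>a</p></div><script src="x.js"></script><div><p>b</p></div></main>`, the selector `main > div:nth-child(3) > p` finds the second paragraph. The same selector returns None once the `<script>` element is deleted (the second `<div>` moves to position 2), but still matches when only its `src` is removed (`<script></script>`).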
Removing all script tags, even from
changes the tree, so the example selector above returns null. FIX (2nd): addresses a bug where