j0k3r / graby

Graby helps you extract article content from web pages
MIT License
362 stars 73 forks

Improve regex used to remove conditional comments in HTML #336

Closed Kdecherf closed 9 months ago

Kdecherf commented 11 months ago

As the test added by this commit shows, the previous regex incorrectly removed the shown.jpg image. The new regex keeps what a browser would normally display.

Note that, according to regex101.com, the new regex costs about three times as much as the previous one (~1,200 iterations instead of ~400).
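
For illustration, here is a minimal Python sketch of the kind of difference described above. It is not the regex from this commit, and the sample HTML is made up for the example; only the shown.jpg file name comes from the test mentioned in this PR. The names NAIVE, HIDDEN_ONLY and strip_hidden_conditional_comments are hypothetical. The naive pattern removes everything between a conditional opener and the next `<![endif]-->`, which also swallows "downlevel-revealed" blocks that non-IE browsers render. The stricter pattern only matches when the body contains no `-->`, i.e. when the whole block really is a single HTML comment.

```python
import re

# Naive pattern (assumed for illustration): from "<!--[if ...]>" to the
# next "<![endif]-->", whatever lies in between.
NAIVE = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL)

# Stricter pattern (also an assumption, not graby's actual regex): the body
# may not contain "-->", so only blocks that form one real comment match.
HIDDEN_ONLY = re.compile(
    r"<!--\[if[^\]]*\]>(?:(?!-->).)*?<!\[endif\]-->", re.DOTALL
)


def strip_hidden_conditional_comments(html: str) -> str:
    """Remove only conditional comments that a non-IE browser would hide."""
    return HIDDEN_ONLY.sub("", html)


# Hypothetical sample: one hidden block, one revealed block.
SAMPLE = (
    '<!--[if IE]><img src="hidden.jpg"><![endif]-->'
    '<!--[if !IE]><!--><img src="shown.jpg"><!--<![endif]-->'
)

naive_result = NAIVE.sub("", SAMPLE)
strict_result = strip_hidden_conditional_comments(SAMPLE)

print("shown.jpg" in naive_result)   # the naive pattern drops the visible image
print("shown.jpg" in strict_result)  # the stricter pattern keeps it
```

The per-character lookahead in the stricter pattern is one plausible reason a more careful regex needs more steps than a plain lazy match, which would be consistent with the higher cost reported above.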

Fixes https://github.com/wallabag/wallabag/issues/6828

This must be backported to release/2.x for wallabag.