postlight / parser

📜 Extract meaningful content from the chaos of a web page
https://reader.postlight.com
Apache License 2.0

Parser sometimes times out when parsing mp.weixin.qq.com #709

Open BoWuGit opened 1 year ago

BoWuGit commented 1 year ago

Platform: CentOS and Mac
Mercury Parser version: 2.3.0
Node version: v16.13.2

Expected Behavior

It should extract the content.

Current Behavior

Instead, the parser blocks and never responds, and CPU usage climbs to 100%.

Steps to Reproduce

Parse https://mp.weixin.qq.com/s/-KfcG9eYkBLD7DsmjYrdrQ. The hang is easy to reproduce with this link, although not every URL on this domain fails. Thanks for your attention.
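Until the root cause is fixed, one possible workaround is to run each parse in a worker thread and kill it after a deadline. This is a sketch under an assumption: the 100% CPU usage suggests a synchronous loop, and if that is the case, a `setTimeout`/`Promise.race` timeout on the main thread would never get a chance to fire, because the event loop is blocked. Running the work on a separate thread keeps the main thread's timer alive, and `worker.terminate()` can stop a busy loop. The names `runInWorkerWithTimeout` and `WorkerTimeoutError` are hypothetical, not part of the parser's API.

```typescript
import { Worker } from "node:worker_threads";

// Hypothetical error type: raised when the worker posts no result before the deadline.
class WorkerTimeoutError extends Error {
  constructor(ms: number) {
    super(`worker did not finish within ${ms} ms`);
    this.name = "WorkerTimeoutError";
  }
}

// Hypothetical helper: evaluates `source` (CommonJS code) in a worker thread and
// resolves with the first message it posts. The timer lives on the main thread,
// so it still fires even if the worker is stuck in a synchronous loop.
function runInWorkerWithTimeout<T>(
  source: string,
  workerData: unknown,
  timeoutMs: number,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const worker = new Worker(source, { eval: true, workerData });
    let settled = false;

    // Settle the promise exactly once and cancel the pending timer.
    const finish = (settle: () => void): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      settle();
    };

    const timer = setTimeout(() => {
      finish(() => reject(new WorkerTimeoutError(timeoutMs)));
      void worker.terminate(); // stop the stuck thread
    }, timeoutMs);

    worker.once("message", (msg: T) => {
      finish(() => resolve(msg));
      void worker.terminate(); // one result per worker; shut it down
    });
    worker.once("error", (err: Error) => finish(() => reject(err)));
    worker.once("exit", (code: number) =>
      finish(() => reject(new Error(`worker exited with code ${code} before replying`))),
    );
  });
}

// Hypothetical usage, assuming the package is installed and loadable with
// require("@postlight/parser") and that Parser.parse(url) returns a Promise:
//
// const parseSource = `
//   const { parentPort, workerData } = require("worker_threads");
//   const Parser = require("@postlight/parser");
//   Parser.parse(workerData).then((result) => parentPort.postMessage(result));
// `;
// runInWorkerWithTimeout(parseSource, "https://mp.weixin.qq.com/s/-KfcG9eYkBLD7DsmjYrdrQ", 30000)
//   .then((result) => console.log(result))
//   .catch((err) => console.error(err));
```

Starting one worker per URL adds some overhead, but it means a single pathological page cannot take down the whole process. It only contains the hang; it does not explain why this particular page triggers it.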