jacktuck / unfurl

Metadata scraper with support for oEmbed, Twitter Cards and Open Graph Protocol for Node.js :zap:
MIT License
481 stars, 52 forks

Conditional early termination of html parsing #68

Closed trieloff closed 3 years ago

trieloff commented 3 years ago

This is a fix for #67

Parsing now stops early at the closing `</head>` tag only if a `<title>` has already been found. Otherwise, it assumes the HTML was written with less care, and the entire document is scanned for metadata clues.
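A minimal, dependency-free TypeScript sketch of the rule described above. The `HtmlEvent` type and the `scanMetadata` function are hypothetical: they stand in for the SAX-style callbacks of an HTML parser and are not unfurl's actual code or API.

```typescript
// Hypothetical event shape, loosely modeled on a SAX-style HTML parser.
type HtmlEvent =
  | { kind: "open"; name: string; attrs: Record<string, string> }
  | { kind: "text"; text: string }
  | { kind: "close"; name: string };

interface ScanResult {
  title?: string;
  meta: Record<string, string>;
  eventsConsumed: number;
  stoppedEarly: boolean;
}

// Collects <title> and <meta> data; stops at </head> only if a title was seen.
function scanMetadata(events: readonly HtmlEvent[]): ScanResult {
  const result: ScanResult = { meta: {}, eventsConsumed: 0, stoppedEarly: false };
  let inTitle = false;
  let titleText = "";

  for (const ev of events) {
    result.eventsConsumed++;
    if (ev.kind === "open") {
      const name = ev.name.toLowerCase();
      if (name === "title") {
        inTitle = true;
        titleText = "";
      } else if (name === "meta") {
        // Open Graph uses `property`, Twitter Cards and others use `name`.
        const key = ev.attrs["property"] ?? ev.attrs["name"];
        const content = ev.attrs["content"];
        if (key !== undefined && content !== undefined) result.meta[key] = content;
      }
    } else if (ev.kind === "text") {
      if (inTitle) titleText += ev.text;
    } else {
      const name = ev.name.toLowerCase();
      if (name === "title" && inTitle) {
        inTitle = false;
        const trimmed = titleText.trim();
        if (trimmed) result.title = trimmed; // an empty <title> does not count
      } else if (name === "head" && result.title !== undefined) {
        // Well-formed document: metadata is assumed to live in <head>.
        result.stoppedEarly = true;
        break;
      }
      // No title yet: keep scanning the body for metadata clues.
    }
  }
  return result;
}

// Usage: a tidy page stops at </head>; a sloppy one is scanned to the end.
const wellFormed: HtmlEvent[] = [
  { kind: "open", name: "head", attrs: {} },
  { kind: "open", name: "title", attrs: {} },
  { kind: "text", text: "Example" },
  { kind: "close", name: "title" },
  { kind: "close", name: "head" },
  { kind: "open", name: "meta", attrs: { property: "og:title", content: "Late" } },
];
const sloppy: HtmlEvent[] = [
  { kind: "open", name: "head", attrs: {} },
  { kind: "close", name: "head" },
  { kind: "open", name: "meta", attrs: { property: "og:title", content: "Late" } },
];

console.log(scanMetadata(wellFormed)); // stoppedEarly: true, eventsConsumed: 5
console.log(scanMetadata(sloppy)); // stoppedEarly: false, meta["og:title"] === "Late"
```

Breaking out of the loop models the early termination; in a real streaming parser the equivalent step would be to stop feeding or abort the parser, which saves work on large pages whose metadata all sits in `<head>`.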

coveralls commented 3 years ago

Coverage Status

Coverage decreased (-0.6%) to 97.846% when pulling d28954cf3a47ebb3c7bfb0c88ac1d590931dcf1b on trieloff:master into db57429b369bae7e22f6983a7e19832c54101491 on jacktuck:master.

jacktuck commented 3 years ago

Looks good to me 👍, thanks. Did you see this much in the wild?

jacktuck commented 3 years ago

Going to take the opportunity to add semantic-release before merging this; will look at that now.

trieloff commented 3 years ago

This issue kept the https://github.com/adobe/helix-embed/pull/345 build failing for a month. It did not affect production, because we use an additional oEmbed provider, but it was definitely annoying, and YouTube is probably one of the top three embedding targets.

github-actions[bot] commented 3 years ago

:tada: This PR is included in version 5.2.1 :tada:

The release is available on:

Your semantic-release bot :package::rocket: