postlight / parser

📜 Extract meaningful content from the chaos of a web page
https://reader.postlight.com
Apache License 2.0

`og:` tags not properly parsed #593

Status: Open. Opened by TLadd 3 years ago.

TLadd commented 3 years ago

Expected Behavior

The values of `og:title`, `og:description`, `og:image`, etc. should be used in the parse result.

Current Behavior

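These values are ignored because the generic extractors use `extractFromMeta`, which ends up querying selectors like `meta[name="og:image"]`. Open Graph tags are declared with the `property` attribute (`<meta property="og:image" content="...">`), so the selector should be `meta[property="og:image"]`.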
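To illustrate the mismatch without a DOM, here is a minimal TypeScript sketch that models a `<meta>` tag as a plain attribute record and emulates the two attribute selectors. `MetaTag` and `matchesSelector` are hypothetical helpers for illustration only, not part of Mercury Parser's code.

```typescript
// Hypothetical model of a <meta> element: just its attributes.
type MetaTag = Record<string, string>;

// Emulates the CSS attribute selector meta[attr="value"].
function matchesSelector(tag: MetaTag, attr: string, value: string): boolean {
  return tag[attr] === value;
}

// How an Open Graph tag typically appears in a page's <head>:
// <meta property="og:image" content="https://example.com/cover.png" />
const ogImage: MetaTag = {
  property: "og:image",
  content: "https://example.com/cover.png",
};

// meta[name="og:image"] does not match: the tag has no name attribute.
const byName = matchesSelector(ogImage, "name", "og:image");
// meta[property="og:image"] does match.
const byProperty = matchesSelector(ogImage, "property", "og:image");

console.log(byName, byProperty); // false true
```

A selector keyed on `name` therefore never sees a tag that only carries `property`, which is why the `og:` values silently drop out.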

Steps to Reproduce

Run `MercuryParser.parse` on a page that supplies `og:` tags but does not provide the same values through other means. Observe that the `og:` values are missing from the result.

Possible Solution

Either fix the `extractFromMeta` utility to special-case `og:` tags (querying the `property` attribute instead of `name`), or stop using it for these tags.
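One way to implement the first option is to pick the attribute from the key's prefix. The TypeScript below is a minimal sketch under that assumption. `attributeFor`, `selectorFor`, and `extractFromMetaSketch` are hypothetical names; the real `extractFromMeta` in Mercury Parser works on a parsed DOM and may have a different signature.

```typescript
// Hypothetical model of a <meta> element: just its attributes.
type MetaTag = Record<string, string>;

// Open Graph keys (prefix "og:") are declared with `property`;
// everything else is looked up by `name`.
function attributeFor(metaName: string): "property" | "name" {
  return metaName.startsWith("og:") ? "property" : "name";
}

// Builds the CSS selector string, e.g. meta[property="og:title"].
function selectorFor(metaName: string): string {
  return `meta[${attributeFor(metaName)}="${metaName}"]`;
}

// Tries each key in order and returns the content of the first tag
// that matches on the right attribute and has a non-empty value.
function extractFromMetaSketch(tags: MetaTag[], metaNames: string[]): string | null {
  for (const metaName of metaNames) {
    const attr = attributeFor(metaName);
    const tag = tags.find((t) => t[attr] === metaName && t.content);
    if (tag) return tag.content;
  }
  return null;
}

const head: MetaTag[] = [
  { name: "description", content: "Plain description" },
  { property: "og:title", content: "Open Graph title" },
];

console.log(selectorFor("og:title")); // meta[property="og:title"]
console.log(extractFromMetaSketch(head, ["og:title", "title"])); // Open Graph title
console.log(extractFromMetaSketch(head, ["og:description", "description"])); // Plain description
```

Keying on the prefix keeps the change local to one helper. A more lenient variant could accept either attribute for `og:` keys, at the cost of a second lookup per key.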