mozilla / page-metadata-parser

DEPRECATED - A Javascript library for parsing metadata on a web page.
https://www.npmjs.com/package/page-metadata-parser
Mozilla Public License 2.0

Bug 1435308 — Improve canonical link detection for AMP pages. #99

Closed: jhugman closed this 6 years ago

jhugman commented 6 years ago

This PR adds a rule to the url ruleset to improve the retrieval of the correct canonical link.

In our tests of AMP pages hosted on the google.com domain, the rel URL provided by the page often pointed to the AMP version of the page hosted on the publication's own domain. That is still an AMP page.

The additional rule retrieves the desktop version of the page.
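As a rough illustration of the idea (not the code in this PR), the sketch below tries an ordered list of selector rules and returns the first URL found, with an AMP-specific rule placed ahead of the generic canonical link. The `a.amp-canurl` selector, the `resolveUrl` helper, and the `fakeDocument` stand-in are all assumptions made for the example; the library's real ruleset may be structured differently.

```javascript
// Hypothetical rule list: [CSS selector, extractor], highest priority first.
// 'a.amp-canurl' is an assumed selector for the AMP viewer's link to the
// publisher's non-AMP page; it is not taken from this thread.
const urlRules = [
  ['a.amp-canurl', el => el.getAttribute('href')],
  ['link[rel="canonical"]', el => el.getAttribute('href')],
  ['meta[property="og:url"]', el => el.getAttribute('content')],
];

// Return the first non-empty value produced by the rules, else the fallback.
function resolveUrl(doc, rules, fallback) {
  for (const [selector, extract] of rules) {
    const el = doc.querySelector(selector);
    const value = el && extract(el);
    if (value) return value;
  }
  return fallback;
}

// Stand-in for a DOM document so the sketch runs without a browser.
// Keys are selectors; values are attribute maps for the matched element.
function fakeDocument(elements) {
  return {
    querySelector: selector => {
      const attrs = elements[selector];
      if (!attrs) return null;
      return { getAttribute: name => (name in attrs ? attrs[name] : null) };
    },
  };
}

// An AMP viewer page whose canonical link still points at an AMP URL.
const ampPage = fakeDocument({
  'a.amp-canurl': { href: 'https://example.com/story' },
  'link[rel="canonical"]': { href: 'https://example.com/amp/story' },
});

console.log(resolveUrl(ampPage, urlRules, 'https://www.google.com/amp/s/example.com/amp/story'));
```

Because the AMP-specific rule comes first, pages that lack such an element fall through to the canonical link and then to `og:url`, so non-AMP pages behave as before.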

Link to Bug 1435308.

jaredlockhart commented 6 years ago

@jhugman Thanks for adding this, please feel free to submit more rules! Please add a test case here:

https://github.com/mozilla/page-metadata-parser/blob/master/tests/metadataRules.test.js#L48

r+wc

jhugman commented 6 years ago

Test added. Ready to merge.

/cc @jaredkerim

jaredlockhart commented 6 years ago

@jhugman Merged, retagged 1.1.1, and published to npm.