mozilla / page-metadata-parser

DEPRECATED - A Javascript library for parsing metadata on a web page.
https://www.npmjs.com/package/page-metadata-parser
Mozilla Public License 2.0

Only return preview images #17

Closed: jaredlockhart closed this issue 7 years ago

jaredlockhart commented 8 years ago

On some pages, such as https://cbc.ca, the parser returns a tracking image. We should remove the rule for previewImage.

pdehaan commented 8 years ago

Same with www.cnn.com. Currently it's returning http://i.cdn.turner.com/ttn/ttn_adspaces/1.0/creatives/2012/2/10/1529111x1.gif

Embed.ly ranks the images by dimensions/complexity, I think.
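To make the dimension idea concrete, here is a rough sketch of filtering and ranking candidates by size. It is not how Embed.ly or this library works; the `{ url, width, height }` candidate shape, the `pickLargestImage` name, and the 50 px threshold are all assumptions for illustration.

```javascript
// Hypothetical candidate shape: { url, width, height }, sizes in pixels.
// MIN_SIDE is an assumed threshold, chosen only to reject 1x1 trackers
// and other tiny images.
const MIN_SIDE = 50;

function pickLargestImage(candidates, minSide = MIN_SIDE) {
  // Drop anything too small to be a meaningful preview.
  const usable = candidates.filter(
    (img) => img.width >= minSide && img.height >= minSide
  );
  if (usable.length === 0) return undefined;

  // Of what is left, keep the candidate with the largest pixel area.
  return usable.reduce((best, img) =>
    img.width * img.height > best.width * best.height ? img : best
  );
}
```

This only covers dimensions. Ranking by "complexity" would need the decoded image data, which this sketch does not attempt.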

pdehaan commented 8 years ago

Actually, on another attempt, www.cnn.com gives me this image: http://i.cdn.turner.com/cnn/.e1mo/img/4.0/logos/menu_politics.png, which I think is a navigation link image, not a 1x1 px tracker.


FIGURE 1: It looks like a transparent PNG with white text (which renders poorly above).


jaredlockhart commented 7 years ago

We should only look at the content of preview images, and return no image otherwise.
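A minimal sketch of that approach, assuming "preview images" means images the page declares explicitly in its metadata. The selector list and the `getPreviewImage` name are hypothetical and not taken from the library's actual rule set.

```javascript
// Assumed selectors for explicitly declared preview images
// (Open Graph and Twitter card tags); not the library's real rules.
const PREVIEW_SELECTORS = [
  'meta[property="og:image"]',
  'meta[name="twitter:image"]',
];

function getPreviewImage(doc) {
  for (const selector of PREVIEW_SELECTORS) {
    const element = doc.querySelector(selector);
    const content = element && element.getAttribute('content');
    if (content) return content;
  }
  // No fallback to arbitrary <img> tags: return nothing rather than
  // risk returning a tracker or navigation image.
  return undefined;
}
```

In a browser this would be called as `getPreviewImage(document)`. Pages without either tag get no image.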