mozilla / tippy-top-sites-deprecated

[deprecated][unmaintained]

A few invalid background colors #6

Closed pdehaan closed 8 years ago

pdehaan commented 8 years ago

Spotted these at random. Not sure why yet; maybe Embedly is returning bad data, or we're doing something wrong in our scraper.

| SITE | BACKGROUND COLOR | IS VALID? |
| --- | --- | --- |
| About | null | false |
| Amazon Web Services | null | false |
| American Express | #08DFA | false |
| AOL | #0ADF5 | false |
| Bing | #08484 | false |
| Business Insider | #05D83 | false |
| Chase | #05BB1 | false |
| CNET | #A0B8 | false |
| CNN | #C1118 | false |
| Diply.com | #067B3 | false |
| Dropbox | #079EB | false |
| eBay | #066D3 | false |
| Etsy | #F265B | false |
| Fox News | #C3B55 | false |
| Groupon | null | false |
| Kohl's | #F365E | false |
| LinkedIn | #08EBE | false |
| Live | #78BBC | false |
| Mail Online | #04EB8 | false |
| Microsoft | #78BBC | false |
| Microsoft Office | #FF3D | false |
| Microsoftonline | #78BBC | false |
| PornHub | #F29BD | false |
| Target | #C1254 | false |
| The Weather Channel | #03A9D | false |
| UPS | #1D66 | false |
| USPS | #46CA7 | false |
| WordPress.com | #0A1D5 | false |
| Yahoo | #3BD9B | false |
| Zillow | #073E2 | false |

var data = require('./top_sites.json');

data.map((site) => {
  site.valid_background_color = !!(site.background_color && (site.background_color.length === 4 || site.background_color.length === 7));
  return site;
}).filter((site) => !site.valid_background_color)
.sort((siteA, siteB) => {
  var titleA = siteA.title.toLowerCase();
  var titleB = siteB.title.toLowerCase();

  if (titleA > titleB) {
    return 1;
  }
  if (titleA < titleB) {
    return -1;
  }
  return 0;
})
.forEach((site) => {
  console.log('| %s | %s | %s', site.title, site.background_color, site.valid_background_color);
});

pdehaan commented 8 years ago

Looks like maybe just some bad zero padding.

  {
    "title": "eBay",
    "url": "http://www.ebay.com",
    "image_url": "images/ebay.com.png",
    "background_color": "#066D3"
  },

066D3 is an invalid hex color because it is only 5 characters; a hex color needs either 3 or 6 hex digits after the "#".
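For reference, a stricter check than a plain length test might look like the sketch below. The helper name `isValidHexColor` is hypothetical (it is not part of this repo), and it assumes only the `#RGB` and `#RRGGBB` forms should count as valid.

```javascript
// Hypothetical helper, not from this repo: accepts only "#RGB" or "#RRGGBB".
// Unlike a length-only check, it also rejects non-hex characters and null.
function isValidHexColor(value) {
  return typeof value === 'string' &&
    /^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/i.test(value);
}

console.log(isValidHexColor('#066D3'));  // false (5 hex digits)
console.log(isValidHexColor('#0066D3')); // true
console.log(isValidHexColor(null));      // false
```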

https://embedly-proxy.dev.mozaws.net/extract?urls=https://www.ebay.com says the primary favicon color is [0, 102, 211] which (according to http://www.rgbtohex.net/) is #0066D3.

So I'm guessing when we ran the embedly scraper against the URLs, we converted the "00" to "0" when converting to/from numbers and strings.
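A minimal sketch of how that could happen, assuming the scraper converted each RGB channel with `toString(16)` and joined the results. The actual scraper code isn't shown in this thread, so both function names here are hypothetical.

```javascript
// Hypothetical reproduction: each channel is converted to hex without padding,
// so a channel value below 16 yields one digit instead of two.
function rgbToHexUnpadded(rgb) {
  return '#' + rgb.map((c) => c.toString(16)).join('').toUpperCase();
}

// Same conversion, but every channel is left-padded to two hex digits.
function rgbToHex(rgb) {
  return '#' + rgb.map((c) => c.toString(16).padStart(2, '0')).join('').toUpperCase();
}

var ebay = [0, 102, 211]; // primary favicon color reported by the Embedly proxy

console.log(rgbToHexUnpadded(ebay)); // #066D3
console.log(rgbToHex(ebay));         // #0066D3
```

The unpadded version reproduces the 5-character value in top_sites.json for eBay, which is consistent with the zero-padding guess.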

nchapman commented 8 years ago

It was bad zero padding on my part :sweat_smile:. Thankfully you saved the day and fixed this in #8.