Restrict github.io whitelist to w3c.github.io

mdittmer / web-apis

Playground for better understanding Web APIs

Apache License 2.0


Restrict github.io whitelist to w3c.github.io #14

Open foolip opened 8 years ago

foolip commented 8 years ago

I think we should have to actively decide that others are OK. Something like someperson.github.io could be a problem if the spec is intended to graduate from there.

mdittmer commented 7 years ago

We currently scrape meaningful data from:

[ 'heycam.github.io',
  'w3c.github.io',
  'webaudio.github.io',
  'webbluetoothcg.github.io',
  'wicg.github.io' ]

Do we really want to whitelist each one separately? Or do we just want to notice bad github.io matches and blacklist them?

foolip commented 7 years ago

I think whitelisting each one separately is probably best for now. Using cowboy.github.io and not noticing would be worse than having to whitelist a new URL.
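For illustration, a per-host whitelist check might look like the sketch below. It is not code from the repository: the names `ALLOWED_SPEC_HOSTS` and `isAllowedSpecUrl` are hypothetical, and the host list is simply the one quoted earlier in this thread.

```javascript
// Hypothetical whitelist: the hosts listed earlier in this thread.
const ALLOWED_SPEC_HOSTS = new Set([
  'heycam.github.io',
  'w3c.github.io',
  'webaudio.github.io',
  'webbluetoothcg.github.io',
  'wicg.github.io',
]);

// Returns true only when the URL's hostname exactly matches a whitelisted
// host, so an arbitrary someperson.github.io page is rejected.
function isAllowedSpecUrl(url) {
  try {
    return ALLOWED_SPEC_HOSTS.has(new URL(url).hostname);
  } catch (e) {
    // Not a parseable absolute URL: treat it as not allowed.
    return false;
  }
}
```

Matching the exact hostname, rather than a `github.io` suffix, is what forces an explicit decision for each new host.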

We could of course devise a system where we scrape everything linked, and it's merely a warning when some URL isn't in our list of known spec patterns.
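A rough sketch of that warn-only idea, under stated assumptions: `KNOWN_SPEC_PATTERNS` and `classifyLinks` are hypothetical names, and the two patterns are placeholders rather than an actual list of known spec patterns.

```javascript
// Hypothetical list of known spec URL patterns (placeholders only).
const KNOWN_SPEC_PATTERNS = [
  /^https:\/\/w3c\.github\.io\//,
  /^https:\/\/wicg\.github\.io\//,
];

// Every linked URL is kept for scraping; a URL that matches no known
// pattern only produces a warning instead of being dropped.
function classifyLinks(urls) {
  const warnings = [];
  for (const url of urls) {
    const known = KNOWN_SPEC_PATTERNS.some((pattern) => pattern.test(url));
    if (!known) {
      warnings.push(`Unknown spec URL pattern (scraped anyway): ${url}`);
    }
  }
  return { scrape: [...urls], warnings };
}
```

With this shape, a new host such as cowboy.github.io would still be scraped, but it would show up in `warnings` so someone can decide whether to add a pattern for it.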