Open foolip opened 8 years ago
We currently scrape meaningful data from:
[ 'heycam.github.io',
'w3c.github.io',
'webaudio.github.io',
'webbluetoothcg.github.io',
'wicg.github.io' ]
Do we really want to whitelist each one separately? Or do we just want to notice bad github.io
matches and blacklist them?
I think whitelisting each one separately is probably best for now. Scraping cowboy.github.io without noticing would be worse than having to whitelist a new URL.
We could of course devise a system where we scrape everything linked, and it's merely a warning when some URL isn't in our list of known spec patterns.
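A rough sketch of that idea, not what the scraper does today: the function name `checkLinks` and the shape of its return value are made up for illustration, and I'm assuming we only care about github.io hosts here. Everything linked is kept for scraping, and hosts missing from the known list only produce warnings.

```js
// Known spec hosts, copied from the list at the top of this issue.
const KNOWN_SPEC_HOSTS = new Set([
  'heycam.github.io',
  'w3c.github.io',
  'webaudio.github.io',
  'webbluetoothcg.github.io',
  'wicg.github.io',
]);

// Hypothetical helper: keep every linked URL for scraping, but collect a
// warning for each github.io URL whose host is not in the known list.
function checkLinks(urls) {
  const toScrape = [];
  const warnings = [];
  for (const url of urls) {
    const { hostname } = new URL(url);
    toScrape.push(url);
    const isGithubIo = hostname.endsWith('.github.io');
    if (isGithubIo && !KNOWN_SPEC_HOSTS.has(hostname)) {
      warnings.push(url);
    }
  }
  return { toScrape, warnings };
}

const result = checkLinks([
  'https://w3c.github.io/some-spec/',
  'https://cowboy.github.io/some-spec/',
]);
for (const url of result.warnings) {
  console.warn(`Unknown github.io spec host: ${url}`);
}
```

Turning the warning into a hard failure would give the strict whitelist behavior instead, so the two approaches differ only in how an unknown host is treated.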
I think we should have to actively decide that others are OK. A spec on someperson.github.io could be a problem if it is intended to graduate from there.