Closed szepeviktor closed 6 months ago
Entirely bit by bit currently. I believe @nickspaargaren first started the list by scraping Google domains, but we currently don't have any automated scraping or searching for new domains across all Google products. The same goes for our No-A list (and potentially other lists in the future). I do believe an "automatic recipe" could be built to help maintain the whole list and keep it regularly updated with ease.
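Such an "automatic recipe" could start with something like the sketch below: extract every linked hostname from a fetched page and keep the ones matching a known Google suffix. The `GOOGLE_SUFFIXES` seed set is an assumption for illustration, not the project's actual criteria.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed seed set of Google-owned suffixes; the real list would be larger.
GOOGLE_SUFFIXES = (".google.com", ".googleapis.com", ".gstatic.com")

class LinkExtractor(HTMLParser):
    """Collect the host part of every <a href> link in a page."""
    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    host = urlparse(value).hostname
                    if host:
                        self.hosts.add(host)

def google_hosts(html: str) -> set:
    """Return linked hosts that match a known Google suffix."""
    parser = LinkExtractor()
    parser.feed(html)
    return {h for h in parser.hosts
            if h == "google.com" or h.endswith(GOOGLE_SUFFIXES)}
```

Feeding the crawled HTML of each page through `google_hosts` and diffing the result against the current list would flag candidate new entries for manual review.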
Thank you for your answer.
I also think the lists that don't use wildcards should only be used as a last resort (take metrics, for example): it's easy for Google to auto-generate new metrics subdomains, and updating the list takes time. So marking the wildcard that blocks "metrics.google.com" and all of its subdomains as the recommended rule, instead of the list that enumerates every "xxxxx.metrics.google.com", could be a solution. What could also be useful is a crawler that follows every link on Google's pages to generate a list of domains referenced from Google websites, and then checks that list to see whether each entry is a Google domain or not.
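The wildcard idea above could be automated too. Here is a minimal sketch that collapses many sibling subdomains of one parent into a single `*.parent` rule; the `threshold` knob (how many siblings justify a wildcard) is an assumed parameter, not anything the project currently defines.

```python
from collections import defaultdict

def collapse_to_wildcards(domains, threshold=2):
    """Collapse enumerated subdomains into wildcard rules.

    If `threshold` or more entries share the same parent (e.g. many
    xxxxx.metrics.google.com hosts), emit "*.parent" instead of
    listing each one.  Bare domains are kept as-is.
    """
    by_parent = defaultdict(set)
    for d in domains:
        parts = d.split(".")
        if len(parts) > 2:
            by_parent[".".join(parts[1:])].add(d)
        else:
            by_parent[d]  # register bare domain with no children
    rules = set()
    for parent, children in by_parent.items():
        if len(children) >= threshold:
            rules.add("*." + parent)
        else:
            rules.update(children or {parent})
    return sorted(rules)
```

For example, `["a.metrics.google.com", "b.metrics.google.com", "mail.google.com"]` collapses to `["*.metrics.google.com", "mail.google.com"]`, so newly generated metrics subdomains are covered without touching the list again.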
Is there a sound theory to create these lists or is it a bit-by-bit job?
Thank you!