pexcn / daily

Poetry and far afield. 🌊
GNU General Public License v3.0
584 stars 103 forks

add root domain version of gfwlist/chinalist #44

Closed: pexcn closed this 4 years ago

pexcn commented 4 years ago

Required to match all TLDs, including multi-label public suffixes, e.g.:

.com
.org
.co.jp
.co.uk
.com.hk
.space
.xn--q9jyb4c
.tw

References:

  1. https://raw.githubusercontent.com/publicsuffix/list/master/public_suffix_list.dat
  2. https://publicsuffix.org/list/effective_tld_names.dat
  3. https://data.iana.org/TLD/tlds-alpha-by-domain.txt
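One way to turn reference 1 or 2 into a single matching pattern is sketched here as a hypothetical shell function, build_root_ere. It assumes the file follows the public suffix list format (one rule per line, `//` comments, an ICANN section and a private section). To keep it short, it uses only the ICANN section (so private entries such as github.io are skipped) and drops wildcard (`*.`) and exception (`!`) rules, so the result is an approximation, not a full public suffix list implementation. As far as I know, the PSL spells IDN suffixes in Unicode, so an entry such as .xn--q9jyb4c would need converting to punycode first. The IANA list (reference 3) already spells TLDs in ASCII/punycode, in upper case.

```shell
# Hypothetical helper (not from the thread): read public suffix list text on
# stdin and print an ERE of the form ([^.]+)\.(suffix1|suffix2|...)$
# Simplifying assumptions:
#   - only rules between the ICANN BEGIN/END markers are used
#   - wildcard (*.) and exception (!) rules are dropped
#   - rules are kept exactly as spelled in the file
build_root_ere() {
  # 1. print the first token of each ICANN rule line, skipping comments/blanks
  # 2. escape literal dots  3. join into one alternation  4. wrap in the ERE
  awk '
    /===BEGIN ICANN DOMAINS===/ { icann = 1; next }
    /===END ICANN DOMAINS===/   { icann = 0; next }
    icann && NF && $1 !~ /^\/\// && $1 !~ /^[!*]/ { print $1 }
  ' | sed 's/\./\\./g' | paste -sd '|' - | sed 's/.*/([^.]+)\\.(&)$/'
}
```

Fetching the list and building the pattern would then look something like `curl -fsSL https://publicsuffix.org/list/effective_tld_names.dat | build_root_ere`. The output can be passed straight to `grep -oE`.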
pexcn commented 4 years ago

Regex example:

# https://regex101.com/r/QHHAPR/2
([^\.]+)\.(com|org|co\.uk|hk)$
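Applied to a plain list of domains, the regex above can reduce each entry to its root domain. This is a minimal sketch, assuming one lowercase domain per line with no trailing dot. root_domains is a hypothetical name, and its default suffix set is only the example alternation plus com.hk, taken from the TLD list in the first comment. Without com\.hk in the alternation, foo.bar.com.hk would be reduced to com.hk, which is why multi-label suffixes have to be listed explicitly.

```shell
# Hypothetical helper: print the unique root domains found in stdin.
# $1 (optional): ERE alternation of suffixes; the default is an illustrative
# subset only, not a complete suffix list.
root_domains() {
  default_suffixes='com|org|co\.uk|com\.hk|hk'
  suffixes=${1:-$default_suffixes}
  # -o prints only the matched part, i.e. "label.suffix" at the end of a line;
  # lines whose suffix is not in the alternation produce no output
  grep -oE "([^.]+)\\.(${suffixes})\$" |
    LC_ALL=C sort -u   # byte-order sort, then drop duplicates
}
```

For example, `printf 'www.google.com\nexample.co.uk\n' | root_domains` prints example.co.uk and google.com. Passing a larger alternation, such as one generated from the public suffix list, extends the coverage.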
pexcn commented 4 years ago

Code:

# at most one dot
grep -E '^[^.]*\.?[^.]*$' <file>

# exactly one dot
grep -E '^[^.]*\.[^.]*$' <file>
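The same two filters can also be written by counting labels instead of matching dots. With `-F.`, awk splits www.example.com into 3 fields, so the number of dots is NF - 1. The function names here are hypothetical, and unlike the first grep pattern these skip blank lines. Note that either version counts two dots in example.co.uk, so a dot count alone cannot tell a root domain under a multi-label suffix from a subdomain.

```shell
# Hypothetical helpers: filter stdin by the number of dots per line.
# awk -F. uses "." literally as the field separator, so dots = NF - 1.
at_most_one_dot() { awk -F. 'NF >= 1 && NF <= 2'; }  # 0 or 1 dot, no blanks
exactly_one_dot() { awk -F. 'NF == 2'; }             # exactly 1 dot
```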

References:

  1. https://stackoverflow.com/questions/569137/how-to-get-domain-name-from-url
  2. https://stackoverflow.com/questions/14460680/how-to-get-a-list-of-tlds-using-bash-for-building-a-regex
  3. https://stackoverflow.com/questions/28999021/how-to-allow-to-only-one-dot
pexcn commented 4 years ago

Related: https://github.com/pexcn/daily/issues/45