messede-degod / sstable-migrator

Generate SSTables from CSV or JSON. The data-loading workhorse behind https://ip.thc.org
GNU General Public License v3.0

rank based output #4

Open SkyperTHC opened 1 month ago

SkyperTHC commented 1 month ago

Show the 'relevant' domains to the user first.

The tricky part is determining what is relevant (this needs discussing). All of the ranking below happens after the pattern/regex filter has been applied.

  1. Alexa ranking (??)
  2. Anything that contains .mil, .gov
  3. Anything that 'hackers' are usually interested in (e.g. "git.", "gw.", "test."). If there are multiple choices, then demote ".your-server.de" names or DSL home links (e.g. give them the lowest rank).
  4. If the USER is from .DE then prioritise .DE domains first (???)
  5. ...please add. This is more complex than you may think.
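As a starting point for the discussion, items 2 and 3 could be sketched as a simple tiering function. This is only a minimal sketch, not what the project does: `relevance_tier`, `rank`, and the three constant sets are hypothetical names, the label and suffix lists contain only the examples mentioned above, and low-value suffixes are always demoted rather than only "if there are multiple choices".

```python
# Hypothetical lists, seeded only with the examples from this comment.
PRIORITY_LABELS = {"mil", "gov"}            # item 2
INTERESTING_LABELS = {"git", "gw", "test"}  # item 3
LOW_VALUE_SUFFIXES = (".your-server.de",)   # item 3, demoted names


def relevance_tier(hostname: str) -> int:
    """Return a tier for a hostname; a lower tier is shown first."""
    name = hostname.lower().rstrip(".")
    labels = name.split(".")
    if name.endswith(LOW_VALUE_SUFFIXES):
        return 3  # bulk-hosting names go last
    if PRIORITY_LABELS.intersection(labels):
        return 0  # contains a .mil or .gov label anywhere
    if labels[0] in INTERESTING_LABELS:
        return 1  # leftmost label looks interesting
    return 2      # everything else


def rank(hostnames):
    """Sort by tier, then alphabetically for a stable order."""
    return sorted(hostnames, key=lambda h: (relevance_tier(h), h))
```

Calling `rank()` on the filtered result set would put `.mil`/`.gov` names first and `.your-server.de` names last; Alexa ranking and the user's own country (items 1 and 4) are left out because they need external data.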
messede-degod commented 1 month ago
  1. Ignore records for ISP and cloud IP ranges, e.g. ec2-54-254-114-60.ap-1.compute.amazonaws.com or static-100-38-144-198.nycmny.fios.verizon.net. These have little to no value.

  2. Build our own domain ranking/scoring algorithm that takes the following into account:
     1) TLD popularity
     2) TLD average cost
     3) length of the domain name
     4) the subdomain level (e.g. a.b.c.com is level 4)
     5) whether the domain belongs to a cloud provider or free hosting provider
     6) domain name entropy
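The two points above could be prototyped roughly as follows. This is a hedged sketch under several assumptions: all function names are hypothetical, the regex for ISP/cloud names is modelled only on the two example hostnames, the weights in `score` are arbitrary placeholders, and TLD popularity is a caller-supplied mapping. TLD cost and the cloud/free-hosting check are omitted because they need external data.

```python
import math
import re
from collections import Counter

# Assumption: ISP/cloud reverse-DNS names embed a dash-separated IPv4 address,
# as in the two examples above. Real data will need more patterns.
EMBEDDED_IP = re.compile(r"(?:^|[.-])\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3}(?:[.-]|$)")


def is_machine_generated(hostname: str) -> bool:
    return EMBEDDED_IP.search(hostname) is not None


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def domain_features(hostname: str) -> dict:
    name = hostname.lower().rstrip(".")
    labels = name.split(".")
    return {
        "tld": labels[-1],
        "length": len(name),
        "level": len(labels),  # a.b.c.com -> 4
        "entropy": shannon_entropy(name.replace(".", "")),
        "machine_generated": is_machine_generated(name),
    }


def score(hostname: str, tld_popularity=None):
    """Return a score (higher is more relevant), or None if the record is ignored."""
    f = domain_features(hostname)
    if f["machine_generated"]:
        return None  # point 1: drop ISP/cloud records entirely
    popularity = (tld_popularity or {}).get(f["tld"], 0.0)
    # Placeholder weights: shorter, shallower, lower-entropy names score higher.
    return popularity - 0.1 * f["length"] - 1.0 * f["level"] - 1.0 * f["entropy"]
```

Results could then be sorted by `score` in descending order after dropping the `None` entries. The weights would need tuning against real query results before any of this is useful.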