mzhilyaev / sim-sites-download


What should we do with duplicate titles? #2

Open mzhilyaev opened 10 years ago

mzhilyaev commented 10 years ago

A number of titles are duplicated because several different domains share the same title. For example, "title": "Google" appears for all of these sites: "site": "google.co.uk", "site": "googleapis.com", "site": "google.com.co", "site": "rtbpop.com", "site": "googlesyndication.com", "site": "localhost.com", "site": "google.co.nz", "site": "blogspot.co.nz", "site": "adwords-community.com", "site": "google.com.au", "site": "google.com", "site": "bestadbid.com", "site": "googleadservices.com".

I suggest keeping only ONE entry per title: the domain with the highest rank.

Mardak, please comment and re-assign back to me.

Mardak commented 10 years ago

How many duplicate titles are there in a given list of top domains for a country? Two entries having the same title doesn't mean they are the same site. At least these all seem to redirect to google.com, but that might be due to IP address detection and might only happen in the US.
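
One way to answer the "how many" question is to count the entries per title and keep only the titles that occur more than once. The following is a minimal sketch, assuming the list is already loaded as Python dicts with the "title" and "site" keys shown above; the function name and the sample entries are hypothetical and not taken from the repository.

```python
from collections import Counter


def duplicate_title_counts(entries):
    """Return {title: count} for every title used by more than one entry."""
    counts = Counter(entry["title"] for entry in entries)
    return {title: n for title, n in counts.items() if n > 1}


# Hypothetical sample; a real list would come from the downloaded site data.
sample = [
    {"title": "Google", "site": "google.co.uk"},
    {"title": "Google", "site": "google.com"},
    {"title": "Google", "site": "googleapis.com"},
    {"title": "Example", "site": "example.com"},
]

duplicates = duplicate_title_counts(sample)
print(duplicates)
```

Running this once per country list would show whether duplicates are rare enough that dropping them costs little.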

mzhilyaev commented 10 years ago

Perhaps we should still remove them, so the user does not get confused by seeing two site recommendations with the same title. Since we have plenty of sites, removing identically titled sites seems like a good way to avoid potential confusion.

Mardak commented 10 years ago

Sure, using the highest-ranked site for a given title should be fine for now.
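
The approach agreed on above, keeping only the highest-ranked site per title, could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the "rank" field, the function name, and the rank values are hypothetical, and a smaller rank number is assumed to mean a higher rank (rank 1 is the top site).

```python
def dedupe_by_title(entries):
    """Keep one entry per title: the one with the best (lowest) rank.

    Assumes each entry has "title", "site", and a numeric "rank" key,
    where a smaller rank number means a more popular site.
    """
    best = {}
    for entry in entries:
        title = entry["title"]
        current = best.get(title)
        # Replace the stored entry only if this one ranks strictly better.
        if current is None or entry["rank"] < current["rank"]:
            best[title] = entry
    # Return the survivors ordered from best rank to worst.
    return sorted(best.values(), key=lambda e: e["rank"])


# Hypothetical sample; the rank values are made up for illustration.
sites = [
    {"title": "Google", "site": "google.co.uk", "rank": 12},
    {"title": "Google", "site": "google.com", "rank": 1},
    {"title": "Google", "site": "googleapis.com", "rank": 40},
    {"title": "Example", "site": "example.com", "rank": 7},
]

deduped = dedupe_by_title(sites)
print([entry["site"] for entry in deduped])
```

Because the comparison is on title alone, unrelated sites that happen to share a title are also collapsed into one entry, which is the trade-off raised earlier in this thread.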