soodoku closed this issue 7 years ago
When producing the domain-level category, ignore URLs of the form http://domain/path/ and keep only those of the form http://domain
For instance,
http://www.standaard.be/Artikel/Detail.aspx?artikelId=DMF02092008_138
yields Category: World / Nederlands / Computers / Software / Internet / Browsers / Google Chrome, which is the category of the article, not the domain.
To get the category of the domain, look up http://www.standaard.be/ instead,
which gives the right category: Category: World / Nederlands / Nieuws en Media / Dag- en Nieuwsbladen / België
We already do this sensible thing for subdomains, but not for domains.
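
A minimal sketch of the proposed filter, not the project's actual code. The names `is_domain_level` and `domain_categories` are hypothetical, and the input is assumed to be an iterable of (url, category) pairs. It treats a URL as domain-level only when it has no path beyond `/`, no query string, and no fragment:

```python
from urllib.parse import urlsplit


def is_domain_level(url):
    """Return True for http://domain or http://domain/, False for http://domain/path/."""
    parts = urlsplit(url)
    return (
        bool(parts.netloc)
        and parts.path in ("", "/")
        and not parts.query
        and not parts.fragment
    )


def domain_categories(records):
    """Map hostname -> category, using only records whose URL is domain-level.

    `records` is assumed to be an iterable of (url, category) pairs.
    """
    categories = {}
    for url, category in records:
        if is_domain_level(url):
            categories[urlsplit(url).hostname] = category
    return categories


records = [
    (
        "http://www.standaard.be/Artikel/Detail.aspx?artikelId=DMF02092008_138",
        "World / Nederlands / Computers / Software / Internet / Browsers / Google Chrome",
    ),
    (
        "http://www.standaard.be/",
        "World / Nederlands / Nieuws en Media / Dag- en Nieuwsbladen / België",
    ),
]
result = domain_categories(records)
```

With the two standaard.be entries above, only the second one survives, so the article-level category no longer overrides the domain's.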