marksteward / earlbot

Simple IRC bot to spit out the title of URLs pasted into the channel

link rel=canonical not normalised #24

Open marksteward opened 9 years ago

marksteward commented 9 years ago

https://beta.companieshouse.gov.uk/company/06807563 and https://hanno.co/ both have a relative canonical link of "/". earlbot doesn't normalise it against the page URL, so it thinks they're the same page.

marksteward commented 8 years ago

Also happened with https://www.citizensadvice.org.uk and https://jsonresume.org
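
A minimal sketch of the normalisation the title asks for, assuming the fix is to resolve the canonical href against the URL of the page it came from before comparing. `normalise_canonical` is a hypothetical helper, not a function from earlbot; the only library call is `urllib.parse.urljoin` from the Python standard library.

```python
from urllib.parse import urljoin


def normalise_canonical(page_url: str, canonical_href: str) -> str:
    """Resolve a possibly relative canonical href against the page URL.

    Hypothetical helper for illustration; earlbot's real code may differ.
    """
    # urljoin leaves absolute hrefs alone and resolves relative ones
    # (such as "/") against the scheme and host of page_url.
    return urljoin(page_url, canonical_href)


# Both pages declare a canonical link of "/", but they resolve differently.
a = normalise_canonical("https://beta.companieshouse.gov.uk/company/06807563", "/")
b = normalise_canonical("https://hanno.co/", "/")
print(a)
print(b)
```

With the hrefs resolved this way, the two pages no longer compare equal, because each "/" picks up the host of its own page.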