wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

A few problems with V7 for OS X #496

WebFox64 closed this issue 10 years ago.

WebFox64 commented 10 years ago

Hi, today I found your LinkChecker 7.0 for OS X on download.com. What a surprise! It is one of the best tools I have discovered so far. It helped me eliminate a large number of errors I had not found until now... Anyway, I also found a few, not to say bugs, but problems.

  1. For www.hlebnoederevo.com, it reports an error for "Protected by Copyscape Online Plagiarism Tool" that I cannot find in the page code:

    URL        `Protected by Copyscape Plagiarism Checker - Do not copy content from this page.' (cached)
    Name       `Protected by Copyscape Online Plagiarism Tool'
    Parent URL http://www.hlebnoederevo.com, line 421, col 67 (HTML) (CSS)
    Real URL   http://www.hlebnoederevo.com/Protected by Copyscape Plagiarism Checker - Do not copy content from this page.
    Result     Error: 404 Not Found

The Real URL is wrong, so this is not really an error!

  2. Your tool handles my web server's and pages' UTF-8 settings correctly, except that it URL-encodes links! For a while now, using Cyrillic link URLs, for example (good for readability for visitors), has been no problem for modern browsers or for most search engines. Even Google ranks those links well and correctly; only Bing has a bit of trouble, but it seems to be starting to fix it. Yandex also handles them well.

Your tool outputs:

    URL        `Protected by Copyscape Plagiarism Checker - Do not copy content from this page.' (cached)
    Name       `Protected by Copyscape Online Plagiarism Tool'
    Parent URL http://www.hlebnoederevo.com/%D0%AD%D0%BC%D1%83%D0%BB%D1%8C%D1%81%D0%B8%D1%8F-%D1%81%D0%BC%D0%B0%D0%B7%D1%8B%D0%B2%D0%B0%D0%BD%D0%B8%D1%8F-%D1%85%D0%BB%D0%B5%D0%B1%D0%BE%D0%BF%D0%B5%D0%BA%D0%B0%D1%80%D0%BD%D1%8B%D1%85-%D1%84%D0%BE%D1%80%D0%BC.php, line 187, col 67 (HTML) (CSS)
    Real URL   http://www.hlebnoederevo.com/Protected%20by%20Copyscape%20Plagiarism%20Checker%20-%20Do%20not%20copy%20content%20from%20this%20page.
    Result     Error: 404 Not Found

After I run the saved report through PHP's rawurldecode, I get readable output (see below). As you can see, this is not only a problem for Cyrillic. It would be more helpful if all error output were readable and properly decoded.

    URL        `Protected by Copyscape Plagiarism Checker - Do not copy content from this page.' (cached)
    Name       `Protected by Copyscape Online Plagiarism Tool'
    Parent URL http://www.hlebnoederevo.com/Автомат-упаковки-хлеба-клипсатор-хлеборезка-транспортер.php, line 248, col 67 (HTML) (CSS)
    Real URL   http://www.hlebnoederevo.com/Protected by Copyscape Plagiarism Checker - Do not copy content from this page.
    Result     Error: 404 Not Found

  3. And finally, your tool seems to check mailto: links remotely, and this always causes errors. The servers we use block fast repeated requests from the same origin, so this check can only cause problems (errors). It would be better if this remote verification could be disabled.
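The rawurldecode post-processing described in point 2 can be reproduced in Python with `urllib.parse.unquote`, which by default decodes `%XX` escapes as UTF-8 and, like PHP's rawurldecode, leaves `+` untouched. This is a minimal sketch of that workaround, not a LinkChecker option; the helper name `decode_report_line` is hypothetical.

```python
from urllib.parse import unquote


def decode_report_line(line: str) -> str:
    """Hypothetical helper (not part of LinkChecker): decode one report line.

    Every %XX sequence is turned back into a byte and the bytes are read as
    UTF-8, roughly like PHP's rawurldecode(). A literal '+' is kept as-is;
    use unquote_plus() instead if '+' should mean a space.
    """
    return unquote(line, encoding="utf-8", errors="replace")


if __name__ == "__main__":
    # Parent URL line copied from the report in point 2.
    encoded = (
        "Parent URL http://www.hlebnoederevo.com/"
        "%D0%AD%D0%BC%D1%83%D0%BB%D1%8C%D1%81%D0%B8%D1%8F-"
        "%D1%81%D0%BC%D0%B0%D0%B7%D1%8B%D0%B2%D0%B0%D0%BD%D0%B8%D1%8F-"
        "%D1%85%D0%BB%D0%B5%D0%B1%D0%BE%D0%BF%D0%B5%D0%BA%D0%B0%D1%80%D0%BD%D1%8B%D1%85-"
        "%D1%84%D0%BE%D1%80%D0%BC.php"
    )
    print(decode_report_line(encoded))
    # Parent URL http://www.hlebnoederevo.com/Эмульсия-смазывания-хлебопекарных-форм.php
```

Applied to every line of a saved text report, this gives readable output like the decoded example in point 2.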
wummel commented 10 years ago

Regarding the "Protected by Copyscape" link: you are using that text as a longdesc attribute value in your page. The contents of longdesc must be a URL pointing to text, not the text itself. Use [...] instead.
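To illustrate why the longdesc text turned into a 404 "Real URL", the Python sketch below treats an attribute value the way any relative link is treated: it resolves the value against the parent page with `urllib.parse.urljoin` and then percent-encodes the path. It only mimics the effect seen in the report using the standard library; it is not LinkChecker's actual code, and `resolve_like_a_link` is a hypothetical name.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit


def resolve_like_a_link(parent_url: str, attr_value: str) -> str:
    """Hypothetical helper: resolve an attribute value as a relative URL.

    This is an illustration built on urllib.parse, not LinkChecker's code.
    """
    # Plain text has no scheme and no leading slash, so it is joined onto
    # the parent's path exactly like any other relative link.
    absolute = urljoin(parent_url, attr_value)
    parts = urlsplit(absolute)
    # Spaces and non-ASCII characters in the path become %XX escapes;
    # '/' is kept so the path structure survives.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/")))


if __name__ == "__main__":
    text = "Protected by Copyscape Plagiarism Checker - Do not copy content from this page."
    print(resolve_like_a_link("http://www.hlebnoederevo.com", text))
    # http://www.hlebnoederevo.com/Protected%20by%20Copyscape%20Plagiarism%20Checker%20-%20Do%20not%20copy%20content%20from%20this%20page.
```

With a proper URL in longdesc, such as a relative link to a description page, the same resolution yields a valid address, which is why the attribute must hold a URL rather than prose.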

Regarding the URL encoding: all non-ASCII characters are URL-encoded. I won't change this for now.
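A minimal sketch of what "all non-ASCII characters are URL-encoded" means, using Python's `urllib.parse.quote` as a stand-in (LinkChecker's own encoder may differ in detail): each non-ASCII character is encoded as UTF-8 and each resulting byte becomes `%XX`, which is the shape of the Parent URL in the report, while unreserved ASCII characters stay as they are.

```python
from urllib.parse import quote, unquote

# Readable first segment of the Parent URL path from the report above.
readable = "/Эмульсия-смазывания"

# Each non-ASCII character is UTF-8 encoded and every resulting byte becomes
# %XX; ASCII letters, digits, '-', '.', '_', '~' and the '/' passed in safe
# are kept unchanged.
encoded = quote(readable, safe="/")
print(encoded)  # /%D0%AD%D0%BC%D1%83%D0%BB%D1%8C%D1%81%D0%B8%D1%8F-%D1%81%D0%BC%D0%B0%D0%B7%D1%8B%D0%B2%D0%B0%D0%BD%D0%B8%D1%8F

# An ASCII-only path without reserved characters is left unchanged.
print(quote("/index.php", safe="/"))  # /index.php

# Decoding restores the original text exactly.
assert unquote(encoded) == readable
```

The encoding is lossless, so displaying the decoded form in reports would be a presentation choice rather than a change to what is checked.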

The mailto: links are not checked in the current 9.1 version - try it out!

WebFox64 commented 10 years ago

OK, this longdesc is a relic I forgot to remove; I will fix it soon. I will try 9.1 and come back if necessary.

(Sent by email in reply to wummel's comment of Mon, Apr 7, 2014 at 10:33 PM.)

WebFox64 commented 10 years ago

Oops, I tried to download 9.1 for OS X, but there is only an .exe and a .deb version, not an OS X version. Could you please be so kind as to give me a link so I can get it? Petr

wummel commented 10 years ago

There is no OS X binary since I do not have access to an OS X system (and Apple does not allow virtualization on non-Apple hardware). See doc/install.txt for how to compile the binary yourself.