wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

Duplicate checks? #608

Open jimpriest opened 9 years ago

jimpriest commented 9 years ago

Same link, parent, and timestamp? Why is this recorded more than once? In this instance it was listed 12 times.

Looking in my csv file I see:

7/21/2015  //www.sameurl.net  http://www.parenturl.com/samepage  403 Forbidden  FALSE  -1  0.9819278717
7/21/2015  //www.sameurl.net  http://www.parenturl.com/samepage  403 Forbidden  FALSE  -1  0.9819278717
7/21/2015  //www.sameurl.net  http://www.parenturl.com/samepage  403 Forbidden  FALSE  -1  0.9819278717
AlexAndrascu commented 8 years ago

+1 ?

PetrDlouhy commented 7 years ago

This is very annoying, since it makes the output quite chaotic. It also raises the suspicion that LinkChecker checks every link more than once, which could slow it down and generate unnecessary load.

PetrDlouhy commented 7 years ago

There are two reasons for this: one is described in PR #687; the other is that the cache is restricted to 100,000 items for memory-usage reasons, and this limit can't be changed from the command line.
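To illustrate why a size-capped cache produces repeated checks, here is a minimal Python sketch. It assumes a cache that simply stops recording new results once it holds `max_size` entries. `BoundedResultCache` and `check_links` are hypothetical names for illustration, not LinkChecker's actual code, and the 100,000 limit is scaled down to 2 so the effect is visible.

```python
from typing import Dict, List, Optional


class BoundedResultCache:
    """Hypothetical result cache that stops recording once it is full."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.results: Dict[str, str] = {}

    def get(self, url: str) -> Optional[str]:
        return self.results.get(url)

    def add(self, url: str, result: str) -> None:
        # Assumption: when the cache is full, the new result is dropped,
        # so later occurrences of this URL will be checked again.
        if len(self.results) >= self.max_size:
            return
        self.results[url] = result


def check_links(urls: List[str], cache: BoundedResultCache) -> List[str]:
    """Return the URLs that were actually checked, in order (hypothetical)."""
    checked: List[str] = []
    for url in urls:
        if cache.get(url) is not None:
            continue  # cache hit: no second check, no duplicate output row
        checked.append(url)  # stands in for a real network check
        cache.add(url, "checked")
    return checked


if __name__ == "__main__":
    urls = ["a", "b", "c", "c", "c"]
    print(check_links(urls, BoundedResultCache(2)))  # ['a', 'b', 'c', 'c', 'c']
    print(check_links(urls, BoundedResultCache(3)))  # ['a', 'b', 'c']
```

With a cap of 2, the third distinct URL never gets cached, so each of its occurrences is checked and reported again; raising the cap to 3 removes the repeats.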

dpalic commented 7 years ago

Thank you for the issue report. Sadly, this project is dead, and a new team now maintains it at https://github.com/linkcheck/linkchecker; for more details, please see #708. Please also close this issue and, if it still persists, report it afresh on the new repo: https://github.com/linkcheck/linkchecker/issues