wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

linkchecker climbs up path #761

Closed: admin60 closed this issue 5 years ago

admin60 commented 5 years ago

I want to check only part of my web server, for example:

http://some.domain/first/second

I only want everything below /first/second to be checked recursively.

But linkchecker climbs up from the given path and also checks parts of the server that I don't want checked.

How can I prevent linkchecker from climbing above the given path?

I already tried --ignore-url with a negative lookahead regular expression: --ignore-url 'some-domain.*(?!second).*'

Now it no longer climbs above "second", but it also no longer descends below "second", which it should.
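A quick way to see why this pattern excludes too much is to test it in Python, assuming linkchecker applies --ignore-url patterns with a regex search over the full URL. Two assumptions here: `some-domain` in the pattern is taken to mean the host `some.domain` (so the dot is escaped), and the URLs are hypothetical.

```python
import re

# First attempt from the issue; assumption: "some-domain" was meant to
# match the host "some.domain", so the dot is escaped here.
first_attempt = re.compile(r"some\.domain.*(?!second).*")

# Hypothetical URLs: some above /first/second, some below it.
urls = [
    "http://some.domain/",
    "http://some.domain/first/",
    "http://some.domain/first/second/",
    "http://some.domain/first/second/page.html",
]

for url in urls:
    # The greedy ".*" runs to the end of the URL, where "second" cannot
    # follow, so the negative lookahead succeeds and every URL on the
    # host matches (and would therefore be ignored).
    print(url, bool(first_attempt.search(url)))
```

Every URL prints True, which fits the behavior described above: pages below "second" are excluded along with the pages above it.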

admin60 commented 5 years ago

I found a working solution using a regular expression with a negative lookahead:

--ignore-url 'www\.some\.domain\/(?!level1\/level2\/[.]*)'

This works when I want to check only what is below http://www.some.domain/level1/level2.
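The working pattern can be checked the same way, again assuming linkchecker runs a regex search over each full URL; the URLs are hypothetical. The check shows that `[.]*` (zero or more literal dots) can always match the empty string, so the lookahead effectively tests only for the prefix `level1/level2/`, and that the start URL written without a trailing slash is itself matched.

```python
import re

# Working pattern from the comment, unchanged ("\/" is a harmless escape).
ignore = re.compile(r"www\.some\.domain\/(?!level1\/level2\/[.]*)")

# Hypothetical URLs mapped to whether the pattern matches (= ignored).
expected = {
    "http://www.some.domain/": True,
    "http://www.some.domain/level1/": True,
    "http://www.some.domain/level1/level2": True,  # no trailing slash
    "http://www.some.domain/level1/level2/": False,
    "http://www.some.domain/level1/level2/page.html": False,
    "http://other.domain/": False,  # other hosts are not affected
}

for url, ignored in expected.items():
    # "[.]*" can match the empty string, so only the prefix
    # "level1/level2/" decides whether the lookahead blocks the match.
    assert bool(ignore.search(url)) == ignored, url
    print(url, "ignored" if ignored else "checked")
```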

It would be much easier if there were a single option (e.g. -ncu for "no climbing up") that internally builds the corresponding regular expression from the given URL and so prevents linkchecker from climbing above the given path.
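The proposed -ncu option does not exist in this thread; it is only a suggestion. A minimal sketch of what it might do internally, written as a hypothetical Python helper `no_climb_up_pattern` and assuming --ignore-url patterns are applied with a regex search: escape the start URL's scheme and host, then match any same-host URL whose path does not continue with the start path.

```python
import re
from urllib.parse import urlsplit


def no_climb_up_pattern(start_url: str) -> str:
    """Hypothetical helper: build an --ignore-url style regex matching URLs
    on the same scheme and host as start_url but outside its path."""
    parts = urlsplit(start_url)
    root = re.escape(f"{parts.scheme}://{parts.netloc}/")
    prefix = re.escape(parts.path.strip("/"))
    if not prefix:
        # Start URL is the site root: nothing lies above it, so return a
        # pattern whose empty negative lookahead can never succeed.
        return "^" + root + "(?!)"
    # Match (ignore) same-host URLs unless the path continues with the
    # start path followed by "/", "?", "#" or the end of the URL.
    return "^" + root + "(?!" + prefix + "(?:/|$|[?#]))"


pattern = no_climb_up_pattern("http://www.some.domain/level1/level2")
print(pattern)
```

Ending the lookahead with `(?:/|$|[?#])` keeps the start URL itself (with or without a trailing slash) and its query strings, while still ignoring look-alike siblings such as `level2x/`.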

dpalic commented 5 years ago

Thank you for the issue report. Sadly, this project is dead, and a new team now maintains it at https://github.com/linkcheck/linkchecker. For more details, please see #708. Please also close this issue and report it again on the new repository: https://github.com/linkcheck/linkchecker/issues