vvdwivedi closed this issue 4 years ago
Is it solved with the master branch (unreleased v0.8)?
I haven't tried the master branch, only the released version. I will test with master and confirm today.
Here is what I tested. I built from the master branch and ran:
bin/blc https://pg.vvdwivedi.com/broken-links.html
Here is the relevant source of the page for quick reference:
This is the result of the run:
Getting links from: https://pg.vvdwivedi.com/broken-links.html
├───OK─── https://pg.vvdwivedi.com/index.html
├───OK─── https://pg.vvdwivedi.com/img/small-img.png
├───OK─── https://pg.vvdwivedi.com/files/a.txt
├───OK─── https://www.google.com/
├─BROKEN─ https://pg.vvdwivedi.com/index-broken.html (HTTP_404)
├─BROKEN─ https://pg.vvdwivedi.com/img/small-img2.png (HTTP_404)
├─BROKEN─ https://pg.vvdwivedi.com/files/ab.txt (HTTP_404)
├─BROKEN─ https://pg.vvdwivedi.com/www.google.com (HTTP_404)
=======================
Links found: 16
Links skipped: 5
Links OK: 7
Links broken: 4
Time elapsed: 0 seconds
=======================
I can see that it treats the URL starting with `//` as relative and appends the host, which results in a 404.
//www.google.com → https://www.google.com/ (OK)
www.google.com → https://pg.vvdwivedi.com/www.google.com (404)
Looks fine to me.
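The two resolutions compared above can be reproduced outside the checker. The following is a minimal sketch using Python's `urllib.parse.urljoin`, which follows RFC 3986 rather than the WHATWG URL parser the tool presumably relies on, so treat it as an approximation. The helper name `resolve` is made up for illustration; the page URL is the one from this thread.

```python
from urllib.parse import urljoin

# Page on which both links appear (taken from this thread).
BASE = "https://pg.vvdwivedi.com/broken-links.html"


def resolve(href: str, base: str = BASE) -> str:
    """Resolve an href against the page URL (RFC 3986 rules, hypothetical helper)."""
    return urljoin(base, href)


# Scheme-relative reference: keeps the base scheme, replaces the host.
print(resolve("//www.google.com"))

# No leading slashes: treated as a path relative to the page.
print(resolve("www.google.com"))
```

The first call yields a URL on `www.google.com`, the second a URL on `pg.vvdwivedi.com`, which is why only the second one ends in a 404. Unlike the checker's output, `urljoin` does not append a trailing slash to the bare host.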
Yes, you are right. Got a little confused there. The new version is fine.
After a quick run on 0.7.8 and 0.8.0, I got the following results:
From v0.8.0:
Getting links from: https://pg.vvdwivedi.com/broken-links.html
├───OK─── https://pg.vvdwivedi.com/index.html
├───OK─── https://pg.vvdwivedi.com/img/small-img.png
├───OK─── https://pg.vvdwivedi.com/files/a.txt
├───OK─── https://www.google.com/
├─BROKEN─ https://pg.vvdwivedi.com/index-broken.html (HTTP_404)
├─BROKEN─ https://pg.vvdwivedi.com/img/small-img2.png (HTTP_404)
├─BROKEN─ https://pg.vvdwivedi.com/files/ab.txt (HTTP_404)
├─BROKEN─ https://pg.vvdwivedi.com/www.google.com (HTTP_404)
=======================
Links found: 16
Links skipped: 5
Links OK: 7
Links broken: 4
Time elapsed: 0 seconds
=======================
From v0.7.8:
Getting links from: https://pg.vvdwivedi.com/broken-links.html
├───OK─── https://pg.vvdwivedi.com/index.html
├───OK─── https://pg.vvdwivedi.com/img/small-img.png
├───OK─── https://pg.vvdwivedi.com/files/a.txt
├───OK─── https://www.google.com/
├─BROKEN─ https://pg.vvdwivedi.com/index-broken.html (HTTP_404)
├─BROKEN─ https://pg.vvdwivedi.com/img/small-img2.png (HTTP_404)
├─BROKEN─ https://pg.vvdwivedi.com/files/ab.txt (HTTP_404)
I am getting a lot of URLs like these reported as broken:
But these seem to be valid URLs when rendered in the page, and nothing is broken because of them. I am not sure of the exact technical term, but I think these are scheme-relative URL strings: https://url.spec.whatwg.org/#scheme-relative-url-string
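As a rough illustration of what "scheme-relative" means here, the sketch below flags hrefs that carry a host but no scheme. It uses Python's `urllib.parse.urlsplit` as a stand-in parser. The function name `is_scheme_relative` is hypothetical and is not part of isurl or of the link checker.

```python
from urllib.parse import urlsplit


def is_scheme_relative(href: str) -> bool:
    """Return True for hrefs like //host/path that omit only the scheme.

    Hypothetical helper for illustration; not an API of isurl.
    """
    parts = urlsplit(href.strip())
    # A scheme-relative reference has an authority (netloc) but no scheme.
    return parts.scheme == "" and parts.netloc != ""


for href in ["//www.google.com", "www.google.com", "https://www.google.com/"]:
    print(href, is_scheme_relative(href))
```

Only the first of the three sample hrefs is classified as scheme-relative. The second has no authority component at all, so it is an ordinary path-relative reference.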
I noticed that you have a package, isurl, which offers a lenient way of checking for valid URLs. I am not sure whether these URLs are reported as broken because of the isurl check, but if so, can we add an option there to allow such URLs?
Environment: