Closed: iboutillier closed this issue 3 years ago
I'm having the same problem. When my crawler reaches a relative link, instead of going from http://example.com/index.html to http://example.com/about.html, it attempts to go to http://example.com/index.html/about.html, resulting in errors.
The crawler appends the child's URI to the parent URL instead of resolving it relative to the parent:
Parent: https://mysite/folder/page
Found: /js/main.js
Crawled: https://mysite/folder/page/js/main.js
The same thing happens when a link has no protocol declared:
Parent: https://mysite/folder/page
Found: //subdomain.mysite/images/myimage.png
Crawled: https://mysite/folder/page//subdomain.mysite/images/myimage.png
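For comparison, here is a minimal sketch of the resolution I would expect, using `urljoin` from Python's standard library. This is not spidy's actual code; `resolve_link` is a hypothetical helper name used only for illustration.

```python
from urllib.parse import urljoin


def resolve_link(parent_url: str, href: str) -> str:
    """Resolve a link found on a page against that page's URL.

    Hypothetical helper, not part of spidy. urljoin handles
    relative paths, root-relative paths, and protocol-relative
    links according to the standard URL resolution rules.
    """
    return urljoin(parent_url, href)


# Relative link: replaces the last path segment of the parent.
print(resolve_link("http://example.com/index.html", "about.html"))

# Root-relative link: keeps scheme and host, replaces the whole path.
print(resolve_link("https://mysite/folder/page", "/js/main.js"))

# Protocol-relative link: keeps only the scheme of the parent.
print(resolve_link("https://mysite/folder/page",
                   "//subdomain.mysite/images/myimage.png"))
```

With this approach the three cases should resolve to http://example.com/about.html, https://mysite/js/main.js, and https://subdomain.mysite/images/myimage.png respectively, rather than being appended to the parent path.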
Install:
apt-get install python3 python3-lxml python3-requests
apt-get install python3-pip python-pip
pip3 install spidy-web-crawler
Starting spidy Web Crawler version 1.6.5
Am I the only one with this problem?
Thanks for your help.