Closed PtrMan closed 3 years ago
Hi @PtrMan, can you provide some more details as to how you were using the crawler?
If you wrote a link into crawler_todo.txt before running the crawler but didn't point the crawler at that file, it may have overwritten it. If you started the crawler first and added the link after it had been running for a bit, there were likely other links already queued and the crawler never got to yours. Also, can you please provide the error that you encountered?
Hmm, it seems to be a problem with the robots.txt parser.
[10:45:48] [reppy] [WORKER #0] [ROBOTS] [INFO]: Reading robots.txt file at: /robots.txth/robots.txtt/robots.txtt/robots.txtp/robots.txts/robots.txt:/robots.txt//robots.txt//robots.txtg/robots.txto/robots.txtl/robots.txte/robots.txtm/robots.txt./robots.txtd/robots.txte/robots.txt
It should just find 'https://golem.de/robots.txt'.
I'll have to look into this.
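For what it's worth, this kind of garbled path is what you get if the robots.txt location is built by iterating over the URL *string* (which yields individual characters) instead of over a list of URLs. This is only a sketch of a plausible cause, not the actual spidy/reppy code:

```python
from urllib.parse import urljoin

url = "https://golem.de"

# Hypothetical bug: iterating over the URL string yields characters,
# so "/robots.txt" gets interleaved with every character of the URL.
buggy = "".join("/robots.txt" + char for char in url) + "/robots.txt"
# Produces "/robots.txth/robots.txtt/robots.txtt..." like the log line above.

# Correct construction keeps the URL whole:
robots_url = urljoin(url + "/", "robots.txt")
# -> "https://golem.de/robots.txt"
```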
Resolved by #77
Expected Behavior
crawl like hell
Actual Behavior
dies of an unknown error
Steps to Reproduce the Problem
echo "https://www.golem.de/" > ./crawler_todo.txt
spidy
What I've tried so far:
Using spidy