Closed: jmikedupont2 closed this issue 2 months ago
I would like to spider starting from a local file of URLs and then have the crawler follow them. I found multiple hard-coded parts of the program that check for `https`, which looks like duplicated code. Please refactor and allow for more protocols.
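For illustration only, here is a minimal sketch of what centralizing those scattered `https` comparisons behind a scheme allowlist could look like. It assumes the `url` crate; the helper name `is_allowed_scheme` is made up for this example, not the project's actual code:

```rust
use url::Url;

/// Hypothetical centralized check: one place decides which protocols the
/// crawler may follow, replacing scattered hard-coded `https` comparisons.
fn is_allowed_scheme(raw: &str, allowed: &[&str]) -> bool {
    match Url::parse(raw) {
        Ok(url) => allowed.contains(&url.scheme()),
        Err(_) => false,
    }
}

fn main() {
    // `file` would be opt-in; a purely remote crawl passes only http/https.
    let allowed = ["http", "https", "file"];
    assert!(is_allowed_scheme("file:///home/mdupont/links.html", &allowed));
    assert!(is_allowed_scheme("https://example.com/", &allowed));
    assert!(!is_allowed_scheme("ftp://example.com/", &allowed));
}
```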
Hi, can you show an example of this not working? That way whoever picks this up does not have to reproduce the failure first in order to work out what the refactoring should look like.
Using this file: https://github.com/meta-introspector/time-grants/blob/main/2024/08/01/links.html

```
RUST_LOG=debug ./target/debug/spider --verbose --url file:///home/mdupont/2024/08/01/time-grants/2024/08/01/links.html crawl
```
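A possible workaround while `file://` seeds are unsupported: extract the links from the local HTML file yourself and hand each remote one to the crawler. This sketch assumes the `scraper` and `tokio` crates plus spider's README-level `Website` API (`new`, `crawl`, `get_links`); it is a stand-in for, not the same as, the requested built-in behavior:

```rust
use scraper::{Html, Selector};
use spider::website::Website;
use std::fs;

#[tokio::main]
async fn main() {
    // Read the seed file from disk directly instead of via a file:// URL.
    let html = fs::read_to_string("/home/mdupont/2024/08/01/time-grants/2024/08/01/links.html")
        .expect("seed file should be readable");

    // Pull every href out of the seed document, keeping remote links only.
    let seeds: Vec<String> = {
        let doc = Html::parse_document(&html);
        let anchors = Selector::parse("a[href]").unwrap();
        doc.select(&anchors)
            .filter_map(|a| a.value().attr("href"))
            .filter(|href| href.starts_with("http"))
            .map(str::to_owned)
            .collect()
    };

    // Crawl each remote seed with spider's usual entry point.
    for seed in seeds {
        let mut website = Website::new(&seed);
        website.crawl().await;
        println!("{}: {} links found", seed, website.get_links().len());
    }
}
```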
Whoever handles this needs to make it a config option. Usually we want to ignore local files unless we know beforehand that we need to crawl some files on the local disk.
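As a sketch of that config (the field and method names here are hypothetical, not the project's current configuration), the default could reject `file://` and require an explicit opt-in:

```rust
use url::Url;

/// Hypothetical crawl config: local files are ignored unless enabled.
#[derive(Debug, Default, Clone)]
struct CrawlConfig {
    /// Off by default; set only when we know beforehand that the crawl
    /// must read files from the local disk.
    allow_file_urls: bool,
}

impl CrawlConfig {
    /// Decide whether a discovered URL may be fetched under this config.
    fn permits(&self, raw: &str) -> bool {
        match Url::parse(raw) {
            Ok(u) if u.scheme() == "file" => self.allow_file_urls,
            Ok(u) => matches!(u.scheme(), "http" | "https"),
            Err(_) => false,
        }
    }
}

fn main() {
    let opted_in = CrawlConfig { allow_file_urls: true };
    assert!(opted_in.permits("file:///home/mdupont/links.html"));
    assert!(CrawlConfig::default().permits("https://example.com/"));
    assert!(!CrawlConfig::default().permits("file:///etc/hosts"));
}
```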
The selectors should be appropriate for the links. The issue is that you cannot fetch local files when crawling remotely.
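To make that constraint concrete, here is a sketch of an origin-aware filter (the `may_follow` helper is hypothetical): a remote crawl never follows `file://` links, while a crawl seeded from a local file may fan out to both local and remote targets:

```rust
use url::Url;

/// Hypothetical origin-aware filter: file:// targets are only reachable
/// when the crawl itself started from a local file.
fn may_follow(origin: &Url, link: &Url) -> bool {
    match (origin.scheme(), link.scheme()) {
        // Remote crawls cannot fetch the local disk.
        ("http" | "https", "file") => false,
        ("http" | "https", "http" | "https") => true,
        // A local seed may follow both local and remote links.
        ("file", "file" | "http" | "https") => true,
        _ => false,
    }
}

fn main() {
    let remote = Url::parse("https://example.com/").unwrap();
    let local = Url::parse("file:///home/mdupont/links.html").unwrap();
    assert!(!may_follow(&remote, &local)); // remote -> file: blocked
    assert!(may_follow(&local, &remote)); // file -> remote: allowed
}
```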