spider-rs / spider

A web crawler and scraper for Rust
https://spider.cloud
MIT License
1.11k stars, 96 forks

Changed the callback functionality #205

Closed: Rushmore75 closed 2 months ago

Rushmore75 commented 2 months ago

I changed the callback functionality to produce the URL of the page it just crawled and the URL it found there.

It was useful to me. It would probably be a breaking change, but it might be useful to someone else 🤷

j-mendez commented 2 months ago

> I changed the callback functionality to produce the URL of the page it just crawled and the URL it found there.
>
> It was useful to me. It would probably be a breaking change, but it might be useful to someone else 🤷

What are you using the callback for? There's a subscription option that gives you the full page with its details, and you can pass the domain in beforehand.

Rushmore75 commented 2 months ago

Oh, sounds like that's what I wanted! The lack of written documentation makes it a smidge hard. The documentation is present; perhaps some example output is what I was looking for.

Basically, what I want is just the URL of the page that was just crawled, then the URL it just found (I'm using it to map the links on my websites). For example: /index.html -> /blog/interesting-article.html

j-mendez commented 2 months ago

> Oh, sounds like that's what I wanted! ~The lack of written documentation makes it a smidge hard.~ The documentation is present; perhaps some example output is what I was looking for.
>
> Basically, what I want is just the URL of the page that was just crawled, then the URL it just found (I'm using it to map the links on my websites). For example: /index.html -> /blog/interesting-article.html

See https://github.com/spider-rs/spider/blob/main/examples/subscribe.rs and pass the domain in the subscription. Feel free to post any issues or put up PRs again.
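
For readers after the same link map (source page -> discovered URL), here is a minimal standard-library sketch of the bookkeeping side only. It is not part of spider: `LinkMap`, `record`, `render`, and `extract_hrefs` are hypothetical names made up for illustration, and the sketch assumes each page's URL and HTML body arrive from somewhere else, for example from the subscription shown in the linked `subscribe.rs` example. The href extraction is deliberately naive; a real HTML parser would be more robust.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical link map: source page -> set of URLs discovered on it.
#[derive(Debug, Default)]
pub struct LinkMap {
    edges: BTreeMap<String, BTreeSet<String>>,
}

impl LinkMap {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that `found` was discovered on the page `source`.
    pub fn record(&mut self, source: &str, found: &str) {
        self.edges
            .entry(source.to_string())
            .or_default()
            .insert(found.to_string());
    }

    /// Render every edge as "source -> found", sorted by source, then target.
    pub fn render(&self) -> Vec<String> {
        self.edges
            .iter()
            .flat_map(|(src, targets)| {
                targets.iter().map(move |t| format!("{} -> {}", src, t))
            })
            .collect()
    }
}

/// Naive, hypothetical href extractor: only handles double-quoted
/// `href="..."` attributes and does no URL normalization.
pub fn extract_hrefs(html: &str) -> Vec<String> {
    let needle = "href=\"";
    let mut out = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find(needle) {
        let after = &rest[start + needle.len()..];
        match after.find('"') {
            Some(end) => {
                out.push(after[..end].to_string());
                rest = &after[end + 1..];
            }
            // Unterminated attribute: stop scanning.
            None => break,
        }
    }
    out
}

// Usage sketch (the page URL and HTML are assumed to come from the crawler):
//
//     let mut map = LinkMap::new();
//     for link in extract_hrefs(page_html) {
//         map.record(page_url, &link);
//     }
//     for line in map.render() {
//         println!("{}", line);
//     }
```

Keeping the edges in a `BTreeMap` of `BTreeSet`s deduplicates repeated links and gives a stable, sorted output, which is convenient when diffing link maps between crawls.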