Closed: DimitriTimoz closed this issue 2 months ago
Using `await` on the `join_handle` together with the subscription is incorrect; the receiver needs to drop on its own. We have a subscription guard for Chrome as needed. Take a look at the examples repo to learn more.
```rust
let mut website: Website = Website::new("https://rsseau.fr");
let mut rx2: tokio::sync::broadcast::Receiver<spider::page::Page> =
    website.subscribe(0).unwrap();
website.with_limit(1);

let join_handle = tokio::spawn(async move {
    while let Ok(_res) = rx2.recv().await {
        println!("page");
    }
});

website.scrape().await;
// End with crawling
website.unsubscribe();
join_handle.await.unwrap();
```
The subscription is mainly meant to be used with crawling. When scraping, you are already holding the content in memory, so it is best to pick one approach, not both.
The following code never ends when spider is in scraping mode: no new pages arrive, and the loop just blocks on `.recv()`.