spider-rs / spider

A web crawler and scraper for Rust
https://spider.cloud
MIT License

with_limit(1) does not work when "chrome" feature is enabled #201

Closed: viktorholk closed this issue 3 months ago

viktorholk commented 3 months ago

Hi

I have noticed a difference in how the with_limit configuration behaves when switching between the chrome feature and the default features.

Example

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let target = "https://crawler-test.com/";

    for i in [1, 2] {
        // Build a crawler capped at `i` pages.
        let mut website: Website = Website::new(target).with_limit(i).build().unwrap();

        website.scrape().await;

        let pages = website.get_pages().unwrap();
        if pages.iter().count() == 0 {
            println!("with_limit({}) - No pages", i);
        }

        // Print the URL and status code of every page that was scraped.
        for page in pages.iter() {
            println!(
                "with_limit({}) - {:?} {:?}",
                i,
                page.get_url(),
                page.status_code
            );
        }
    }
}

With no features enabled I get this output:

with_limit(1) - "https://crawler-test.com/" 200
with_limit(2) - "https://crawler-test.com/" 200
with_limit(2) - "https://crawler-test.com/robots_protocol/allowed_shorter" 200

With the chrome and chrome_cpu features enabled I get (a Cargo.toml sketch of the feature setup follows the output):

with_limit(1) - No pages
with_limit(2) - "https://crawler-test.com/" 200
j-mendez commented 3 months ago

@viktorholk thank you for the issue. The scrape API is being removed in v2. I am going to keep this issue open until then.
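
For anyone reading this after the change, here is a minimal sketch of the crawl-based flow, assuming crawl() and get_links(), which exist in current releases; it is not a statement of the exact v2 replacement for scrape()/get_pages():

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Same target and limit as the reproduction above.
    let mut website: Website = Website::new("https://crawler-test.com/")
        .with_limit(1)
        .build()
        .unwrap();

    // crawl() visits pages and records the links it saw, without
    // retaining page bodies the way scrape()/get_pages() did.
    website.crawl().await;

    for link in website.get_links() {
        println!("with_limit(1) - {}", link.as_ref());
    }
}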

j-mendez commented 3 months ago

Fixed in v2.0.0