spider-rs / spider

A web crawler and scraper for Rust
https://spider.cloud
MIT License
1.16k stars 100 forks

Add CSS exclude selectors to /crawl endpoint #219

Closed: mikoro closed this 1 month ago

mikoro commented 1 month ago

Add the ability to exclude content from the page based on CSS selectors.

boehlerlukas commented 1 month ago

@mikoro we are looking for the same at the moment. Spider looks really promising for us at Gleap - excited to see it perform in production.

j-mendez commented 1 month ago

released for transformations v2.9.15

boehlerlukas commented 1 month ago

Awesome, thanks a lot! @j-mendez is this also available through the /crawl api of https://spider.cloud/ ? We prefer the cloud over self hosting :)

j-mendez commented 1 month ago

> Awesome, thanks a lot! @j-mendez is this also available through the /crawl api of https://spider.cloud/ ? We prefer the cloud over self hosting :)

No problem, yes this is now available on the cloud too.

boehlerlukas commented 1 month ago

Awesome! I think the docs haven't been updated yet - could you share a basic example on how to use them here? That would be incredible. Thanks a lot!

boehlerlukas commented 1 month ago

@j-mendez any push in the right direction would be much appreciated, as we are currently looking for a better-performing crawler than crawlee.

j-mendez commented 1 month ago

> @j-mendez any push in the right direction would be much appreciated, as we are currently looking for a better-performing crawler than crawlee.

released in v2.10.0 - deploy will be out in 30 mins. Use exclude_selector to remove the elements from the markup. https://github.com/spider-rs/spider/commit/2065600f2e81c76bfb4ef7cea594c45716baeed8
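
For anyone looking for the basic example asked about above, here is a minimal sketch of what a /crawl request with exclude_selector might look like. Only the parameter name exclude_selector comes from this thread. The endpoint URL, the "url" field, the Bearer auth header, and the use of a comma-separated selector list are assumptions, so check them against the current API docs. The helper name build_crawl_request is hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; the thread only mentions "the /crawl api of https://spider.cloud/".
CRAWL_ENDPOINT = "https://api.spider.cloud/crawl"


def build_crawl_request(url, exclude_selector, api_key):
    """Build (but do not send) a POST request for the /crawl endpoint."""
    payload = {
        "url": url,  # assumed field name for the page to crawl
        # Parameter named in this thread: elements matching this CSS
        # selector are removed from the returned markup.
        "exclude_selector": exclude_selector,
    }
    return urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_crawl_request("https://example.com", "nav, footer", "YOUR_API_KEY")
    # Inspect the request without touching the network.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

The request is built separately from sending so the payload can be inspected first. Whether several selectors can be combined in one string (as in "nav, footer") is not confirmed here.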

boehlerlukas commented 1 month ago

@j-mendez awesome, this was super fast!