lycheeverse / lychee

⚡ Fast, async, stream-based link checker written in Rust. Finds broken URLs and mail addresses inside Markdown, HTML, reStructuredText, websites and more!
https://lychee.cli.rs
Apache License 2.0
1.86k stars, 114 forks

Feature request: add FlareSolverr solver to bypass CloudFlare protection #1439

Open soredake opened 3 weeks ago

soredake commented 3 weeks ago

There are a lot of sites that use Cloudflare for protection, and checking them can result in a 403. It would be nice to have support for FlareSolverr to fix this situation.

Related: https://github.com/lycheeverse/lychee/issues/1157 https://github.com/jobobby04/TachiyomiSY/pull/1124

mre commented 3 weeks ago

This is the first time I've heard of FlareSolverr, but it looks nice. We wanted to add proxy support at some point anyway. There's a proxy config file that browsers like Firefox use. I'm not sure if it could also be used to call FlareSolverr.

mre commented 3 weeks ago

The proxy support request I was referring to: https://github.com/lycheeverse/lychee/issues/869 Would it "just work" if we added support for that?

mre commented 3 weeks ago

It looks like FlareSolverr requires the request to be sent as a JSON POST? If so, the proxy approach wouldn't work. There's a prototype for WebAssembly support, which could be used to modify the request. @thomas-zahner, this should work, right?
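For reference, here is a rough sketch of what such a JSON POST body might look like. It assumes FlareSolverr's documented defaults (a local instance at http://localhost:8191/v1 and the `request.get` command with `url` and `maxTimeout` fields). The helper names `json_escape` and `flaresolverr_body` are made up for illustration and are not part of lychee or FlareSolverr. It only builds and prints the body; it does not send anything.

```rust
/// Escape a string so it can be embedded in a JSON string literal.
/// Hypothetical helper; a real implementation would likely use serde_json.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the JSON body for a FlareSolverr `request.get` command.
/// Field names are assumed from FlareSolverr's README.
fn flaresolverr_body(target_url: &str, max_timeout_ms: u64) -> String {
    format!(
        "{{\"cmd\":\"request.get\",\"url\":\"{}\",\"maxTimeout\":{}}}",
        json_escape(target_url),
        max_timeout_ms
    )
}

fn main() {
    // Assumed default address of a locally running FlareSolverr instance.
    let endpoint = "http://localhost:8191/v1";
    let body = flaresolverr_body("https://example.com/", 60_000);
    println!("POST {endpoint}\nContent-Type: application/json\n\n{body}");
}
```

The point is that the link being checked ends up inside the POST body rather than in the request line, which is why a plain HTTP proxy setting would not be enough.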

thomas-zahner commented 2 weeks ago

Yes, this seems like a perfect use case for the recently implemented RequestChain. The WebAssembly part is indeed only a prototype and not merged yet. However, the request chain itself is already fully functional and available in lychee_lib through the ClientBuilder.

An example of how to use the RequestChain can be found here: https://github.com/lycheeverse/lychee/blob/master/examples/chain/chain.rs
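To illustrate the idea without depending on lychee_lib, here is a simplified, std-only sketch of a request chain that rewrites requests for protected hosts into FlareSolverr calls. Everything in it (`Request`, `ChainResult`, `Handler`, `FlareSolverrRewrite`, `run_chain`) is a hypothetical stand-in that only loosely mirrors the concept; it is not the actual lychee_lib API, so refer to the linked example for the real types. The FlareSolverr endpoint and field names are assumed from its README.

```rust
/// Simplified stand-in for an HTTP request (not reqwest's or lychee's type).
#[derive(Debug, Clone, PartialEq)]
struct Request {
    method: String,
    url: String,
    body: Option<String>,
}

/// Result of one chain link: pass the request on, or stop with a status code.
#[allow(dead_code)] // `Done` is unused in this sketch but shows the early-exit path.
enum ChainResult {
    Next(Request),
    Done(u16),
}

/// One link in the chain (hypothetical trait for this sketch).
trait Handler {
    fn handle(&self, request: Request) -> ChainResult;
}

/// Rewrites requests to known protected hosts into FlareSolverr POSTs.
struct FlareSolverrRewrite {
    endpoint: String,
    protected_hosts: Vec<String>,
}

impl Handler for FlareSolverrRewrite {
    fn handle(&self, request: Request) -> ChainResult {
        // Crude substring match on the URL; real code would parse the host.
        let protected = self
            .protected_hosts
            .iter()
            .any(|h| request.url.contains(h.as_str()));
        if !protected {
            return ChainResult::Next(request);
        }
        // Assumes the URL contains no characters that need JSON escaping.
        let body = format!(
            "{{\"cmd\":\"request.get\",\"url\":\"{}\",\"maxTimeout\":60000}}",
            request.url
        );
        ChainResult::Next(Request {
            method: "POST".to_string(),
            url: self.endpoint.clone(),
            body: Some(body),
        })
    }
}

/// Run every handler in order; stop early if one returns `Done`.
fn run_chain(handlers: &[Box<dyn Handler>], mut request: Request) -> Result<Request, u16> {
    for handler in handlers {
        match handler.handle(request) {
            ChainResult::Next(next) => request = next,
            ChainResult::Done(status) => return Err(status),
        }
    }
    Ok(request)
}

fn main() {
    let chain: Vec<Box<dyn Handler>> = vec![Box::new(FlareSolverrRewrite {
        endpoint: "http://localhost:8191/v1".to_string(),
        protected_hosts: vec!["example.com".to_string()],
    })];
    let original = Request {
        method: "GET".to_string(),
        url: "https://example.com/page".to_string(),
        body: None,
    };
    match run_chain(&chain, original) {
        Ok(req) => println!("{} {} {:?}", req.method, req.url, req.body),
        Err(status) => println!("chain stopped with status {status}"),
    }
}
```

With this approach the checker would still need to read the upstream status out of FlareSolverr's JSON reply instead of the HTTP status of the POST itself, so a response-side step would likely be needed as well.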