Closed kevincox closed 2 years ago
I do know Reddit does some categorizing based on several details about headers to determine if it's a client to let in, or a bot/script to block.
One difference I see is that curl is using HTTP/2. You could either force curl to use HTTP/1, or disable reqwest's `default-tls` feature and enable `rustls-tls` instead, which has ALPN support and will get it talking h2.
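For anyone wondering why ALPN support decides whether the connection ends up on h2, here is a minimal sketch of the selection step. It is a simplified model, not reqwest's or any TLS library's code: `negotiate_alpn` and the protocol lists are made up for illustration, and it assumes the server picks by its own preference order. A real server that gets ALPN offers with no overlap may abort the handshake instead of falling back.

```rust
/// Simplified model of ALPN selection: the server picks the first entry in
/// its own preference list that the client also offered.
/// `None` means no protocol was negotiated; a client that sent no ALPN
/// offers at all then stays on HTTP/1.1.
fn negotiate_alpn<'a>(client_offers: &[&str], server_prefs: &[&'a str]) -> Option<&'a str> {
    server_prefs
        .iter()
        .copied()
        .find(|proto| client_offers.iter().any(|offer| offer == proto))
}

fn main() {
    // Hypothetical server that prefers HTTP/2 but still accepts HTTP/1.1.
    let server = ["h2", "http/1.1"];

    // A client whose TLS backend offers both protocols via ALPN.
    let with_alpn = negotiate_alpn(&["h2", "http/1.1"], &server);
    // A client whose TLS backend sends no ALPN offers.
    let without_alpn = negotiate_alpn(&[], &server);

    println!("with ALPN:    {:?}", with_alpn);
    println!("without ALPN: {:?} (connection stays on HTTP/1.1)", without_alpn);
}
```

So a client can be perfectly capable of HTTP/2 and still speak HTTP/1.1 over TLS if its TLS backend never offers `h2`.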
I tried forcing curl to use 1.1 and 1.0 and it still succeeded. I honestly can't figure out the difference between the requests except for maybe header order? (But I can't see the header order of reqwest.) I'll give rustls-tls a try and see if it helps.
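If you do manage to get the raw request text from both clients (for example from `curl -v` output or a local capture), a small helper like this can tell you whether they differ in header order or in the header set itself. This is only a sketch: `header_names` and both sample requests are invented for illustration, and it only applies to HTTP/1.x, where headers are sent as plain text.

```rust
/// Extract header names, lowercased, in the order they appear in a raw
/// HTTP/1.x request head (request line first, then `Name: value` lines).
fn header_names(raw_request: &str) -> Vec<String> {
    raw_request
        .lines()
        .skip(1) // skip the request line, e.g. "GET / HTTP/1.1"
        .take_while(|line| !line.is_empty()) // stop at the blank line before the body
        .filter_map(|line| line.split_once(':'))
        .map(|(name, _value)| name.trim().to_ascii_lowercase())
        .collect()
}

fn main() {
    // Both request heads are made up for illustration.
    let client_a = "GET / HTTP/1.1\r\nHost: example.com\r\nUser-Agent: a/1\r\nAccept: */*\r\n\r\n";
    let client_b = "GET / HTTP/1.1\r\nAccept: */*\r\nUser-Agent: a/1\r\nHost: example.com\r\n\r\n";

    let a = header_names(client_a);
    let b = header_names(client_b);

    // Sorted copies let us compare the header sets independent of order.
    let mut a_sorted = a.clone();
    let mut b_sorted = b.clone();
    a_sorted.sort();
    b_sorted.sort();

    println!("client A order: {:?}", a);
    println!("client B order: {:?}", b);
    println!("same header set: {}", a_sorted == b_sorted);
    println!("same order:      {}", a == b);
}
```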
Strange. Enabling "native-tls-alpn" for HTTP2 didn't work but using rustls worked.
reqwest doesn't know how to use native-tls' ALPN support.
Well, I'm not sure what I am doing wrong, but my NGINX logs are saying `protocol: HTTP/2.0` with `use_native_tls()` and `features = ["native-tls-alpn", "trust-dns"]` 🤷
I also came across this issue in a service I run. Upgrading from 0.10 to 0.11 and using rustls-tls seemed to have fixed the issue and I'm no longer receiving 429s from Reddit.
`reqwest = { version = "0.11", default-features = false, features = ["rustls-tls"] }`

This is how I'm installing reqwest in my Cargo.toml. Hopefully copying and pasting this helps someone else who is stuck with the same issue.
> Strange. Enabling "native-tls-alpn" for HTTP2 didn't work but using rustls worked.
@kevincox your code works for me if I enable `native-tls-alpn`:
#[tokio::main]
async fn main() {
    let client = reqwest::Client::builder()
        .user_agent("some-unique-app.kevincox.ca/1")
        .connect_timeout(std::time::Duration::from_secs(60))
        .timeout(std::time::Duration::from_secs(600))
        .build().unwrap();
    let res = client
        .get("https://www.reddit.com")
        .header("accept", "*/*")
        .send().await.unwrap();
    dbg!(res.status());
    dbg!(res.headers());
    dbg!(res.text().await);
}
Cargo.toml
[package]
name = "temp-reddit"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
reqwest = { version = "0.11.5", features = ["native-tls-alpn"] }
tokio = { version = "1.12.0", features = ["rt", "rt-multi-thread", "macros"] }
$ cargo run
[src/main.rs:17] res.status() = 200
[src/main.rs:18] res.headers() = {
"cache-control": "private, s-maxage=0, max-age=0, must-revalidate, no-store",
"content-type": "text/html; charset=utf-8",
"accept-ranges": "bytes",
"date": "Sun, 24 Oct 2021 11:21:08 GMT",
"via": "1.1 varnish",
"vary": "Accept-Encoding, Accept-Encoding",
"set-cookie": "loid=0000000000ftjg9803.2.1635074468000.Z0FBQUFBQmhkVUdrblhDODlFcW0zT0FqV2RfVUdhY2ZBb1Y2S0pvNFJFeXRTbDloRFhKcWZoNW1uY3JEc0lUUTVuSVkxU2swZUV1RzVXUjUtSVNuVFhYeDBWUllacGVTMUJYME82RzJJcnRfR25lbEx5NHZIZ0JheFJFZzB1d3d2M0tSUGZTTU5ZY0w; path=/; expires=Tue, 24 Oct 2023 11:21:08 GMT; domain=.reddit.com; secure; SameSite=None; Secure",
"set-cookie": "session_tracker=opnjeprrqggamfbfam.0.1635074468413.Z0FBQUFBQmhkVUdraVdZSGZNRl9tTlRzTFFGWDNpLWMzbHlYQTk5NlVfeUUxdFF4VWlfN2l0eC0tNUVqT0RJcEMyZWxzZ3lCel9IUHdPR1pZM3NjQXpSZVVIcEg0NnREb2lyYjJ2ODlmbUxUakFXa3ZnZGJfRUdraldjUFk2eS1xNXMxTFFHdnlKODk; path=/; domain=.reddit.com; secure; SameSite=None; Secure",
"set-cookie": "token_v2=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE2MzUwNzc5NDgsInN1YiI6Ii16RHU5UUloY2Y3RVRRMUM3TmlQT0QwWGUwWXM4VXciLCJsb2dnZWRJbiI6ZmFsc2UsInNjb3BlcyI6WyIqIiwiZW1haWwiLCJwaWkiXX0.mEdtl8jJnJCeP-w9-SQNXwUCfrognWC3MfNDQ-Qp2bI; Path=/; Domain=reddit.com; Expires=Tue, 24 Oct 2023 11:21:08 GMT; HttpOnly; Secure",
"set-cookie": "csv=1; Max-Age=63072000; Domain=.reddit.com; Path=/; Secure; SameSite=None",
"set-cookie": "edgebucket=tw6VtRFWwR6TEcx7oW; Domain=reddit.com; Max-Age=63071999; Path=/; secure",
"strict-transport-security": "max-age=31536000; includeSubdomains",
"x-content-type-options": "nosniff",
"x-frame-options": "SAMEORIGIN",
"x-xss-protection": "1; mode=block",
"server": "snooserv",
"x-clacks-overhead": "GNU Terry Pratchett",
}
[src/main.rs:19] res.text().await = Ok(
"\n <!DOCTYPE html>\n <html lang=\"en-US\">\n <head>\n <script>\n var __SUPPORTS_TIMING_API = typeof performance === 'object' && !!performance.mark && !! performance.measure && !!performance.getEntriesByType;\n function __perfMark(name) { __SUPPORTS_TIMING_API && performance.mark(name); };\n var __firstPostLoaded = false;\n function __markFirstPostVisible() {\n
// further output omitted
Hmm, I thought that wasn't working before but it is indeed working for me now too. This also makes the requests use HTTP/2 based on logs from my nginx.
I understand that this likely isn't a reqwest bug, and there may be nothing to be done on the reqwest side, but I figured at the very least it would be useful to have this show up if someone searches the issue tracker. Feel free to close if you don't want to take any action or track this problem.
Reddit denies all requests with 429. I'm fairly sure this isn't actual rate limiting unless we are falling into a strange bucket that is always empty. The error message mentions my IP, but requests from curl from the same IP with the same headers (as far as I can tell) succeed. I'm having trouble nailing down how Reddit identifies reqwest: I can't seem to get low-level detail out of reqwest, and TLS makes it difficult to just use strace.
Cargo.toml
```toml
[package]
name = "tests"
version = "0.1.0"
edition = "2018"

[dependencies]
reqwest = "0.11.5"
tokio = { version = "1.12.0", features = ["rt", "rt-multi-thread", "macros"] }
```

Curl works.
I've tried multiple IP addresses and tried to match curl's headers. However, it seems like Reddit can always tell the two clients apart and denies requests from reqwest.
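One way to see exactly which bytes a client sends, without fighting TLS, is to point it at a throwaway plain-HTTP listener on localhost. The sketch below uses only the standard library; `capture_request_head` is a hypothetical helper, and the built-in client thread is just a stand-in so the example runs on its own. Note the limits: over plain HTTP this only shows the HTTP/1.1 request head, so it can reveal header order and casing but nothing about TLS-level differences such as ALPN.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Accept one connection, read the request head (up to the blank line),
/// send a minimal response, and return the raw bytes as text.
fn capture_request_head(listener: &TcpListener) -> std::io::Result<String> {
    let (mut stream, _peer) = listener.accept()?;
    let mut head = Vec::new();
    let mut buf = [0u8; 1024];
    // Keep reading until the end-of-headers marker or the peer closes.
    while !head.windows(4).any(|w| w == b"\r\n\r\n") {
        let n = stream.read(&mut buf)?;
        if n == 0 {
            break;
        }
        head.extend_from_slice(&buf[..n]);
    }
    stream.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")?;
    Ok(String::from_utf8_lossy(&head).into_owned())
}

fn main() -> std::io::Result<()> {
    // Port 0 asks the OS for any free port.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // Stand-in client for the demo. In practice, point reqwest or curl at
    // the listener's address instead and compare what each one sends.
    let client = thread::spawn(move || -> std::io::Result<()> {
        let mut stream = TcpStream::connect(addr)?;
        stream.write_all(b"GET / HTTP/1.1\r\nHost: localhost\r\nAccept: */*\r\n\r\n")?;
        let mut response = String::new();
        stream.read_to_string(&mut response)?;
        Ok(())
    });

    let head = capture_request_head(&listener)?;
    client.join().expect("client thread panicked")?;
    print!("{}", head);
    Ok(())
}
```

To use it for real, bind to a fixed port, drop the stand-in thread, and send one request from each client; the printed heads can then be compared line by line.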