Open thisisparker opened 5 months ago
Looking into it: this appears to be operated by a company called Datadome, and they're setting and checking a cookie called `datadome` with a long token value. Theoretically we could provide that value with requests, similar to an auth token, but I'd rather not have to do that. Still hoping this is temporary!
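For reference, passing the cookie along would look roughly like the sketch below. It assumes the token is copied by hand from a browser session and that the cookie is scoped to `.wsj.com`; both are guesses, and `make_session` and `DATADOME_TOKEN` are hypothetical names, not anything in the codebase.

```python
import requests

# Assumption: placeholder for a token copied manually from a browser session.
DATADOME_TOKEN = "paste-token-here"


def make_session(token, domain=".wsj.com"):
    """Return a requests.Session that sends the datadome cookie.

    The domain is an assumption; the real cookie scope may differ.
    """
    session = requests.Session()
    # RequestsCookieJar.set stores the cookie so it is sent with
    # later requests to matching domains.
    session.cookies.set("datadome", token, domain=domain)
    return session


session = make_session(DATADOME_TOKEN)
```

This only attaches the cookie; it does nothing about fetching or refreshing the token.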
Maybe fixed this with #183, though I'm not thrilled with maintaining a list of random cookies that are required for each site, and I don't know how long `datadome` cookies last anyway. Leaving open for now :roll_eyes:
Unsurprisingly, `datadome` tokens turn out to be very short-lived: on the order of hours, I guess? Maybe back to the drawing board here.
WSJ is returning 401/403 errors to requests from `requests`, including `xword-dl`. My guess is that this is in response to traffic patterns they're seeing and they will turn it off again in due course, but that's a waiting game.
In the meantime, the error message should probably differentiate between this kind of connection error and a parsing error (which is what everything sounds like now).
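One possible shape for that split, as a rough sketch rather than the actual `xword-dl` code: the exception names and `check_response` are hypothetical, and the function takes a bare status code (for example `response.status_code` from `requests`) so the sketch stays independent of any network call.

```python
class PuzzleAccessError(Exception):
    """Hypothetical: the site refused or failed the request."""


class PuzzleParseError(Exception):
    """Hypothetical: the page loaded but no puzzle could be extracted."""


def check_response(status_code, url):
    """Raise PuzzleAccessError for HTTP error statuses; return None otherwise."""
    if status_code in (401, 403):
        raise PuzzleAccessError(
            f"{url} returned HTTP {status_code}; "
            "the site may be blocking automated requests."
        )
    if status_code >= 400:
        raise PuzzleAccessError(f"{url} returned HTTP {status_code}.")


# Usage: report the two failure kinds with different messages.
for code in (200, 403):
    try:
        check_response(code, "https://example.com/puzzle")
        print(f"{code}: OK, safe to try parsing")
    except PuzzleAccessError as err:
        print(f"Could not reach the puzzle: {err}")
```

Parsing code would raise `PuzzleParseError` separately, so a blocked request no longer reads as a parsing failure.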