Open thisisparker opened 10 months ago
Looking into it: this appears to be operated by a company called Datadome, and they're setting and checking a cookie called `datadome` with a long token value. Theoretically we could provide that value with requests, similar to an auth token, but I'd rather not have to do that. Still hoping this is temporary!
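For illustration only, here is a rough sketch of what "providing that value with requests" could look like. The function name `fetch_with_datadome` and its parameters are hypothetical and not part of `xword-dl`; the token is assumed to have been obtained some other way (for example, copied from a browser session).

```python
def fetch_with_datadome(session, url, datadome_token):
    """GET `url`, sending a `datadome` cookie along with the request.

    `session` is assumed to behave like a `requests.Session`; the helper
    name and arguments are hypothetical, for illustration only.
    """
    # Pass the token as a per-request cookie, much like an auth token.
    return session.get(url, cookies={"datadome": datadome_token}, timeout=10)
```

With the real library this would be called as `fetch_with_datadome(requests.Session(), url, token)`. Whether the site would actually accept a cookie supplied this way is untested here.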
Maybe fixed this with #183, though I'm not thrilled with maintaining a list of random cookies that are required for each site, and I don't know how long `datadome` cookies last anyway. Leaving open for now :roll_eyes:
Unsurprisingly, `datadome` tokens turn out to be very short-lived: on the order of hours, I guess? Maybe back to the drawing board here.
You don't want to just pull from Martin Herbach's site? http://herbach.dnsalias.com/wsj/wsj240720.puz
Nope, not in `xword-dl` itself. Obviously that's a good option for end users who want it, but I've made the design decision that this tool only uses first-party sources and does its own scraping and parsing.
WSJ is returning 401/403 errors to requests from `requests`, including `xword-dl`. My guess is that this is in response to traffic patterns they're seeing and they will turn it off again in due course, but that's a waiting game.

In the meantime, the error message should probably differentiate between this kind of connection error and a parsing error (which is what everything sounds like now).
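One possible shape for that distinction, as a minimal sketch: the exception classes and the `check_status` helper below are hypothetical names, not the existing `xword-dl` code. The idea is to check the HTTP status before any parsing happens, so a blocked request surfaces as its own error type.

```python
class PuzzleConnectionError(Exception):
    """The site could not be reached or refused the request (hypothetical)."""


class PuzzleParsingError(Exception):
    """The page was fetched but the puzzle could not be parsed (hypothetical)."""


def check_status(status_code, url):
    """Raise PuzzleConnectionError for HTTP error statuses; otherwise return None."""
    if status_code in (401, 403):
        # The kind of response described in this issue: access refused.
        raise PuzzleConnectionError(
            f"Access denied (HTTP {status_code}) for {url}; "
            "the site may be blocking automated requests."
        )
    if status_code >= 400:
        raise PuzzleConnectionError(f"HTTP {status_code} when fetching {url}")
```

A downloader could call `check_status(res.status_code, url)` right after the request and reserve `PuzzleParsingError` for failures that happen once a page has actually been retrieved.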