thisisparker / xword-dl

⬛⬜⬛ Command line tool to scrape crosswords from online solvers and save them as .puz files ⬛⬜⬛
MIT License

Need less brittle NYT authentication method #58

Open thisisparker opened 1 year ago

thisisparker commented 1 year ago

Currently, xword-dl more or less spoofs the NYT Crosswords app: it sends a request with a user-supplied username and password and then saves the authenticated NYT-S token it gets in response.[^1] That appears to work inconsistently, resulting in occasional 403 errors that have been difficult to diagnose and mitigate (e.g. #51). Given that this approach is unreliable now and likely to become even less reliable, it would be good if xword-dl had a more straightforward approach that was less prone to breakage.

One option I'm considering is using browser-cookie3, which could retrieve a cookie from basically any browser's storage without interaction. It seems like there might be complications with casting a very wide net here, like a user might be logged in to different accounts on different browsers, or use some kind of custom profile that the library can't figure out, and debugging that might be a bit of a chore.

I could imagine combining it with a webbrowser approach, where --authenticate would

  1. get a webbrowser.controller object, which I think would expose to me what the user's default is
  2. open a new tab or window in their browser pointed to e.g. https://nytimes.com/login
  3. [... somehow determine that the user was done with login ...]
  4. use browser-cookie3 to check the nytimes.com specific cookies for that specific browser

That seems a little less invasive and could even be reused for other sites that require authentication. But I'm not sure if there's something I'm overlooking! Welcoming feedback here before I start trying to implement anything.
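
For discussion, here is a minimal sketch of how those steps might fit together. It is not xword-dl code. It assumes browser-cookie3 exposes one loader function per browser (e.g. `browser_cookie3.firefox`) that accepts a `domain_name` argument and returns a standard cookie jar. The helper names (`find_cookie`, `load_browser_jar`, `authenticate`) are hypothetical, and the "is the user done logging in?" step is filled in with simple polling, which is only one possible answer. Mapping the user's default browser to the right loader is left out, so the browser name is passed in explicitly.

```python
import time
import webbrowser
from typing import Callable, Optional

LOGIN_URL = "https://nytimes.com/login"  # URL taken from the steps above
COOKIE_NAME = "NYT-S"
COOKIE_DOMAIN = "nytimes.com"


def find_cookie(jar, name: str = COOKIE_NAME, domain: str = COOKIE_DOMAIN) -> Optional[str]:
    """Return the value of the first unexpired cookie `name` set on `domain` or a subdomain."""
    for cookie in jar:
        host = cookie.domain.lstrip(".")
        on_domain = host == domain or host.endswith("." + domain)
        if cookie.name == name and on_domain and not cookie.is_expired():
            return cookie.value
    return None


def load_browser_jar(browser: str):
    """Load nytimes.com cookies from one named browser (assumed browser-cookie3 usage)."""
    import browser_cookie3  # third-party; imported lazily so the rest runs without it

    loader = getattr(browser_cookie3, browser)  # e.g. browser_cookie3.firefox
    return loader(domain_name=COOKIE_DOMAIN)


def authenticate(
    browser: str,
    load_jar: Callable = load_browser_jar,
    open_url: Callable = webbrowser.open_new_tab,
    timeout: float = 120.0,
    interval: float = 3.0,
) -> Optional[str]:
    """Reuse an existing NYT-S cookie, or open the login page and poll until one appears."""
    token = find_cookie(load_jar(browser))
    if token:
        return token  # already logged in; no need to open anything

    open_url(LOGIN_URL)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        token = find_cookie(load_jar(browser))  # polling is an assumption, not a known-good design
        if token:
            return token
        time.sleep(interval)
    return None  # user never finished logging in within the timeout
```

Reading the cookie store of only one named browser sidesteps the "different accounts on different browsers" problem, at the cost of needing to know which browser to read.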

[^1]: The NYT-S token is just saved in the config file for subsequent requests, so any manual approach that copies a valid token into that file will work just as well.

edsantiago commented 1 year ago

In case it helps: https://github.com/Q726kbXuN/nytxw_puz

thisisparker commented 1 year ago

Looks like that script uses browser-cookie3 but just asks the user to pick which browser they want to use the cookies from. Not a bad approach, and could be provided as a command-line flag too so it wouldn't require interaction. Hm!

edsantiago commented 1 year ago

No interaction needed, just pass browser as argv[1]: https://github.com/Q726kbXuN/nytxw_puz/blob/master/nyt.py#L536-L539
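
A rough sketch of what "flag first, prompt as a fallback" could look like, using only the standard library. The `--browser` flag and the `pick_browser` helper are hypothetical (neither xword-dl nor the linked script is confirmed to work this way), and the browser names are assumed to match browser-cookie3's loader function names.

```python
import argparse
from typing import Callable, List

# Assumed to correspond to loader functions in browser-cookie3.
SUPPORTED_BROWSERS = ("chrome", "chromium", "firefox", "edge", "opera", "brave", "vivaldi", "safari")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cookie-demo")
    parser.add_argument(
        "--browser",  # hypothetical flag name
        choices=SUPPORTED_BROWSERS,
        default=None,
        help="browser whose cookie store should be read",
    )
    return parser


def pick_browser(argv: List[str], ask: Callable[[str], str] = input) -> str:
    """Use --browser when given; otherwise fall back to asking the user."""
    args = build_parser().parse_args(argv)
    if args.browser:
        return args.browser  # non-interactive path
    answer = ask("Which browser? (" + ", ".join(SUPPORTED_BROWSERS) + "): ").strip().lower()
    if answer not in SUPPORTED_BROWSERS:
        raise SystemExit("unsupported browser: " + repr(answer))
    return answer
```

With this shape, scripted runs pass the flag and never block on input, while interactive users still get a prompt.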

Ugleh commented 1 year ago

How does the YML file need to look with NYT-S? When it is first created, it is empty.

thisisparker commented 1 year ago

As it currently stands, it should look like:

nyt:
  NYT-S: long-NYT-S-key-here-pulled-from-your-browser-cookies

In the future, in accordance with #92, I think that should probably be in a cookies key, but I will make that migration seamless when it happens. For now, the above format is correct.
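
To illustrate, here is a small sketch of how code reading that file could accept both layouts. It operates on the already-parsed config (a dict, or `None` for an empty file) rather than on xword-dl's real internals. The nested `cookies` layout is only a guess at what the #92 format might look like, and the function names are hypothetical.

```python
from typing import Dict, Optional


def nyt_cookies(config: Optional[dict]) -> Dict[str, str]:
    """Collect NYT cookies from a parsed config, tolerating an empty file."""
    section = (config or {}).get("nyt") or {}

    # Hypothetical future layout: nyt: {cookies: {NYT-S: ...}}
    cookies = dict(section.get("cookies") or {})

    # Current layout shown above: nyt: {NYT-S: ...}
    if "NYT-S" in section:
        cookies.setdefault("NYT-S", section["NYT-S"])
    return cookies


def cookie_header(cookies: Dict[str, str]) -> str:
    """Format cookies as the value of an HTTP Cookie request header."""
    return "; ".join(name + "=" + value for name, value in cookies.items())
```

For example, `cookie_header(nyt_cookies({"nyt": {"NYT-S": "token"}}))` produces `NYT-S=token`, and an empty config produces an empty string.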

vcifello commented 1 year ago

after hours of trying to figure out the NYT auth flow and getting it working in postman, i found this library

https://github.com/GodTamIt/nytxw-puz-cli

his puzzle parsing is outdated, but the auth flow looks perfect!