thisisparker / xword-dl

⬛⬜⬛ Command line tool to scrape crosswords from online solvers and save them as .puz files ⬛⬜⬛
MIT License

Not possible to search WSJ by date #79

Open thisisparker opened 1 year ago

thisisparker commented 1 year ago

It would be nice, though, and it doesn't seem impossible. The issue is that the landing pages for a given puzzle are not at predictable URLs, because they include a slugified version of the title and a long id string (e.g. https://www.wsj.com/articles/floating-upstream-thursday-crossword-december-1-11669405858). The embedded iframe puzzle on that page includes the date in its URL, but also a shorter identifier (e.g. https://www.wsj.com/puzzles/crossword/20221201/52272/?embed=1). The underlying puzzle data is at a location derived from that iframe URL (e.g. https://www.wsj.com/puzzles/crossword/20221201/52272/data.json).
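To make the URL relationships concrete, here is a minimal sketch based only on the three example URLs above. The function names are hypothetical and not part of xword-dl, and the path layout (`/puzzles/crossword/<YYYYMMDD>/<short id>/`) is an assumption inferred from that single example. It makes no network requests.

```python
from datetime import date, datetime
from urllib.parse import urlsplit, urlunsplit


def data_url_from_embed(embed_url: str) -> str:
    """Derive the data.json URL from an embed URL (hypothetical helper)."""
    parts = urlsplit(embed_url)
    # Drop the query string (?embed=1) and append data.json to the path.
    path = parts.path.rstrip("/") + "/data.json"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def puzzle_date_from_embed(embed_url: str) -> date:
    """Read the puzzle date out of an embed URL (hypothetical helper).

    Assumes the third path segment is the date in YYYYMMDD form, as in the
    example URL from this issue.
    """
    segments = [s for s in urlsplit(embed_url).path.split("/") if s]
    return datetime.strptime(segments[2], "%Y%m%d").date()


def embed_url_for(puzzle_date: date, short_id: str) -> str:
    """Build an embed URL from a date and the short identifier.

    The short identifier is the missing piece: the date alone is not enough.
    """
    return (
        "https://www.wsj.com/puzzles/crossword/"
        f"{puzzle_date:%Y%m%d}/{short_id}/?embed=1"
    )


example = "https://www.wsj.com/puzzles/crossword/20221201/52272/?embed=1"
print(data_url_from_embed(example))
print(puzzle_date_from_embed(example))
```

Going from an embed URL to the data URL or the date is mechanical. The reverse direction still needs the short identifier (`52272` in the example), which is what a date search would have to discover.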

So what we would need is a way to go from date to any of those URLs. It seems doable, but I don't know how!