thisisparker / xword-dl

⬛⬜⬛ Command line tool to scrape crosswords from online solvers and save them as .puz files ⬛⬜⬛
MIT License

Guardian Cryptic Support Request #67

Closed mixographer closed 1 year ago

mixographer commented 1 year ago
    While this is open I'll add the feature request for the Guardian puzzles. I saw you were helping rentalcustard with the Guardian cryptics at [https://github.com/rentalcustard/guardianpuz](https://github.com/rentalcustard/guardianpuz). He's got the basic parsing, but doesn't put the solutions in the .puz output. I'll take a look at your comments on his repo and see if I can parse the solutions, but if you're adding some cryptics, my vote would be for the Guardian ones as well.

Originally posted by @mixographer in https://github.com/thisisparker/xword-dl/issues/52#issuecomment-1258539432

thisisparker commented 1 year ago

This one seems pretty easy. Probably will get this into the next release!

mixographer commented 1 year ago

I was thinking about the Guardian Crosswords. Would this tool want to support them?

Since there are several different kinds, like cryptic, quick, weekend, would you prefer to have each be a top level puzzle? So for instance you could call them as individual args to xword-dl? Like:

gcry - Guardian Cryptic
gspe - Guardian Speedy
gqui - Guardian Quick Crossword
geve - Guardian Everyman

Or would Guardian be an arg, that is then modified by adding more arguments? Seems better to add more top level puzzles, and then none of the existing arguments need to be changed.

Working parsing has already been done by @pj-paul here [https://github.com/pj-paul/guardianpuz]

pj-paul commented 1 year ago

Happy to make any edits for perf etc.

thisisparker commented 1 year ago

Alright, I think I've got support for download-latest and search-by-URL in place for all (!) Guardian puzzles. @mixographer would you mind checking out this repo and seeing how it works for you? I'd also welcome some notes on implementation... It seems a little clunky to have so many Guardian verticals at the "top level" but they really are different puzzles.

@pj-paul Sorry, I ended up basically reimplementing from scratch because I wanted to get a handle on the data and also avoid the numpy dependency. You're of course welcome to any of my implementation but I bet yours works just fine for you!

One last note: I haven't yet implemented search-by-date because the only way I can see to do it is a very crunchy calculator of publication periods. Not off the table, and might be fun in its way... but before I do that, is there a compelling reason to instead or additionally implement lookup by ID?
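
For illustration, a "calculator of publication periods" could be sketched roughly as below. This is a minimal sketch and not xword-dl's implementation: the function names, the anchor date and number, and the Monday-to-Friday schedule are all hypothetical, and it ignores holidays and other skipped publication days, which is exactly what would make the real thing crunchy.

```python
from datetime import date, timedelta


def count_publication_days(start: date, end: date, weekdays: set) -> int:
    """Count days in the half-open range (start, end] whose weekday is in `weekdays`."""
    count = 0
    day = start
    while day < end:
        day += timedelta(days=1)
        if day.weekday() in weekdays:  # Monday is 0, Sunday is 6
            count += 1
    return count


def estimate_puzzle_number(target: date, anchor_date: date,
                           anchor_number: int, weekdays: set) -> int:
    """Estimate the number of the most recent puzzle on or before `target`.

    `anchor_date` must be a day on which puzzle `anchor_number` was published.
    Assumes one puzzle per publication weekday and no skipped days.
    """
    if target >= anchor_date:
        return anchor_number + count_publication_days(anchor_date, target, weekdays)
    return anchor_number - count_publication_days(target, anchor_date, weekdays)


# Hypothetical anchor: puzzle 1000 published on Monday 2022-11-07, Mon-Fri schedule.
MON_TO_FRI = {0, 1, 2, 3, 4}
estimate = estimate_puzzle_number(date(2022, 11, 14), date(2022, 11, 7), 1000, MON_TO_FRI)
```

The estimated number could then be turned into a puzzle URL, which is also why lookup by ID is the simpler feature.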

mixographer commented 1 year ago

I will pull these changes and try it.

One way I always think of the Guardian puzzles is by number, rather than by date. All the cryptic blogs that either provide answers or difficulty levels always refer to puzzles by number. So @pj-paul's method of taking a number arg was nice, and the URLs seemed to be based on those same numbers.

That being said, I know none of the current downloaders work by 'crossword number.'

thisisparker commented 1 year ago

I think it would be pretty easy to add a flag for id like @pj-paul does that just works for the Guardian keywords.
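
A rough sketch of what lookup by ID might involve, assuming the Guardian's puzzle URLs follow the pattern `https://www.theguardian.com/crosswords/<series>/<number>`. The function name and the set of series slugs are assumptions for illustration, not xword-dl's actual code or flags.

```python
# Sketch only: the slugs and URL layout are assumptions based on the public
# Guardian site, not taken from xword-dl's implementation.
GUARDIAN_BASE = "https://www.theguardian.com/crosswords"
KNOWN_SERIES = {"cryptic", "quick", "speedy", "everyman", "prize", "quiptic", "weekend"}


def guardian_url(series: str, puzzle_id) -> str:
    """Build a puzzle URL from a series slug and a puzzle number.

    Accepts numbers written with thousands separators, e.g. "28,913",
    since that is how the puzzles are usually titled.
    """
    series = series.strip().lower()
    if series not in KNOWN_SERIES:
        raise ValueError(f"unknown Guardian series: {series!r}")
    number = int(str(puzzle_id).replace(",", ""))
    if number <= 0:
        raise ValueError("puzzle number must be positive")
    return f"{GUARDIAN_BASE}/{series}/{number}"


url = guardian_url("prize", "28,913")
```

Because the result is an ordinary puzzle URL, an ID flag could in principle reuse whatever search-by-URL path already exists.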

Also I remembered one more note: the .puz format doesn't support unfilled solution grids, so where the solution has not yet been released, I've filled the grid with X. (That's similar to how some subscription outlets do it, in my experience.)
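
As a sketch of the idea (not the code in the repo), filling in a placeholder solution could look something like this. The input layout, the names, and the per-cell handling are assumptions made for the example; the only .puz detail relied on is that black squares are written as `.` in the solution string.

```python
BLOCK = "."        # black square, as written in a .puz solution string
PLACEHOLDER = "X"  # stand-in letter when no solution has been released


def build_solution(cells):
    """Flatten a grid into a solution string, row by row.

    `cells` is a list of rows (hypothetical layout). Each cell is:
      None -> black square
      ""   -> white square with no known letter
      str  -> the solution letter
    Returns (solution_string, has_full_solution).
    """
    out = []
    has_full_solution = True
    for row in cells:
        for cell in row:
            if cell is None:
                out.append(BLOCK)
            elif cell == "":
                out.append(PLACEHOLDER)
                has_full_solution = False
            else:
                out.append(cell.upper())
    return "".join(out), has_full_solution


# A 2x2 toy grid with one black square and one unknown letter.
solution, complete = build_solution([["A", None], ["", "b"]])
```

Returning the flag alongside the string lets the caller decide how to signal the missing solution, for example in the title.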

mixographer commented 1 year ago

I noticed you added AZED as an option with grda as the argument, but I think the AZED crossword is only provided as a PDF version. We could parse out the PDF and then download that asset, but since everything else is .puz files, maybe we just drop it as an option?

thisisparker commented 1 year ago

Good catch, removed. Wouldn't be possible to convert to .puz either as it's barred.

thisisparker commented 1 year ago

Ditto the Genius puzzle, which I've also removed

mixographer commented 1 year ago

I've gone through every type of puzzle, and I've downloaded some by URL, and they all work great! One thing that might be interesting: if you grab the prize or weekend puzzles (maybe others?) that don't yet have answers, the naming convention could show that fact. For instance:

'Guardian Prize - 20221111 - Prize crossword No 28,913.puz' could be the name if it were downloaded and had answers, and the name could be different if it had Xs for answers: 'Guardian Prize - 20221111 - Prize crossword No 28,913 Blank.puz'. Or something, just so that if you download the puzzle again, it won't overwrite, and so that you know you don't have the answers yet, if you need them (I do).

thisisparker commented 1 year ago

Oh yeah, I was thinking of maybe printing a notice in the terminal but I think your idea of putting it in the filename is better. (I run xword-dl as a cron job, typically, so I wouldn't see interactive notices anyway.) Probably a good idea to get it into the filename by putting it directly into the title.

mixographer commented 1 year ago

Maybe 'blank' is not the right terminology? Maybe an underscore or an x at the start of the filename? I don't know what would be best. I should put xword-dl in a cronjob. I currently run it and use xargs to run through the list of the crosswords. The Guardian options have really added a lot to the number of crosswords I can grab. Thanks for tackling this feature request!

thisisparker commented 1 year ago

What I've got in there now is "no solution provided" on the end of the title, which should typically populate out to the end of the filename. (I have some poorly documented mechanisms to customize the filename with tokens, but it will generally use the title. Around the next release I'd like to overhaul the readme to explain things better.)
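
To make the naming behaviour concrete, here is a minimal sketch of appending the note to the title and deriving a filename from it. The function names and the separator are hypothetical, and the "outlet - date - title" layout simply mirrors the example filenames earlier in this thread rather than xword-dl's token mechanism.

```python
from datetime import date

NO_SOLUTION_NOTE = "no solution provided"


def title_with_note(title: str, has_solution: bool) -> str:
    """Append the note to the title when the solution grid is a placeholder."""
    if has_solution:
        return title
    return f"{title} - {NO_SOLUTION_NOTE}"


def default_filename(outlet: str, published: date, title: str) -> str:
    """Build a filename in the 'outlet - YYYYMMDD - title.puz' shape."""
    return f"{outlet} - {published:%Y%m%d} - {title}.puz"


title = title_with_note("Prize crossword No 28,913", has_solution=False)
filename = default_filename("Guardian Prize", date(2022, 11, 11), title)
```

Since the note lives in the title, a later download that does include the solution gets a different filename and does not overwrite the placeholder version.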

One nice thing about the cronjob is that you can set it to the publication schedule. So I've got one line that grabs the New Yorker each M-F, one that downloads the Washington Post on Sundays, and one that pulls from a bunch of different daily outlets.

mixographer commented 1 year ago

I could take a look at the usage and the readme and see if I could help you with that.

mixographer commented 1 year ago

Wow! That was fast! The Readme looks good!

thisisparker commented 1 year ago

Closed in v2022.11.16 🎉