Closed. mixographer closed this 1 year ago
This one seems pretty easy. Probably will get this into the next release!
I was thinking about the Guardian Crosswords. Would this tool want to support them?
Since there are several different kinds, like cryptic, quick, weekend, would you prefer to have each be a top level puzzle? So for instance you could call them as individual args to xword-dl? Like:
gcry - Guardian Cryptic
gspe - Guardian Speedy
gqui - Guardian Quick Crossword
geve - Guardian Everyman
Or would Guardian be an arg, that is then modified by adding more arguments? Seems better to add more top level puzzles, and then none of the existing arguments need to be changed.
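To make the "more top level puzzles" option concrete, here is a minimal sketch of what that could look like. It is not xword-dl's actual parser: the `GUARDIAN_PUZZLES` registry, `build_parser`, and `resolve` are hypothetical names, and only the four keywords come from the list above.

```python
import argparse

# Hypothetical registry: one top-level keyword per Guardian puzzle type.
# The keywords are the ones proposed above; the structure is an assumption.
GUARDIAN_PUZZLES = {
    "gcry": "Guardian Cryptic",
    "gspe": "Guardian Speedy",
    "gqui": "Guardian Quick Crossword",
    "geve": "Guardian Everyman",
}


def build_parser(existing_keywords=()):
    """Build a parser where Guardian keywords sit beside the existing ones."""
    parser = argparse.ArgumentParser(prog="xword-dl")
    choices = sorted(set(existing_keywords) | set(GUARDIAN_PUZZLES))
    parser.add_argument("keyword", choices=choices, help="puzzle to download")
    return parser


def resolve(argv, existing_keywords=()):
    """Return the puzzle name for a Guardian keyword, else the keyword itself."""
    args = build_parser(existing_keywords).parse_args(argv)
    return GUARDIAN_PUZZLES.get(args.keyword, args.keyword)
```

The point of the sketch is that adding entries to the registry only widens the list of accepted keywords, so none of the existing arguments would need to change.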
Working parsing has already been done by @pj-paul here: https://github.com/pj-paul/guardianpuz
Happy to make any edits for perf etc.
Alright, I think I've got support for download-latest and search-by-URL in place for all (!) Guardian puzzles. @mixographer would you mind checking out this repo and seeing how it works for you? I'd also welcome some notes on implementation... It seems a little clunky to have so many Guardian verticals at the "top level" but they really are different puzzles.
@pj-paul Sorry, I ended up basically reimplementing from scratch because I wanted to get a handle on the data and also avoid the numpy dependency. You're of course welcome to any of my implementation but I bet yours works just fine for you!
One last note: I haven't yet implemented search-by-date because the only way I can see to do it is a very crunchy calculator of publication periods. Not off the table, and might be fun in its way... but before I do that, is there a compelling reason to instead or additionally implement lookup by ID?
I will pull these changes and try it.
One way I always think of the Guardian puzzles is by number, rather than by date. All the cryptic blogs that provide either answers or difficulty levels refer to puzzles by number, so @pj-paul's method of taking a number arg was nice, and the URLs seemed to be based on those same numbers.
That being said, I know none of the current downloaders work by 'crossword number.'
I think it would be pretty easy to add a flag for id, like @pj-paul does, that just works for the Guardian keywords.
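As a rough sketch of how lookup by ID could work if the URLs really are based on the puzzle numbers: the base URL, the path slugs, and the `url_for_id` helper below are all assumptions for illustration, not something confirmed in this thread.

```python
# Assumed base URL and per-keyword path slugs; both are illustrative guesses.
BASE_URL = "https://www.theguardian.com/crosswords"
SLUGS = {
    "gcry": "cryptic",
    "gspe": "speedy",
    "gqui": "quick",
    "geve": "everyman",
}


def url_for_id(keyword, puzzle_id):
    """Build a puzzle URL from a Guardian keyword and a crossword number.

    Accepts ints or strings, with or without a thousands comma ("28,913").
    """
    if keyword not in SLUGS:
        raise ValueError(f"id lookup is only sketched for Guardian keywords: {keyword}")
    number = int(str(puzzle_id).replace(",", ""))
    if number <= 0:
        raise ValueError("crossword numbers are positive")
    return f"{BASE_URL}/{SLUGS[keyword]}/{number}"
```

Rejecting non-Guardian keywords matches the idea that the flag would only apply to the Guardian puzzles.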
Also I remembered one more note: the .puz format doesn't support unfilled solution grids, so where the solution has not yet been released, I've filled the grid with X. (That's similar to how some subscription outlets do it, in my experience.)
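For anyone curious, the X-filling idea can be sketched like this. It assumes the grid layout is available as rows of characters with "." marking black squares (the .puz convention); `placeholder_solution` is a hypothetical helper, not the code in the repo.

```python
BLOCK = "."  # black square marker in .puz grids


def placeholder_solution(layout_rows):
    """Return solution rows with every white square set to 'X'.

    layout_rows: list of strings, one per row, where '.' is a black square
    and any other character is a white square.
    """
    return [
        "".join(BLOCK if cell == BLOCK else "X" for cell in row)
        for row in layout_rows
    ]
```

Black squares are left alone so the grid shape still matches the published puzzle.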
I noticed you added AZED as an option with grda as the argument, but I think the AZED crossword is only provided as a PDF version. We could parse out the PDF and then download that asset, but since everything else is .puz files, maybe we just drop it as an option?
Good catch, removed. Wouldn't be possible to convert to .puz either as it's barred.
Ditto the Genius puzzle, which I've also removed
I've gone through every type of puzzle, and I've downloaded some by URL, and they all work great! One thing that might be interesting: if you grab the prize or weekend puzzles (maybe others?) that don't yet have answers, the naming convention could show that fact. For instance,
'Guardian Prize - 20221111 - Prize crossword No 28,913.puz' could be the name if it were downloaded and had answers, and the name could be different if it had Xs for answers: 'Guardian Prize - 20221111 - Prize crossword No 28,913 Blank.puz'. Or something, just so that if you download the puzzle again, it won't overwrite, and so that you know you don't have the answers yet, if you need them (I do).
Oh yeah, I was thinking of maybe printing a notice in the terminal but I think your idea of putting it in the filename is better. (I run xword-dl as a cron job, typically, so I wouldn't see interactive notices anyway.) Probably a good idea to get it into the filename by putting it directly into the title.
Maybe 'blank' is not the right terminology? Maybe an underscore or an x at the start of the filename? I don't know what would be best. I should put xword-dl in a cronjob. I currently run it and use xargs to run through the list of the crosswords. The Guardian options have really added a lot to the number of crosswords I can grab. Thanks for tackling this feature request!
What I've got in there now is "no solution provided" on the end of the title, which should typically populate out to the end of the filename. (I have some poorly documented mechanisms to customize the filename with tokens, but it will generally use the title. Around the next release I'd like to overhaul the readme to explain things better.)
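Roughly, the title-based approach could be sketched as below. The "outlet - date - title" pattern is taken from the example filename earlier in the thread; the exact separator before "no solution provided" and both function names are assumptions, and this ignores the token-based customization.

```python
from datetime import date

NO_SOLUTION_NOTE = "no solution provided"


def title_with_note(title, has_solution):
    """Append the note to the title when the solution is not yet released."""
    if has_solution:
        return title
    # The " - " separator is a guess; only the note text comes from the thread.
    return f"{title} - {NO_SOLUTION_NOTE}"


def default_filename(outlet, pub_date, title, has_solution):
    """Build 'outlet - YYYYMMDD - title.puz', with the note carried by the title."""
    full_title = title_with_note(title, has_solution)
    return f"{outlet} - {pub_date:%Y%m%d} - {full_title}.puz"
```

Because the note lives in the title, the placeholder download and the later download with answers end up with different filenames, so one does not overwrite the other.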
One nice thing about the cronjob is that you can set it to the publication schedule. So I've got one line that grabs the New Yorker each M-F, one that downloads the Washington Post on Sundays, and one that pulls from a bunch of different daily outlets.
I could take a look at the usage and the readme and see if I could help you with that.
Wow! That was fast! The Readme looks good!
Closed in v2022.11.16 🎉
Originally posted by @mixographer in https://github.com/thisisparker/xword-dl/issues/52#issuecomment-1258539432