thisisparker / xword-dl

⬛⬜⬛ Command line tool to scrape crosswords from online solvers and save them as .puz files ⬛⬜⬛
MIT License

RFE: King Digital puzzles (Premier, Joseph, Sheffer) in JPZ format #115

Open ehcloninger opened 10 months ago

ehcloninger commented 10 months ago

I use the Forkyz app on Android, which was forked from Shortyz. That app can download puzzles directly from King Digital Features. You can specify a date in the URL, so you aren't limited to the latest puzzle.

The format of the request is https://puzzles.kingdigital.com/jpz/[feature]/YYYYMMDD.jpz

where [feature] is one of Premier, Joseph, or Sheffer.
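For illustration, the URL pattern above can be sketched in Python as follows. The names `BASE_URL`, `FEATURES`, and `build_jpz_url` are made up for this example and are not part of xword-dl.

```python
from datetime import date

# Pattern taken from the request format described above.
BASE_URL = "https://puzzles.kingdigital.com/jpz"
FEATURES = ("Premier", "Joseph", "Sheffer")


def build_jpz_url(feature: str, puzzle_date: date) -> str:
    """Return the JPZ URL for a feature on a given date (hypothetical helper)."""
    if feature not in FEATURES:
        raise ValueError(f"unknown feature: {feature!r}")
    # The date is formatted as YYYYMMDD, e.g. 20230811.
    return f"{BASE_URL}/{feature}/{puzzle_date:%Y%m%d}.jpz"


print(build_jpz_url("Joseph", date(2023, 8, 11)))
```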

Sheffer and Joseph are dailies, Monday through Saturday:
https://puzzles.kingdigital.com/jpz/Joseph/20230811.jpz
https://puzzles.kingdigital.com/jpz/Joseph/20230812.jpz
https://puzzles.kingdigital.com/jpz/Sheffer/20230811.jpz

Premier is a weekly, published on Sunday:
https://puzzles.kingdigital.com/jpz/Premier/20230806.jpz
https://puzzles.kingdigital.com/jpz/Premier/20230813.jpz
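A "latest" lookup would need to respect that schedule. Here is a minimal sketch, assuming a puzzle is already available on the day it runs (publication time and time zones are ignored). `latest_publication_date` is a hypothetical helper, not an xword-dl function.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday() counts Monday as 0 and Sunday as 6


def latest_publication_date(feature: str, today: date) -> date:
    """Walk back from `today` to the most recent day the feature runs."""
    # Per the schedule above: Premier runs on Sundays only,
    # Joseph and Sheffer run Monday through Saturday.
    runs_on_sunday = feature == "Premier"
    candidate = today
    while (candidate.weekday() == SUNDAY) != runs_on_sunday:
        candidate -= timedelta(days=1)
    return candidate


# 2023-08-13 was a Sunday, so the latest Joseph puzzle is from Saturday.
print(latest_publication_date("Joseph", date(2023, 8, 13)))
```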

This would require a parser for the JPZ format, which looks to be a simple XML doc with a metadata section that could map to the .puz output. Sample file attached. 20230811.jpz.txt
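To show what reading that metadata section might look like, here is a rough sketch using only the standard library. The sample document is invented for this example (it is not the attached file), and the element names `metadata`, `title`, `creator`, and `copyright` are assumptions about the layout. Tags are matched by local name so the sketch does not depend on exact namespace URIs.

```python
import xml.etree.ElementTree as ET

# Invented sample; real JPZ files may differ and may be compressed.
SAMPLE = """<crossword-compiler xmlns="http://crossword.info/xml/crossword-compiler">
  <rectangular-puzzle xmlns="http://crossword.info/xml/rectangular-puzzle">
    <metadata>
      <title>Example Crossword</title>
      <creator>Example Author</creator>
      <copyright>Example Syndicate</copyright>
    </metadata>
  </rectangular-puzzle>
</crossword-compiler>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def read_metadata(xml_text: str) -> dict:
    """Return the children of the first <metadata> element as a dict."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if local_name(element.tag) == "metadata":
            return {local_name(child.tag): (child.text or "").strip()
                    for child in element}
    return {}


print(read_metadata(SAMPLE))
```

The resulting keys would then be mapped onto the title, author, and copyright fields of the .puz output.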

thisisparker commented 10 months ago

Good news: We've already got a jpz parser in here, handled in the parse_xword function of the CrosswordCompilerDownloader. (That's a bit clumsy, as I don't think you should have to know the genealogy of puzzle file formats to work with the software, but here we are.)

I've just confirmed that at least one of your examples works with the CrosswordCompilerDownloader out of the box. If you're interested in trying your hand at implementing a downloader yourself, you might look at the examples in puzzlesocietydownloader.py or globeandmaildownloader.py, which subclass the Compiler Downloader for jpz parsing. Otherwise I am happy to leave this open and get to it when I can, but unfortunately I've been pretty busy lately.

thisisparker commented 10 months ago

Oh and to unpack a little about how I tested it, this is what I did in a Python REPL:

>>> import xword_dl
>>> url = 'https://puzzles.kingdigital.com/jpz/Joseph/20230811.jpz'
>>> dl = xword_dl.downloader.CrosswordCompilerDownloader()
>>> puzzle = dl.download(url)
>>> xword_dl.save_puzzle(puzzle, dl.pick_filename(puzzle))
Puzzle downloaded and saved as Thomas Joseph Crossword - August 11, 2023.puz.

mixographer commented 10 months ago

Hi Eric.

ehcloninger commented 10 months ago

@thisisparker! It's been a while since I've done python. I'll have to kick off the rust and give it a try.

@mixographer Dude, small world.

thisisparker commented 10 months ago

@ehcloninger OK! I'll leave it to you but if you have questions or anything feel free to ask. If it would be helpful, I'd be happy to look at or poke at an incomplete PR, as well.

ehcloninger commented 10 months ago

@thisisparker I'm having a problem launching xword_dl.py. It gets to line 16, from . import downloader, and I get: Exception has occurred: ImportError attempted relative import with no known parent package.

Happens both on Windows and Mac. I'm using VSCode with Python 3.11.5 on both. If I'm in the python command line in that folder, I can issue the command as from xword_dl import downloader but I can't do it by launching from xword_dl.py. I've tried to open the project in VSCode from the project root folder as well as within the xword_dl/xword_dl folder.

Do you have special settings that make this work, or perhaps a HOWTO on the dev stuff?

Thanks

thisisparker commented 10 months ago

@ehcloninger yeah, I think this is "standard" Python weirdness that I take for granted. It's probably a good idea for me to write out more of a contributions HOWTO in the readme or somewhere. In the meantime, I think this particular issue would probably be addressed by running python -m xword_dl (note the underscore, not the -) from the base repo directory, to run it as a module. (Note that its dependencies still have to be present, so that will only work if you've installed them first from the requirements file.) It's probably a source of a bit of confusion that the file is also called xword_dl.py, but naming things is hard.
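The difference between running a file directly and running it with -m can be reproduced in isolation. The sketch below builds a throwaway package (`demo_pkg`, `cli.py`, and `helper.py` are made-up names, not xword-dl's layout) and runs the same file both ways.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_demo():
    """Run a file with a relative import as a script, then as a module."""
    with tempfile.TemporaryDirectory() as tmp:
        package = Path(tmp) / "demo_pkg"
        package.mkdir()
        (package / "__init__.py").write_text("")
        (package / "helper.py").write_text("VALUE = 42\n")
        (package / "cli.py").write_text(
            "from . import helper\nprint(helper.VALUE)\n"
        )

        # Run as a plain script: Python does not know the parent package,
        # so the relative import fails.
        as_script = subprocess.run(
            [sys.executable, str(package / "cli.py")],
            cwd=tmp, capture_output=True, text=True,
        )
        # Run as a module from the directory containing the package:
        # the relative import resolves.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.cli"],
            cwd=tmp, capture_output=True, text=True,
        )
        return as_script.returncode, as_script.stderr, as_module.stdout


if __name__ == "__main__":
    code, err, out = run_demo()
    print("script exit code:", code)
    print("module output:", out.strip())
```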

As you note, you can also import xword_dl from a REPL session, and use it that way. That's also a little under-documented, in part because I don't know if anybody is using it as a library and not from the command line. One sharp corner to watch out for if you use that approach: to re-import with changes you've made, you'll want to import importlib and then importlib.reload(xword_dl).
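Here is a small self-contained sketch of that reload step, using a temporary module (`scratch_module` is a made-up name) instead of xword_dl so it runs anywhere.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reload_demo():
    """Import a module, edit its source, and pick up the edit with reload()."""
    with tempfile.TemporaryDirectory() as tmp:
        module_path = Path(tmp) / "scratch_module.py"
        module_path.write_text("VALUE = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("scratch_module")
            before = module.VALUE

            # Simulate editing the source while the session is running.
            module_path.write_text("VALUE = 'edited'\n")

            # A second import would return the cached module unchanged;
            # reload() re-executes the source in the existing module object.
            importlib.reload(module)
            after = module.VALUE
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("scratch_module", None)
        return before, after


print(reload_demo())
```

Note that importlib.reload does not recurse into submodules, so after editing a submodule such as xword_dl.downloader, that submodule would need its own reload call.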

It's possible I'm doing something non-standard with the relative imports and that I should change that approach! Some of this is vestigial from a big reorganization of the package structure I did last year, and tbh I find a lot of this packaging stuff confusing and the resources explaining it are often conflicting. For me though, the python -m trick works when I'm testing changes and stuff, and maybe that's all you need. If it still doesn't work btw, I'm happy to jump on some kind of synchronous channel to help debug, and that would make an eventual HOWTO doc better!

ehcloninger commented 10 months ago

Thank you for the details, @thisisparker. That should be enough info to make a dent. As I'm learning Python (and how to do things the "python way"), I find it helps to have step-wise debugging in VS Code. As for synchronous interactions, I'm available on Teams with this user handle on outlook dot com. Ditto with the Google things and Discord.

ehcloninger commented 10 months ago

@thisisparker Just a quick note to say I solved my VS Code debugging issue with 3 lines of gankiness that I found on StackOverflow. I'll be sure not to check that bit in with my PR. Totally understand about how code grows and technical debt creeps up on us. This is a bit of fun and my job fortunately doesn't depend on it.

ehcloninger commented 10 months ago

Partial solution. Haven't tested fully. Will follow up.
https://github.com/ehcloninger/xword-dl/commit/11513deac4d2ad3ae22b372f757dc812f221cffd
I've already found an issue with downloading 'latest'. I was testing only the by-date feature. Not ready for a PR.

ehcloninger commented 7 months ago

Hi @thisisparker Thanks for including #117 in the latest builds. Do I need to do anything to the PR for this issue? I notice I have the "pull remote" sync from your repo to mine in the PR, so if that's an issue, I can do a clean PR. Also, do you want the doc fixes as a separate PR? I was just checking in as I needed to pull down more puzzles to satisfy my addiction. Not in a hurry.