thisisparker / xword-dl

⬛⬜⬛ Command line tool to scrape crosswords from online solvers and save them as .puz files ⬛⬜⬛
MIT License

emoji in atlantic puzzle do not appear in saved .puz file #192

Open alexdej opened 2 weeks ago

alexdej commented 2 weeks ago

As reported in https://github.com/alexdej/puzpy/issues/30, emoji from the Atlantic daily puzzle for 2023-11-13 do not appear in the resulting .puz file. Instead, the file appears to contain 0s, which break parsing of the file and cause AcrossLite to crash. I'm not sure whether this is an issue with xword-dl or with puz.py.

[Attachments: Screenshot 2024-06-13 200529, Screenshot 2024-06-13 135829, atlantic-20231113.zip]

afontenot commented 2 weeks ago

Edit: this is now outdated; see the comment below.

afontenot commented 2 weeks ago

Incidentally, I don't see any issue here with the latest git commit of xword-dl and 0.2.4 of puzpy. Probably some change of behavior since that release is the proximate cause.

afontenot commented 2 weeks ago

Had another look at this. Three thoughts:

        puzzle = puz.Puzzle()
        puzzle.version = b'2.0'
        puzzle.fileversion = b'2.0\0'
        puzzle.encoding = 'UTF-8'

thisisparker commented 2 weeks ago

Yep, @afontenot has this right, I think. The latest released version of xword-dl doesn't know what to do with emoji, because they can't be encoded with the text encoding scheme available in puz files before v2, so if a clue contains only emoji, the resulting file gets a bit crashy.

I have attempted to fix that with #157 but haven't released to PyPI since then. HEAD should have it. If you encounter any issues like that with the code at HEAD I want to know about it. In the meantime, yes, I am overdue for a release!

I have to think about the implications of generating puz v2 files; my understanding is that client support is not fully there but maybe I could have it be an option somehow.
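
For anyone following along, here is a minimal sketch of the encoding problem and of what an opt-in v2 switch might look like. It uses only the Python standard library and is not xword-dl's or puz.py's actual code. It assumes the pre-v2 text encoding is ISO-8859-1 (Latin-1), and the names `fits_legacy_encoding`, `choose_encoding`, and `allow_utf8` are hypothetical, made up for illustration.

```python
# Assumption: pre-v2 .puz text is ISO-8859-1; v2 files may use UTF-8.
LEGACY_ENCODING = "ISO-8859-1"
V2_ENCODING = "UTF-8"


def fits_legacy_encoding(text: str) -> bool:
    """Return True if every character in text can be stored as Latin-1."""
    try:
        text.encode(LEGACY_ENCODING)
    except UnicodeEncodeError:
        return False
    return True


def choose_encoding(strings, allow_utf8=False):
    """Hypothetical helper: pick an encoding for a puzzle's strings.

    Falls back to UTF-8 only when the caller opts in and at least one
    string cannot be represented in the legacy encoding.
    """
    needs_utf8 = any(not fits_legacy_encoding(s) for s in strings)
    if needs_utf8 and allow_utf8:
        return V2_ENCODING
    return LEGACY_ENCODING


if __name__ == "__main__":
    clues = ["Caf\u00e9 order", "\U0001F34E"]  # second clue is a single emoji
    for clue in clues:
        print(repr(clue), "fits legacy encoding:", fits_legacy_encoding(clue))
    # Dropping unencodable characters leaves an emoji-only clue empty.
    print("\U0001F34E".encode(LEGACY_ENCODING, errors="ignore"))
    print(choose_encoding(clues, allow_utf8=True))
```

The emoji-only clue encodes to an empty byte string once unencodable characters are dropped, which is the kind of input the thread describes as troublesome. Keeping the legacy encoding as the default and making UTF-8 opt-in reflects the concern above about client support for v2 files.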