thisisparker / xword-dl

⬛⬜⬛ Command line tool to scrape crosswords from online solvers and save them as .puz files ⬛⬜⬛
MIT License

Can't download The New Yorker crossword puzzle for a specific date #146

Closed ntwk closed 7 months ago

ntwk commented 7 months ago

When I attempt to download The New Yorker crossword dated 2023/02/24

python -m xword_dl tny --date 2023/02/24

I encounter the following error:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/tsbtmn/abs/aur/xword-dl/xword-dl/xword_dl/__main__.py", line 3, in <module>
    xword_dl.main()
  File "/home/tsbtmn/abs/aur/xword-dl/xword-dl/xword_dl/xword_dl.py", line 243, in main
    save_puzzle(puzzle, filename)
  File "/home/tsbtmn/abs/aur/xword-dl/xword-dl/xword_dl/util/utils.py", line 28, in save_puzzle
    puzzle.save(filename)
  File "/home/tsbtmn/abs/aur/xword-dl/envs/virtenv-1/lib/python3.11/site-packages/puz.py", line 225, in save
    puzzle_bytes = self.tobytes()
                   ^^^^^^^^^^^^^^
  File "/home/tsbtmn/abs/aur/xword-dl/envs/virtenv-1/lib/python3.11/site-packages/puz.py", line 240, in tobytes
    self.global_cksum(), ACROSSDOWN.encode(ENCODING),
    ^^^^^^^^^^^^^^^^^^^
  File "/home/tsbtmn/abs/aur/xword-dl/envs/virtenv-1/lib/python3.11/site-packages/puz.py", line 369, in global_cksum
    cksum = self.text_cksum(cksum)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tsbtmn/abs/aur/xword-dl/envs/virtenv-1/lib/python3.11/site-packages/puz.py", line 349, in text_cksum
    cksum = data_cksum(self.title.encode(ENCODING) + b'\0', cksum)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position 46: ordinal not in range(256)

It looks like it's hitting a problem with Unicode character U+2019 (Right Single Quotation Mark). This issue seems similar to issue #39 whereby certain Unicode characters cannot be converted to ASCII.
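For reference, the failure can be reproduced outside xword-dl in a few lines. This is only a minimal sketch: the title string below is made up for illustration (the real puzzle title is not shown in the traceback), and the only thing that matters is that it contains U+2019.

```python
# Hypothetical title; only the U+2019 character matters for the reproduction.
title = "The New Yorker\u2019s crossword"

failed_char = None
try:
    # Per the traceback, puz.py encodes the title with the 'latin-1' codec.
    title.encode("latin-1")
except UnicodeEncodeError as err:
    # err.start is the index of the first character that could not be encoded.
    failed_char = title[err.start]
    print(f"cannot encode {failed_char!r} at position {err.start}")
```

Any character with a code point above 255 triggers the same error, which is why the curly apostrophe fails even though it looks harmless.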

That said, I wouldn't think a "Right Single Quotation Mark" is a character that poses much of a problem when doing a lossy conversion to ASCII. Is this a known issue? Is there a workaround here?

I am running a clone of the xword-dl GitHub repo.

thisisparker commented 7 months ago

Hi there! You caught a real bug with this one. The character set supported by the .puz format is very limited, and it doesn't include smart quotes like ’. In this case, the puzzle title had one that I wasn't properly sanitizing away after adding a new title feature in #119. Whoops! It should work now if you clone from the repo again, and I'll close this issue when I ship the next release. Thanks for flagging and for a thorough bug report!
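To illustrate the kind of sanitizing described here, below is a rough sketch and not the actual fix that landed in xword-dl. The helper name `sanitize_for_puz` and the replacement table are assumptions made for this example; the only detail taken from the traceback is that the text must survive encoding as Latin-1.

```python
# Assumed replacement table; the mapping xword-dl really uses may differ.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}


def sanitize_for_puz(text: str) -> str:
    """Hypothetical helper: make text safe to encode as Latin-1."""
    # Swap common typographic characters for plain equivalents first,
    # so they are converted instead of silently lost.
    text = text.translate(str.maketrans(REPLACEMENTS))
    # Drop whatever still falls outside Latin-1 (a lossy last resort).
    return text.encode("latin-1", errors="ignore").decode("latin-1")


print(sanitize_for_puz("Friday\u2019s Puzzle"))
```

Mapping known characters before the lossy step keeps an apostrophe in the title; relying on `errors="ignore"` alone would remove it entirely.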

thisisparker commented 7 months ago

Keeping open until I ship a proper release with the fix.

ntwk commented 7 months ago

I just pulled your recent commits and it seems to be working now. Thanks!

thisisparker commented 7 months ago

Fixed in the latest release!