thisisparker / xword-dl

⬛⬜⬛ Command line tool to scrape crosswords from online solvers and save them as .puz files ⬛⬜⬛
MIT License

Limited rebus support #17

Open thisisparker opened 4 years ago

thisisparker commented 4 years ago

Related to #9, this script should start to support rebuses in puzzles where possible. I need to find an online puzzle to test with for each scraper.

thisisparker commented 4 years ago

I believe that, with #19, I've successfully added rebus support for Amuse Labs puzzles, which cover WaPo, the Atlantic, Newsday, the New Yorker, and LAT. That means I'm still in the market for a USA Today puzzle and a WSJ puzzle that has a rebus in it.
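
For anyone following along, here is a rough sketch of how rebus data is commonly described as being laid out in a .puz file. It is not the code from #19. The function name `encode_rebus` and its input shape are hypothetical, and the layout is an assumption based on the unofficial format notes: the main solution grid keeps only the first letter, a GRBS section holds 0 for plain squares and key + 1 for rebus squares, and an RTBL string maps each key to its full answer.

```python
def encode_rebus(solution):
    """Split per-square answers into .puz-style rebus pieces.

    `solution` is a list of non-empty strings in row-major order
    (hypothetical input shape); any entry longer than one character
    is treated as a rebus square.
    """
    grid = []   # first letter only, as stored in the main solution grid
    grbs = []   # 0 for plain squares, key + 1 for rebus squares (assumed layout)
    keys = {}   # rebus answer -> key number, numbered from 1 in this sketch

    for cell in solution:
        if len(cell) > 1:
            key = keys.setdefault(cell, len(keys) + 1)
            grbs.append(key + 1)
        else:
            grbs.append(0)
        grid.append(cell[0])

    # RTBL entries look like " 1:HEART;" with the key padded to width 2.
    rtbl = "".join(f"{key:2d}:{text};" for text, key in keys.items())
    return "".join(grid), grbs, rtbl


grid, grbs, rtbl = encode_rebus(["A", "HEART", ".", "HEART", "STAR"])
```

Repeated rebus answers share one key here, so the table stays short when a theme reuses the same entry.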

thisisparker commented 3 years ago

WSJ, it turns out, does not support rebuses in their puzzles at all! From their "How to work the crossword" page:

Note: Occasionally, a puzzle includes squares that must hold more than one letter, in these cases use the first letter in those squares.

I don't believe I've ever seen a rebus in a USA Today puzzle, so that is the only unknown for now. Every other source either has rebus support or cannot have it.

thisisparker commented 3 years ago

Renamed the issue to reflect that rebus support is either all there or almost all there!

mixographer commented 2 years ago

The Monday, 10/18 NYT puzzle had some shaded squares and heavy lines. I don't know if that counts as a rebus. Not parsing those 'special' squares didn't impact the ability to solve the puzzle.

thisisparker commented 2 years ago

Those aren't rebuses specifically, but they do represent something this tool can't do. In that case, though, it's because the .puz format doesn't support those features!

I've thus far made the specific decision to translate NYT's circles to circles in the .puz, and to ignore additional features that .puz doesn't support. I think there's an argument to be made that in cases like 10/18, where there are shaded squares but no circles, it might make sense to convert the shaded squares to circles. But I'm also a little hesitant to make "editorial" decisions like that in advance, so I'm not sure exactly how to proceed.
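
To make the tradeoff concrete, here is a minimal Python sketch of the two behaviors. It is not xword-dl's actual code: `build_markup`, its parameters, and the `shaded_fallback` switch are hypothetical names. The only detail borrowed from the .puz format is that a circled square is usually documented as the 0x80 flag in the GEXT markup section, which I'm treating as an assumption here.

```python
CIRCLED = 0x80  # circled-square flag in the .puz GEXT section (assumed)


def build_markup(cell_count, circled, shaded, shaded_fallback=False):
    """Return one markup byte per square, in row-major order.

    `circled` and `shaded` are iterables of square indexes
    (hypothetical input shape). Circles always win. Shaded squares
    are only promoted to circles when `shaded_fallback` is set and
    the puzzle has no circles of its own.
    """
    marked = set(circled)
    if shaded_fallback and not marked:
        marked = set(shaded)
    return [CIRCLED if index in marked else 0 for index in range(cell_count)]


# A 3x3 grid with a shaded diagonal and no circles.
markup = build_markup(9, circled=[], shaded=[0, 4, 8], shaded_fallback=True)
```

Keeping the fallback behind an explicit switch leaves the current behavior as the default, so the "editorial" conversion would only happen when someone asks for it.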

mixographer commented 2 years ago

I understand. I was looking at that one in the app and as a .puz as xword-dl parsed it. Seemed like the extra 'decorations' were a nice feature but weren't essential. We may see more cases like this in the future since NYT has moved away from the .puz format.

I don't know what the best course is either. When I want to parse an NYT puzzle I usually parse the solution page on xwordinfo. I see they have a div for shaded squares, and they are somehow putting in the bars. (I can't see how at a quick glance.) But I also think they have a relationship with the Times and may get the original data files to build their page from.

Also, when I comment on these issues, I'm not expecting a fix, but more just providing info in case it is useful. Thanks for your work on this tool.