thisisparker / cursewords

:pencil: Terminal-based crossword puzzle solving interface
GNU Affero General Public License v3.0

Handle multi-part clues #15

Open rentalcustard opened 5 years ago

rentalcustard commented 5 years ago

I'm not really sure where the problem lies here - whether it's a limitation of the Puz format, of cursewords, or of the information I can provide to the latter via the former, but in the interests of capturing this somewhere, I've opened an issue here.

My scraper for Guardian crosswords occasionally comes across clues which give the answers for more than one set of lights. They are specified like this:

5, 9 British fighter once offered inducement using Arab carrier (7, 5)

or, in a more complicated case:

5, 9 down British fighter once offered inducement using Arab carrier (7, 5)

(where the direction of the two parts of the answer doesn't match). The clue for 9A in the first case, or 9D in the second case, would then read "See 5" or "See 5 across".

The clue can span 2 or more answer slots, and is not necessarily composed of 2 words - for example, if 5A and 9A were both 4 squares wide, we could clue "Whenever" as:

5, 9 <insert cryptic clue for whenever here> (8).
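As a rough illustration of the label format described above, here is a minimal Python sketch that splits a label such as "5, 9 down" into (number, direction) pairs. The name `parse_label` and the convention that a part without a direction inherits the direction of the clue list it was printed in are assumptions made for this example; they are not taken from the scraper or from cursewords.

```python
import re

# Hypothetical pattern: one label part is a number, optionally followed by
# an explicit direction word ("9 down").
PART = re.compile(r"^(\d+)\s*(across|down)?$", re.IGNORECASE)


def parse_label(label, default_direction):
    """Split a label like "5, 9 down" into [(5, "A"), (9, "D")].

    default_direction is "across" or "down": the list the clue appeared in.
    """
    parts = []
    for chunk in label.split(","):
        match = PART.match(chunk.strip())
        if match is None:
            raise ValueError(f"unrecognised clue label part: {chunk!r}")
        number = int(match.group(1))
        direction = (match.group(2) or default_direction).lower()
        parts.append((number, "A" if direction == "across" else "D"))
    return parts


print(parse_label("5, 9", "across"))
print(parse_label("5, 9 down", "across"))
```

The first element of the result would be the slot that carries the clue text, and the rest are the slots whose own clues read "See 5".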

I'm able to prepare a puz file which doesn't choke on these, correctly putting the clue text in the right position to be picked up by cursewords as the clue for 5A, but then the numbering display in cursewords doesn't include the extra information: that the remainder of the answer should go in 9A.

I'm really not sure how best to tackle this here, especially since the PUZ file format doesn't let us specify custom numbering, but it's a fun one to think about.

Today's Guardian prize crossword has 3 examples of this type of clue, but unfortunately none that go over 3 or more answer positions.

thisisparker commented 5 years ago

I'm not as familiar with cryptic crosswords, but I am glad you opened this issue because it's a feature I really want to build and which comes up all the time in non-cryptic crosswords. At the very least, there's two-part answers, like:

32 DOWN: With 33 Down, fanciful cans on some MacBook decals

and then there's also many-part connections for theme revealers, like:

66 ACROSS: Race suggested by 19-, 39- and 59-Across?

In most or all cases, the second clue referred to in the former case consists of a See 32 Down type note. In many cases, the theme answers in the latter case have an asterisk or some other means of identification as theme answers.

In the case of your cryptic crossword, you could "fix" the issue by just jamming the second (and third, etc) part of the clue number into the clue text. But in that case and in the ones I identify above, it's helpful to the solver to be able to quickly identify the connected answers.

However! This isn't an official .puz feature, and so we have to identify connected clues ourselves. I think that can be a regex match, and it might be as simple as \d{1,3}[\s-]?[AaDd]. (Although that won't catch the second example above, so who knows.) Then there's a display issue, because now we're introducing a "secondary highlight." In that way this is a little bit connected to #10, and for that reason I think it should be introduced along with a config file (as described in #16) that can store some preferences on colors and displays between opens.
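To make the regex idea concrete, here is a small Python sketch. `SIMPLE` is the pattern exactly as quoted above. `RUN` and `find_references` are hypothetical names for one possible extension that also picks up lists such as "19-, 39- and 59-Across"; this is an assumption about how the matching could work, not existing cursewords code. The extension only recognises the full words "Across" and "Down", so shorthand like "9A" would still need the simple pattern.

```python
import re

# The pattern as quoted in the comment above.
SIMPLE = re.compile(r"\d{1,3}[\s-]?[AaDd]")

# Assumed extension: a run of numbers separated by ",", "and" or "&",
# followed by a single direction word that applies to the whole run.
RUN = re.compile(
    r"((?:\d{1,3}-?(?:,\s*|\s+and\s+|\s*&\s*))*\d{1,3})[\s-]?(across|down)\b",
    re.IGNORECASE,
)


def find_references(clue):
    """Return (number, "A" or "D") for every clue referenced in the text."""
    refs = []
    for match in RUN.finditer(clue):
        direction = match.group(2)[0].upper()
        for number in re.findall(r"\d{1,3}", match.group(1)):
            refs.append((int(number), direction))
    return refs


revealer = "Race suggested by 19-, 39- and 59-Across?"
print(SIMPLE.findall(revealer))   # only the last number is caught
print(find_references(revealer))  # all three numbers are caught
print(find_references("With 33 Down, fanciful cans on some MacBook decals"))
```

Either pattern can produce false positives: the simple one, for instance, also matches "3 d" in a clue containing "3 days", so matches would probably need checking against the clue numbers that actually exist in the grid.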

rentalcustard commented 5 years ago

An extra thing that would be really nice to build in here is for cursewords to move the cursor to the next space in the grid requiring input when you're typing the solution to a multi-part clue. So in the 5,9 example, once I've filled in all the lights for 5A, the cursor should jump automatically to the first position in 9A, and backspace should also take me back to 5A if I'm in 9A and keep going backwards. You can see that behaviour in the Guardian's online crossword solver.
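A minimal sketch of that cursor behaviour in Python, assuming the linked entries are already known and each is available as an ordered list of (row, column) cells. `LinkedCursor` and its methods are hypothetical names for illustration and do not reflect how cursewords models its cursor.

```python
class LinkedCursor:
    """Walks the cells of several linked entries as one continuous path."""

    def __init__(self, entries):
        # entries: list of entries, each a list of (row, col) cells in order.
        self.path = [cell for entry in entries for cell in entry]
        # Track a position in the path rather than a cell, so a cell shared
        # by two crossing entries is not ambiguous.
        self.index = 0

    @property
    def cell(self):
        return self.path[self.index]

    def advance(self):
        """Move forward one cell (after typing a letter); stop at the end."""
        self.index = min(self.index + 1, len(self.path) - 1)
        return self.cell

    def retreat(self):
        """Move back one cell (on backspace); stop at the start."""
        self.index = max(self.index - 1, 0)
        return self.cell


# Made-up grid positions for a four-letter 5A and a four-letter 9A.
five_across = [(2, 0), (2, 1), (2, 2), (2, 3)]
nine_across = [(4, 5), (4, 6), (4, 7), (4, 8)]

cursor = LinkedCursor([five_across, nine_across])
for _ in range(4):
    cursor.advance()
print(cursor.cell)       # first cell of 9A
print(cursor.retreat())  # back to the last cell of 5A
```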

thisisparker commented 5 years ago

Oh, that's interesting and I hadn't thought of that! I think it would have to be optional (because I'm not sure you always want to do that in non-cryptics), but once we know what the related words are, the world's kind of our oyster on what to do with that info.

rentalcustard commented 5 years ago

Yeah. I think defining a format for passing that information in the clue text is the first step, and we can decide later which behaviours are always present and which are optional.

Since we don't (I think!) want to fork the PUZ format, the additional information should be both human- and machine-readable. I like the idea of appending something like "[with: 9A, 12D]". UK cryptic setters are known for pulling fun tricks with the formatting of clues[1], and so I wouldn't put it past them to use some delimiter as part of a clue at some point, so I think we'd want to make it stand out a little with square brackets, curly braces, or something else that's not too obtrusive to a solver using a program which can't parse this extended information, but which is still unlikely to appear as part of the normal clue text.
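For illustration, a Python sketch of how a reader program might strip and parse such a suffix. The exact syntax handled here (a trailing "[with: 9A, 12D]" with comma-separated number-plus-letter references) and the name `split_clue` are assumptions based on the example in this comment; nothing here is an agreed format. A malformed suffix is left in place and treated as ordinary clue text.

```python
import re

# Hypothetical trailing annotation: "[with: 9A, 12D]" at the end of the clue.
SUFFIX = re.compile(r"\s*\[with:\s*([^\]]*)\]\s*$", re.IGNORECASE)
REF = re.compile(r"^(\d+)([AD])$", re.IGNORECASE)


def split_clue(text):
    """Return (clue text without the suffix, [(number, "A" or "D"), ...])."""
    match = SUFFIX.search(text)
    if match is None:
        return text, []
    refs = []
    for item in match.group(1).split(","):
        ref = REF.match(item.strip())
        if ref is None:
            # Not a well-formed reference list: leave the clue untouched.
            return text, []
        refs.append((int(ref.group(1)), ref.group(2).upper()))
    return text[:match.start()], refs


print(split_clue("British fighter once offered inducement using Arab carrier (7, 5) [with: 9A]"))
print(split_clue("HIJKLMNO (5)"))
```

A program that does not understand the suffix would simply display it as part of the clue, which keeps the file readable to a human solver.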

1: for example, a very famous clue is HIJKLMNO (5) - answer WATER, and the '?' symbol has been used to clue QUESTION MARK where normally one would expect a definition in words.

thisisparker commented 5 years ago

Heh, I think we're approaching the problem at opposite ends. As I start to develop this feature I'm going to try to match the behavior out there (I might even look at whether construction programs offer this as a feature, in which case there might be a pretty standard output) and try to match as much as I can. The good news is that means your .puz files are very likely to fit within it. But ideally (for me) this feature approximates the behavior already present in other apps, like this NYT iOS app example:

[Image: signal-2019-03-10-152324]

So, insofar as this is an issue on the cursewords repo, I think we take as given that

And then develop the feature from that basis.