mrichards42 / xword

Cross-platform crossword solving
https://mrichards42.github.io/xword/
GNU General Public License v3.0

Empty italic tag in clues causes rest of clue to be italicized #171

Open jpd236 opened 2 years ago

jpd236 commented 2 years ago

The following XML in a JPZ clue:

<span>Part of clue<i/>more of the clue</span>

causes "more of the clue" to be italicized even though it isn't actually enclosed in the italic tag. This also seems to happen with a regular empty tag (<i></i>) instead of a self-closing tag, and also if there's only whitespace inside the tag; there has to be some non-blank character inside the tag for the parsing to work correctly, AFAICT.
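
For reference, here is a minimal sketch of how a generic XML parser sees that clue. It uses Python's standard `xml.etree.ElementTree`, not XWord's own parsing code, so it only illustrates the XML semantics: the `<i/>` element is empty, and the text after it belongs to the enclosing `<span>`.

```python
import xml.etree.ElementTree as ET

# The clue from the report, parsed as XML (not as HTML).
clue = ET.fromstring("<span>Part of clue<i/>more of the clue</span>")
italic = clue.find("i")

# The <i/> element has no content of its own.
italic_text = italic.text

# Text that follows the element is stored as its "tail"; it is part of
# the parent <span>, not of the italic element.
following_text = italic.tail

print(repr(clue.text), repr(italic_text), repr(following_text))
```

Under XML rules, then, nothing in this clue should be italicized.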

jpd236 commented 2 years ago

Probably not a huge priority in the grand scheme of things, but I guess the next step here would be to try to reproduce this with a smaller sample app and pass the report along to wxWidgets.

EDIT: I originally posted this without escaping the <i/> above, and, funnily enough, the rest of the comment showed up in italics! Maybe this is actually how HTML parsers are supposed to handle this...

mrichards42 commented 2 years ago

Hmm . . . looking through what I think is the JPZ schema, it seems like clue text is actually XML, not a string of HTML? In which case <i/> would in fact be a self-closing tag :) It looks like the spec allows <i>, <b>, <span>, <sub>, and <sup> children in clue text.

So . . . maybe this needs to be handled in the JPZ parser? We could convert self-closing tags to the equivalent empty tag, or perhaps just remove them entirely, since that should render the same way.
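
A rough sketch of the "remove them entirely" option, written in Python purely for illustration (this is not XWord's parser code). The names `clue_to_html` and `FORMAT_TAGS` and the `<clue>` wrapper element are hypothetical. The sketch assumes the goal is to drop formatting tags that contain no visible characters while keeping any whitespace they held, and it ignores attributes for brevity.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical set of formatting tags to unwrap when they have no visible content.
FORMAT_TAGS = {"i", "b", "sub", "sup"}


def clue_to_html(elem):
    """Serialize a clue element to markup, unwrapping empty formatting tags.

    Attributes are dropped to keep the sketch short.
    """
    inner = escape(elem.text or "")
    for child in elem:
        inner += clue_to_html(child)
        # Tail text follows the child but belongs to this element.
        inner += escape(child.tail or "")
    if elem.tag in FORMAT_TAGS and not inner.strip():
        # Nothing visible inside: keep the whitespace, drop the tag itself.
        return inner
    return f"<{elem.tag}>{inner}</{elem.tag}>"


root = ET.fromstring("<clue><span>Part of clue<i/>more of the clue</span></clue>")
print(clue_to_html(root))
# <clue><span>Part of cluemore of the clue</span></clue>
```

Because whitespace-only content is kept as plain text, a tag like `<i> </i>` would come out as a single space rather than vanishing. Whether that matches how the renderer should treat such clues is an assumption of this sketch.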

jpd236 commented 2 years ago

I realized that I filed this as part of investigating the clue mentioned in https://github.com/jpd236/kotwords/issues/24, and indeed that specific clue is still a working repro case where the italic tag is non-empty (and thus not self-closing). So it does seem like there's more to this.

Attached a sample JPZ where the clue for 1-Across is:

<span>First across clue with</span><i> </i><span>italicized space</span>

This renders correctly in Crossword Solver, but in XWord, the space is omitted, and "italicized space" is in italics.

test.zip