Open jpd236 opened 2 years ago
<i/> is invalid HTML in that only certain tags are permitted to be self-closing. It's also not really the kind of thing you'd generally expect to see, though I did observe it once (not sure whether it was in the original source data or if I introduced it when converting from another format to JPZ). But the failure mode here of just ignoring the closing tag doesn't seem great. Probably not a huge priority in the grand scheme of things, but I guess the next step here would be to try to reproduce this with a smaller sample app and pass the report along to wxWidgets.
EDIT: I originally posted this without escaping the <i/> above, and, funnily enough, the rest of the comment showed up in italics! Maybe this is actually how HTML parsers are supposed to handle this...
Hmm . . . looking through what I think is the jpz schema, it seems like clue text is actually XML, not a string of HTML? In which case <i/> would in fact be a self-closing tag :) It looks like the spec allows <i>, <b>, <span>, <sub>, and <sup> children in clue text.
So . . . maybe this needs to be handled in the jpz parser? We could convert self-closing tags to the equivalent empty tag, or perhaps just remove them entirely since that should render the same way.
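As a rough illustration of the "just remove them" option, here is a minimal Python sketch (not XWord's actual parser code). It assumes the clue arrives as an XML fragment with no namespaces, wraps it in a hypothetical <clue> root so it can be parsed, and drops formatting elements that have no text and no children. The function name, the tag set, and the wrapper element are all made up for the example. Whitespace-only tags are deliberately kept, since dropping those would change the rendered text.

```python
import xml.etree.ElementTree as ET

# Formatting tags the schema appears to allow inside clue text (per the comment above).
FORMAT_TAGS = {"i", "b", "span", "sub", "sup"}


def _prune(parent):
    """Remove empty formatting children of `parent`, keeping their tail text."""
    prev = None
    for child in list(parent):
        _prune(child)
        if child.tag in FORMAT_TAGS and len(child) == 0 and not child.text:
            # Text following the removed tag has to move to the previous
            # sibling's tail, or to the parent's text if there is no sibling.
            tail = child.tail or ""
            if prev is None:
                parent.text = (parent.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            parent.remove(child)
        else:
            prev = child


def strip_empty_format_tags(clue_xml):
    """Return the clue fragment with empty / self-closing formatting tags removed."""
    root = ET.fromstring(f"<clue>{clue_xml}</clue>")
    _prune(root)
    # short_empty_elements=False writes any remaining empty element as
    # <x></x> instead of <x />, which also covers the "convert" option.
    serialized = ET.tostring(root, encoding="unicode", short_empty_elements=False)
    return serialized[len("<clue>"):-len("</clue>")]


# Hypothetical input, not the clue from the original report.
print(strip_empty_format_tags("Start of clue <i/>more of the clue"))
```

Going through a real XML parser rather than a regex means nested and attribute-bearing tags are handled the same way, at the cost of re-serializing the clue text.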
I realized that I filed this as part of investigating the clue mentioned in https://github.com/jpd236/kotwords/issues/24, and indeed that specific clue is still a working repro case where the italic tag is non-empty (and thus not self-closing). So it does seem like there's more to this.
Attached a sample JPZ where the clue for 1-Across is:
<span>First across clue with</span><i> </i><span>italicized space</span>
This renders correctly in Crossword Solver, but in XWord, the space is omitted, and "italicized space" is in italics.
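For reference, a small Python sketch of what a generic XML parser reports for that clue. This uses Python's ElementTree, not the parser XWord uses, and the <clue> wrapper and the clue_runs helper are hypothetical names added only so the fragment can be parsed. It suggests the space inside the italic tag is ordinary text content that a renderer would be expected to keep.

```python
import xml.etree.ElementTree as ET

# The 1-Across clue from the attached sample, as quoted above.
CLUE = "<span>First across clue with</span><i> </i><span>italicized space</span>"


def clue_runs(clue_xml):
    """Return (tag, text) pairs for each top-level child of a clue fragment."""
    root = ET.fromstring(f"<clue>{clue_xml}</clue>")
    return [(child.tag, child.text or "") for child in root]


runs = clue_runs(CLUE)
for tag, text in runs:
    print(f"{tag}: {text!r}")

# Concatenating the runs gives the text a renderer should display.
print("".join(text for _, text in runs))
```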
The following XML in a JPZ clue:
Causes "more of the clue" to be italicized even though it's not actually enclosed in the italics. This also seems to happen with a regular empty tag instead of a self-closing tag, and also if there's whitespace inside the tag; there has to be some non-blank character for the parsing to work correctly, AFAICT.