ubermichael / isetools

Tools for parsing data for the Internet Shakespeare Editions
GNU General Public License v2.0

Feature: log warnings for redundant tagging #5

Closed: telic closed this issue 2 years ago

telic commented 9 years ago

Redundantly nested tagging should log a warning. For example, <EM>one <EM>two</EM> three</EM> is redundant since the inner EM doesn't add any information (we don't support multiple "levels" of emphasis). The following tags would be affected:

Of course, the "crossing" errors described in #4 would also apply to self-nesting.
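The check proposed above can be sketched with a stack of open tags. This is only an illustration in Python, not the project's Java NestingValidator. The function name, the warning text, and the tag set are all hypothetical, and EM is the only tag included because it is the only one named in the example.

```python
import re

# Hypothetical set of tags for which self-nesting adds no information.
# Only EM is named in the example above; any other entries would be assumptions.
REDUNDANT_WHEN_SELF_NESTED = {"EM"}

# Matches opening, closing, and self-closing tags; attributes are ignored.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def find_redundant_nesting(markup, tags=REDUNDANT_WHEN_SELF_NESTED):
    """Return one warning per opening tag nested inside an open tag of the same name."""
    warnings = []
    open_tags = []  # stack of currently open tag names
    for match in TAG_RE.finditer(markup):
        if match.group(0).endswith("/>"):
            continue  # self-closing tags never contain anything
        name = match.group(2).upper()
        if match.group(1) == "/":
            # Close the most recent open tag with this name.
            # Crossing errors (#4) are deliberately not handled here.
            for i in range(len(open_tags) - 1, -1, -1):
                if open_tags[i] == name:
                    del open_tags[i]
                    break
            continue
        if name in tags and name in open_tags:
            warnings.append(f"redundant nested <{name}> at offset {match.start()}")
        open_tags.append(name)
    return warnings
```

Called on the example above, `find_redundant_nesting("<EM>one <EM>two</EM> three</EM>")` reports the inner EM once, while two sibling EM elements produce no warning.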

ubermichael commented 9 years ago

I think that's already done with the nesting validator, although it's not smart enough to handle the exceptions you've noted.

https://github.com/ubermichael/isetools/blob/master/src/main/java/ca/nines/ise/validator/NestingValidator.java#L79

telic commented 9 years ago

Yes, but the warning message should say that the tagging is redundant (and can be corrected), as opposed to incorrect.

Also, if #4 is implemented, then this will be more relevant :)

ubermichael commented 9 years ago

I'm not convinced that this should be automatically fixable.

For example, this

<SD t="entrance">Enter Bob, Ernie<SD t="exit">Exit Carmel</SD></SD>

is clearly an editor error.

telic commented 9 years ago

That's true...

Some other examples:

<SD t="entrance">Enter Bob, <SD t="entrance">Ernie</SD></SD>

This one isn't so clear. The editor might be trying to mark the characters separately (especially if @who is used), or might just have redundant tagging.

<SD t="entrance">Enter Bob, <SD t="action">running</SD></SD>

I've seen uses like this one before. It makes perfect sense and shouldn't be an error.

Obviously stage directions in particular will need some more careful thought. I don't think any of the other tags listed in the OP have similar issues though.
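To make the three stage-direction cases from this thread concrete, here is a rough Python sketch that looks up each (outer, inner) pair of t values in a small table. The table holds only the verdicts discussed above; the function name, the verdict labels, and the "undecided" fallback are assumptions for illustration, not behaviour of the existing validator.

```python
import re

# Verdicts taken from the three examples in this thread, keyed by (outer t, inner t).
# Any other combination has not been discussed, so it is reported as "undecided".
SD_VERDICTS = {
    ("entrance", "exit"): "error",        # clearly an editor error
    ("entrance", "entrance"): "unclear",  # separate marking or redundant tagging
    ("entrance", "action"): "ok",         # makes sense, should not be flagged
}

SD_RE = re.compile(r"<(/?)SD\b([^>]*)>")
T_ATTR_RE = re.compile(r'\bt="([^"]*)"')


def classify_nested_sd(markup):
    """Return (outer_t, inner_t, verdict) for each SD opened directly inside another SD."""
    results = []
    open_types = []  # t values of the currently open SD tags
    for match in SD_RE.finditer(markup):
        if match.group(1) == "/":
            if open_types:
                open_types.pop()
            continue
        t_attr = T_ATTR_RE.search(match.group(2))
        inner = t_attr.group(1) if t_attr else None
        if open_types:
            outer = open_types[-1]
            verdict = SD_VERDICTS.get((outer, inner), "undecided")
            results.append((outer, inner, verdict))
        open_types.append(inner)
    return results
```

Keeping the verdicts in a table rather than in code means a case like entrance inside entrance can be changed from "unclear" to a warning or an error once a decision is made, without touching the traversal.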