ubermichael / isetools

Tools for parsing data for the Internet Shakespeare Editions
GNU General Public License v2.0
2 stars 3 forks source link

Feature: case-insensitive tagging #13

Closed telic closed 8 years ago

telic commented 9 years ago

All IML tags should be case-insensitive. Eg. <HW> and <hw> should be considered identical.

ubermichael commented 9 years ago

All the tags are looked up in the schema in a case-insensitive manner: <HW> and <hw> are validated in exactly the same way, as if they were both <HW> tags. The nesting validator does all tag name comparisons in lowercase. SGML and XML output happens in the original case.

Is there any other part of the system that needs to case-insensitive?

telic commented 9 years ago

Okay sounds good; I wasn't sure if this was the case already or not!

Is there any code/commenting that makes it clear that this is how tags should be handled? TagNode has a case-preserving get/setName, so it would be easy to forget to use .toLowerCase() when it mattered.

Case in point, TagNode itself seems to have forgotten about this detail in .setName(). I don't know if there are any other cases elsewhere...

ubermichael commented 9 years ago

That's a very good point.

There are few places in DOM where that happens as well, mostly with respect to indexing the nodes and calculating act.scene.line stuff.

In the mean time, I've switched the schema and tag classes to use a case-insensitive data structure, and written a new one for tag attributes.

But there isn't really a good way to do case-insensitive and case-preserving tag names - we'll just need to be ever vigilant.

ubermichael commented 9 years ago

Not sure why it isn't showing up, but the code i just mentioned is in branch issue-13, commit ce6c0c92

ubermichael commented 8 years ago

Merged the issue-13 branch, which should fix all this. If it doesn't, please file another issue.