ubermichael / isetools

Tools for parsing data for the Internet Shakespeare Editions
GNU General Public License v2.0
2 stars 3 forks

digraphs are ligatures #17

telic closed 8 years ago

telic commented 9 years ago

After discussion with our coordinating editors, I've come to the conclusion that digraphs and ligatures are identical for our purposes. The DigraphCharNode class should be dropped, and all curly-escapes it supports should instead be handled with LigatureCharNode.

ubermichael commented 9 years ago

This one's complicated. It'll require changing the lexer, grammar, dombuilder, dropping the digraph class, mangling a bunch of transformers and validators, and mangling a bunch of test classes.

Doable, but not simple.

ubermichael commented 9 years ago

Looking at it a little more, it seems like it would make sense to unify the Unicode and digraph classes. They both represent the same thing: a single Unicode character.

telic commented 9 years ago

I need to be able to handle digraphs (and other ligatures) as individual characters, not just as single precomposed Unicode letters. For example, I generally want to use {ffl} and {db} as "ffl" and "db" respectively, and only in some special ligature mode would I possibly replace these with the precomposed Unicode form (or possibly just turn on the font's own ligature handling). On the other hand, I'd never want to treat {th} as "th" or {&amp} as "&amp".
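
A rough sketch of that behaviour, written in Python purely for illustration and not taken from the isetools code. The names `LIGATURE_FORMS`, `ATOMIC_ESCAPES`, and `render_escape` are hypothetical, and the table entries are assumptions: {ffl} is mapped to U+FB04, while {db} is assumed to have no precomposed form and so always falls back to plain letters.

```python
# Hypothetical table: escapes treated as ligatures, mapped to a precomposed
# Unicode form. None means no precomposed form is assumed to exist.
LIGATURE_FORMS = {
    "ffl": "\uFB04",  # LATIN SMALL LIGATURE FFL
    "db": None,       # assumption: no precomposed character for this one
}

# Escapes that must never be spelled out as their individual letters.
ATOMIC_ESCAPES = {"th", "&amp"}


def render_escape(name, ligature_mode=False):
    """Return the text to emit for the curly-escape {name}."""
    if name in ATOMIC_ESCAPES:
        # Placeholder behaviour: leave the escape intact so that some other
        # handler can deal with it; it is never expanded to plain letters.
        return "{" + name + "}"
    if name not in LIGATURE_FORMS:
        raise KeyError("unknown curly-escape: " + name)
    precomposed = LIGATURE_FORMS[name]
    if ligature_mode and precomposed is not None:
        # Special ligature mode: use the precomposed character when one exists.
        return precomposed
    # Default: treat the ligature as its individual letters.
    return name
```

With this sketch, `render_escape("ffl")` gives the three plain letters, `render_escape("ffl", ligature_mode=True)` gives the single precomposed character, and `render_escape("th")` leaves the escape untouched. Relying on the font's own ligature handling instead would amount to always taking the default branch.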