ubermichael / isetools

Tools for parsing data for the Internet Shakespeare Editions
GNU General Public License v2.0

unify "typeform" handling #18

Closed: telic closed 8 years ago

telic commented 9 years ago

{s}, {r}, {w}, and {W} should be handled identically in the DOM. These are "typeforms" that appear a certain way in the printed text (ſ, ꝛ, vv, and VV respectively) which we want to generally handle as their modern equivalents in the digital text (s, r, w, and W).

ubermichael commented 9 years ago

I'm not sure what you're asking for here. {s} and {r} each have a single-character equivalent in Unicode, but {w} and {W} don't.

They're handled by different classes, but they can be made to output similar tags, if that's what you're looking for. Again, I'm not sure I understand.

telic commented 9 years ago

I'm suggesting that {s} and {r} are more like {w} in what they mean and how we want to use them than they are like {th} or {&#xA3}. The latter are really just character escapes, but the former encapsulate both what the original printed form was and something different that we actually want to use in the digital form.

ubermichael commented 9 years ago

But {s} means that a long s was printed on the page, as the printer likely intended. {th} means that a thorn character was printed on the page, also as intended. {w} means that the printer intended a w but used two v letterforms instead.

Maybe I don't understand what the "something different" is.

telic commented 9 years ago

To put it another way, I want to be able to make a button on the website that says "show original typeforms" which makes all those "vv"s and "ſ"s show up. Normally they'd always appear as "w" and "s", and search would always work with the "w" and "s" forms. In contrast, {th} would always be treated as "þ", everywhere.
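The behaviour described above can be sketched roughly as follows. This is only an illustration, not the isetools implementation: the class `TypeformSketch`, its two lookup tables, and the methods `render` and `searchText` are hypothetical names, and plain string replacement stands in for whatever the real DOM classes do. The point is that a typeform carries two forms (original and modern), while a character escape carries one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from isetools.
final class TypeformSketch {
    // Typeforms: tag -> { original printed form, modern equivalent }.
    private static final Map<String, String[]> TYPEFORMS = new LinkedHashMap<>();
    // Character escapes: tag -> the one character it always stands for.
    private static final Map<String, String> ESCAPES = new LinkedHashMap<>();

    static {
        TYPEFORMS.put("{s}", new String[] {"\u017F", "s"}); // long s
        TYPEFORMS.put("{r}", new String[] {"\uA75B", "r"}); // r rotunda
        TYPEFORMS.put("{w}", new String[] {"vv", "w"});
        TYPEFORMS.put("{W}", new String[] {"VV", "W"});
        ESCAPES.put("{th}", "\u00FE");                      // thorn
    }

    // Renders markup for display. Typeforms follow the "show original
    // typeforms" toggle; escapes ignore it and always expand the same way.
    static String render(String markup, boolean showOriginal) {
        String out = markup;
        for (Map.Entry<String, String[]> e : TYPEFORMS.entrySet()) {
            String[] forms = e.getValue();
            out = out.replace(e.getKey(), showOriginal ? forms[0] : forms[1]);
        }
        for (Map.Entry<String, String> e : ESCAPES.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    // Search always works on the modern forms, regardless of the toggle.
    static String searchText(String markup) {
        return render(markup, false);
    }

    public static void main(String[] args) {
        String markup = "{W}hat {th}e {w}i{s}e";
        System.out.println(render(markup, false)); // default view
        System.out.println(render(markup, true));  // "show original typeforms"
    }
}
```

With the toggle off, {w} and {s} come out as "w" and "s"; with it on they come out as "vv" and "ſ". {th} is "þ" in both cases, which is the difference between a typeform and an escape.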

ubermichael commented 8 years ago

Fix this by making innerText() public.