Subscripts and superscripts

jwilk commented 6 years ago

I'd like tran to support subscripts and superscripts: https://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts#Superscripts_and_subscripts_block

kilobyte commented 6 years ago

Sounds reasonable.

I haven't done so already only because letter form blocks that are not substantially complete lead to unfun. I wonder what to do for missing letters.

Combining characters are not an option, even when placed above U+0020 as a carrier.

Also, there are no capital subscripts at all. Do you think it's better to convert to lowercase or to leave them unhandled?

jwilk commented 6 years ago

It's better to keep uppercase letters intact.

kilobyte commented 6 years ago

So I did most of the work manually... and only while adding those weird IPA letters did I realize that UnicodeData has:

02B0;MODIFIER LETTER SMALL H;Lm;0;L;<super> 0068;;;;N;;;;;

which makes generating the whole table a one-liner...
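A minimal sketch of generating such a table from the decomposition field, assuming Python's standard `unicodedata` module as the data source rather than tran's own build scripts. The function name `script_table` is hypothetical. Only single-character decompositions are kept, and because several code points can share one base letter, each base maps to a list of candidates.

```python
import sys
import unicodedata


def script_table(tag):
    """Map each base character to the code points whose compatibility
    decomposition is exactly "<tag> BASE", where tag is "super" or "sub"."""
    prefix = f"<{tag}> "
    table = {}
    for cp in range(sys.maxunicode + 1):
        decomp = unicodedata.decomposition(chr(cp))
        if not decomp.startswith(prefix):
            continue
        parts = decomp[len(prefix):].split()
        if len(parts) != 1:
            # Skip multi-character decompositions; they have no single base.
            continue
        base = chr(int(parts[0], 16))
        table.setdefault(base, []).append(chr(cp))
    return table


if __name__ == "__main__":
    superscripts = script_table("super")
    # Show which code points decompose to a superscript "h".
    print([f"U+{ord(c):04X}" for c in superscripts.get("h", [])])
```

Letters absent from the resulting table are the ones with no superscript or subscript form in the Unicode version bundled with the interpreter.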

But then, it turns out there's a code problem. The current algorithm assumes that both sides of the conversion can be freely lower/uppercased, which works fine but only as long as either both cases exist or case-mangled conversion is ok (unicameral scripts). Not the case here...

Fixing this would require either goofy tagging or moving case handling from runtime to building the conversion tables (the latter might speed up the program, too). That'd require some effort, so I just committed the data to a branch, dropping this issue to the bottom of my TODO list... :(
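One way the build-time variant could look, as a rough sketch and not tran's actual code: case variants are expanded once while building the table, and an uppercase entry is added only when both sides really have an uppercase form. The names `expand_case` and `convert` and the sample pairs are hypothetical.

```python
def expand_case(pairs):
    """Build a lookup table from lowercase (source, target) pairs.

    An uppercase entry is added only when both the source and the target
    have a distinct single-character uppercase form; otherwise the
    uppercase source is left unmapped and passes through unchanged."""
    table = {}
    for src, dst in pairs:
        table[src] = dst
        up_src, up_dst = src.upper(), dst.upper()
        has_upper = (
            up_src != src and up_dst != dst
            and len(up_src) == 1 and len(up_dst) == 1
        )
        if has_upper:
            table.setdefault(up_src, up_dst)
    return table


def convert(text, table):
    """Replace each character found in the table; keep the rest intact."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # "\u03b1" (Greek alpha) has an uppercase form; "\u02b0" (superscript h) does not.
    table = expand_case([("a", "\u03b1"), ("h", "\u02b0")])
    print(convert("Aa Hh", table))
```

With this approach no case folding happens at runtime, and uppercase letters without a counterpart stay intact, which matches the preference stated earlier in the thread.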

I wonder though, perhaps it might be better to fix the root issue of some {super,sub}scripts missing upstream?
