Superscript arabic numerals → Unicode, why not subscript arabic numerals, too?

rdoeffinger / DictionaryPC

Java code to generate dictionaries for QuickDic Android app (see Dictionary repo). Fork of project that used to be hosted at code.google.com/p/quickdic-dictionary

Apache License 2.0


Superscript arabic numerals → Unicode, why not subscript arabic numerals, too? #9

Open Moonbase59 opened 2 years ago

Moonbase59 commented 2 years ago

In https://github.com/rdoeffinger/DictionaryPC/blob/509e1fa70a1c9f03a329fcc6df982eb7c341b5ea/src/com/hughes/android/dictionary/parser/wiktionary/AbstractWiktionaryParser.java#L67-L93 you replace superscript Arabic numerals with their Unicode equivalents. Why not do the same for subscript numerals (Unicode range U+2080..U+2089)?

This might help with line spacing issues (unless the replacement is meant for footnote/endnote markers only).

rdoeffinger commented 2 years ago

I did not have any case where subscripts appeared, and adding things without a need and a test case did not seem like a good idea. If you have an example, I can look into it.

rdoeffinger commented 2 years ago

Or, if you write and test a patch yourself, that is also an option. The approach should probably use a single regexp for both cases, to avoid going through the data twice.

Moonbase59 commented 2 years ago

In the German Wiktionary, https://de.wiktionary.org/wiki/H%E2%82%82O would be one. For content, the entry https://de.wiktionary.org/wiki/Alkohol might also be useful (contains C₂H₅OH).
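
The single-regexp idea from the comment above could look roughly like the following. This is only a sketch, not the code in AbstractWiktionaryParser: the class name `ScriptDigits`, the method `replaceScriptDigits`, and the assumption that the digits arrive wrapped in `<sup>…</sup>` / `<sub>…</sub>` tags are all hypothetical and would need checking against what the parser actually sees.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of DictionaryPC.
final class ScriptDigits {
    // Assumption: digits are wrapped in <sup>...</sup> or <sub>...</sub>.
    // The backreference \1 requires the closing tag to match the opening
    // one, so a single pattern covers both cases in one pass.
    private static final Pattern SCRIPT_DIGITS =
            Pattern.compile("<(sup|sub)>([0-9]+)</\\1>");

    // Index i holds the superscript / subscript form of digit i.
    // Superscript 1, 2, 3 live in Latin-1 (U+00B9, U+00B2, U+00B3);
    // the rest are in U+2070..U+2079. Subscripts are U+2080..U+2089.
    private static final String SUPERSCRIPTS =
            "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079";
    private static final String SUBSCRIPTS =
            "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089";

    static String replaceScriptDigits(String text) {
        Matcher m = SCRIPT_DIGITS.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Pick the lookup table based on which tag matched.
            String table = m.group(1).equals("sup") ? SUPERSCRIPTS : SUBSCRIPTS;
            StringBuilder digits = new StringBuilder();
            for (char c : m.group(2).toCharArray()) {
                digits.append(table.charAt(c - '0'));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(digits.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Under these assumptions, `ScriptDigits.replaceScriptDigits("C<sub>2</sub>H<sub>5</sub>OH")` returns `C₂H₅OH`, and `H<sub>2</sub>O` becomes `H₂O`. Tags containing anything other than digits, or mismatched pairs such as `<sup>2</sub>`, are left untouched.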