Wolf-SO opened this issue 7 years ago.
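The above toAsciiStr function is obviously incompatible with Parsing.hs, line 1205, because the id is built via

catMaybes $ map toAsciiChar id'

which maps each character to at most one ASCII character (a Maybe Char), so a function returning several characters cannot be plugged in there. But I'm very new to Haskell, and maybe this is easy to change...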
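To illustrate the type mismatch, here is a minimal sketch contrasting the current per-character shape with a multi-character one. The bodies of toAsciiChar and toAsciiStr below are hypothetical stand-ins with tiny tables, not pandoc's real definitions, and the sketch ignores pandoc's other id rules (lowercasing, spaces to hyphens). The only point is that catMaybes . map f needs f :: Char -> Maybe Char, while a transliteration returning String fits concatMap.

```haskell
import Data.Maybe (catMaybes)

-- Hypothetical stand-in for the per-character lookup:
-- each Char maps to at most one ASCII Char.
toAsciiChar :: Char -> Maybe Char
toAsciiChar c
  | c < '\128'  = Just c     -- plain ASCII is kept
  | c == '\233' = Just 'e'   -- é
  | otherwise   = Nothing    -- e.g. ß is simply dropped

-- Hypothetical multi-character variant: each Char maps to a (possibly empty) String.
toAsciiStr :: Char -> String
toAsciiStr '\223' = "ss"     -- ß
toAsciiStr '\228' = "ae"     -- ä
toAsciiStr c      = maybe "" (: []) (toAsciiChar c)

-- Current shape: one Maybe Char per input Char, Nothing entries removed.
idViaChar :: String -> String
idViaChar = catMaybes . map toAsciiChar

-- Multi-character shape: concatenate the String produced for each Char.
idViaStr :: String -> String
idViaStr = concatMap toAsciiStr

main :: IO ()
main = do
  putStrLn (idViaChar "stra\223e")  -- prints strae
  putStrLn (idViaStr "stra\223e")   -- prints strasse
```

So the change at the call site itself is small (concatMap instead of catMaybes . map); the larger question is the lookup table's type.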
See also #2821 for a request touching a related part of the code. Note that GitHub generates %C3%9F in the identifier for ß (so it UTF-8-encodes the character, then URL-encodes the octets).
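A small sketch of that two-step encoding, using only base: encode each character as UTF-8 octets, then percent-encode every octet outside the RFC 3986 unreserved set. This is not GitHub's actual slug code; the function names are hypothetical, and the encoder skips validity checks such as rejecting surrogates.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, ord, toUpper)
import Numeric (showHex)

-- Encode one code point as UTF-8 octets (sketch: no surrogate checks).
utf8Octets :: Char -> [Int]
utf8Octets c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont 0]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont 6, cont 0]
  | otherwise   = [0xF0 .|. (n `shiftR` 18), cont 12, cont 6, cont 0]
  where
    n = ord c
    -- continuation byte carrying 6 bits starting at the given shift
    cont k = 0x80 .|. ((n `shiftR` k) .&. 0x3F)

-- Keep unreserved ASCII characters; percent-encode the UTF-8 octets of the rest.
percentEncode :: String -> String
percentEncode = concatMap encodeChar
  where
    encodeChar c
      | isUnreserved c = [c]
      | otherwise      = concatMap pct (utf8Octets c)
    isUnreserved c =
      isAsciiUpper c || isAsciiLower c || isDigit c || c `elem` "-._~"
    pct o = '%' : map toUpper (pad (showHex o ""))
    pad s = replicate (2 - length s) '0' ++ s

main :: IO ()
main = putStrLn (percentEncode "stra\223e")  -- prints stra%C3%9Fe
```

ß is U+00DF, whose two UTF-8 octets are 0xC3 0x9F, which is where the %C3%9F comes from.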
@jgm What about switching to a two-stage approach? #2821 suggests that this is also related to the output format, and this issue is not only about input either. HTML5 allows a superset of HTML4 ids (HTML4 ids must start with an ASCII letter and may contain only ASCII letters, digits, -, _, : and .; HTML5 only requires a non-empty id without whitespace). Wouldn't it be better to first read a "unified Pandoc id" (without leading numbering but including -) and later output an id that is compatible with the required HTML version?
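A rough sketch of what such a two-stage split could look like. Everything here is hypothetical: unifiedId, toHtml5Id, toHtml4Id and germanTranslit are illustrative names, the "section" fallback for empty ids is an assumption, and the character rules are a simplification of the HTML4/HTML5 constraints described in the comment above.

```haskell
import Data.Char (isAlphaNum, isAsciiLower, isAsciiUpper, isDigit, isLetter, isSpace, toLower)

-- Hypothetical reader-side stage: a format-neutral "unified" id.
-- Keeps Unicode letters/digits, turns spaces into '-', drops leading non-letters.
unifiedId :: String -> String
unifiedId = dropWhile (not . isLetter) . map normalise . filter keep
  where
    keep c = isAlphaNum c || isSpace c || c `elem` "-_."
    normalise c
      | isSpace c = '-'
      | otherwise = toLower c

-- Hypothetical writer-side stage for HTML5: any non-empty, whitespace-free string.
toHtml5Id :: String -> String
toHtml5Id "" = "section"   -- fallback name is an assumption
toHtml5Id s  = s

-- Hypothetical writer-side stage for HTML4: transliterate, keep only
-- ASCII letters, digits, '-', '_', ':' and '.', and start with a letter.
toHtml4Id :: (Char -> String) -> String -> String
toHtml4Id translit s =
  case dropWhile (not . isAsciiLetter) (filter html4Char (concatMap translit s)) of
    "" -> "section"
    s' -> s'
  where
    isAsciiLetter c = isAsciiLower c || isAsciiUpper c
    html4Char c = isAsciiLetter c || isDigit c || c `elem` "-_:."

-- Example transliteration (assumption): ß -> "ss", everything else unchanged.
germanTranslit :: Char -> String
germanTranslit '\223' = "ss"
germanTranslit c      = [c]

main :: IO ()
main = do
  let u = unifiedId "1 Gro\223e Stra\223e"
  print u                                -- prints "gro\223e-stra\223e"
  print (toHtml5Id u)                    -- prints "gro\223e-stra\223e"
  putStrLn (toHtml4Id germanTranslit u)  -- prints grosse-strasse
```

The design point is that transliteration only happens in the HTML4 writer stage, so HTML5 output can keep the Unicode id unchanged.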
It seems that the labels should also include [writer] and [format:HTML].
Sorry to revive this thread. If there is to be any update on this, as a French speaker I would add the following to the list:
('\339',"oe")
('\338',"OE")
('\230',"ae")
('\198',"AE")
Or maybe these belong directly in the Asciify module (they are currently dropped when using +ascii_identifiers in the input format)?
Icelandic has þ (thorn, '\254'), transliterated into "th"...
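The entries proposed above (œ, Œ, æ, Æ, þ) could be collected in a Map Char String and consulted before anything else. A minimal sketch, assuming the containers package (Data.Map); multiCharTable and transliterate are hypothetical names, and the fallback here simply keeps ASCII and drops everything else, where a real version would fall back to the existing single-character mapping.

```haskell
import qualified Data.Map as M

-- Hypothetical multi-character additions proposed in this thread,
-- keyed with the same decimal Char escapes as the comments above.
multiCharTable :: M.Map Char String
multiCharTable = M.fromList
  [ ('\339', "oe")  -- œ
  , ('\338', "OE")  -- Œ
  , ('\230', "ae")  -- æ
  , ('\198', "AE")  -- Æ
  , ('\254', "th")  -- þ (Icelandic thorn)
  ]

-- Look each character up in the table first; keep plain ASCII as-is
-- and drop anything else (mirroring how such characters are dropped today).
transliterate :: String -> String
transliterate = concatMap step
  where
    step c = case M.lookup c multiCharTable of
      Just s -> s
      Nothing
        | c < '\128' -> [c]
        | otherwise  -> ""

main :: IO ()
main = do
  putStrLn (transliterate "c\339ur")  -- prints coeur
  putStrLn (transliterate "\254ing")  -- prints thing
```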
Maybe there is a resource listing Unicode characters that actually represent several characters? Such as
As mentioned in https://github.com/jgm/pandoc/issues/807#issuecomment-310831480 and https://github.com/jgm/pandoc/issues/807#issuecomment-310831794, I'd like to have the option to transliterate non-ASCII characters into multiple ASCII characters. This would be especially helpful for German umlauts, since a (classical) convention already exists (ä → ae, ö → oe, ü → ue). I have been looking into and working on src/Text/Pandoc/Asciify.hs, but I'm not sure whether it would be better to provide a new extension instead of modifying the existing one, since two-letter replacements would lengthen the HTML ids, which could break something.
I tried to expand the map and change it to Map Char String (adding the transliteration of the letter ß as ('\223',"ss")). This is how it might look (just a fragment containing the 7 German transliterations).
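The fragment itself is not reproduced here, but a Map Char String holding the 7 German transliterations could plausibly look like the sketch below. It assumes the containers package, the capitalisation Ae/Oe/Ue for Ä/Ö/Ü is a chosen convention (AE/OE/UE would also be defensible), and germanTable and asciifyGerman are hypothetical names. The last line shows the id-length growth mentioned above.

```haskell
import qualified Data.Map as M

-- Hypothetical Map Char String with the 7 German transliterations
-- (assumed convention: Ae/Oe/Ue for capitals, ss for ß).
germanTable :: M.Map Char String
germanTable = M.fromList
  [ ('\228', "ae")  -- ä
  , ('\246', "oe")  -- ö
  , ('\252', "ue")  -- ü
  , ('\196', "Ae")  -- Ä
  , ('\214', "Oe")  -- Ö
  , ('\220', "Ue")  -- Ü
  , ('\223', "ss")  -- ß
  ]

-- Replace each table character by its string; leave other characters untouched
-- (a real version would fall back to the existing single-character mapping).
asciifyGerman :: String -> String
asciifyGerman = concatMap (\c -> M.findWithDefault [c] c germanTable)

main :: IO ()
main = do
  let heading = "\220bergr\246\223e"        -- Übergröße
      ident   = asciifyGerman heading
  putStrLn ident                            -- prints Uebergroesse
  print (length heading, length ident)      -- prints (9,12)
```

Each two-letter replacement adds one character, which is exactly why existing HTML ids (and links pointing at them) would change if the current extension were modified rather than a new one added.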