While testing some byte-offset-to-char-index conversion code, I noticed https://github.com/zesterer/ariadne/issues/41.

When testing with "🦀", naively modifying `char_width()` alone prints multiple instances of the Unicode character in question: "🦀" becomes "🦀🦀". I left this repeating behavior enabled for whitespace, since I believe it is part of how '\t' tab characters are handled.

This perhaps doesn't fully fix the issue as reported. The reporter might need a boolean `Config` setting that selects between calling `width` and `width_cjk`, but I don't know what the right behavior is for whitespace in CJK.
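To illustrate the duplication described above, here is a minimal standalone sketch. It is a simplified model, not ariadne's actual code: `display_width`, `render_naive`, and `render_fixed` are hypothetical names, and `display_width` is a crude stand-in for a real lookup such as `width` from the `unicode-width` crate. It assumes the renderer emits each character once per column of reported width, which is what expands a tab into spaces.

```rust
/// Hypothetical stand-in for a real width lookup (e.g. the `unicode-width` crate).
/// Assumption: only one emoji range is treated as double-width here.
fn display_width(c: char) -> usize {
    match c as u32 {
        0x1F300..=0x1FAFF => 2, // includes '🦀' (U+1F980)
        _ => 1,
    }
}

/// Returns the character to print and how many columns it occupies.
/// Tabs become spaces padded out to the next tab stop.
fn char_width(c: char, col: usize, tab_width: usize) -> (char, usize) {
    match c {
        '\t' => (' ', tab_width - col % tab_width),
        c if c.is_whitespace() => (' ', 1),
        c => (c, display_width(c)),
    }
}

/// Naive rendering: print the character once per column of width.
/// Correct for tabs, but duplicates wide characters.
fn render_naive(line: &str, tab_width: usize) -> String {
    let mut out = String::new();
    let mut col = 0;
    for c in line.chars() {
        let (c, width) = char_width(c, col, tab_width);
        for _ in 0..width {
            out.push(c);
        }
        col += width;
    }
    out
}

/// Adjusted rendering: only whitespace is repeated; any other character
/// is printed once while the column still advances by its full width.
fn render_fixed(line: &str, tab_width: usize) -> String {
    let mut out = String::new();
    let mut col = 0;
    for c in line.chars() {
        let (c, width) = char_width(c, col, tab_width);
        let copies = if c.is_whitespace() { width } else { 1 };
        for _ in 0..copies {
            out.push(c);
        }
        col += width;
    }
    out
}
```

With a tab width of 4, `render_naive("🦀", 4)` yields "🦀🦀" while `render_fixed("🦀", 4)` yields "🦀", and both expand `"a\tb"` to `a`, three spaces, then `b`.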