Open saihaze opened 2 years ago
Do you have a screenshot + sample text example of this?
Let's take examples/simple.rs as an example. It is supposed to be like: But when it includes a Chinese character, it looks like this:
How much did you increment the spans by when doing so? And does Rust's .chars()
count the character as a single character?
Usually a CJK character (a character used in China, Japan and Korea) takes the space of two ASCII characters. Rust's .chars()
does count it as a single character. Perhaps you can use crate unicode_width
instead.
I've never seen a wider character, though. But maybe you can take them into account, too.
This is my own diagnostics
module. It is simple, but might be an example.
I'll put it on the list of things to do when I get round to refactoring the crate's internals. Contributions are also welcome of course!
Going to break my comment in the above PR out here:
The PR above fixed some of these issues, but probably not enough for CJK. I imagine what is needed should be fairly simple, I just don't know the right things to do regarding whitespace on CJK. I think all that is left should be something to the effect of:
Config
builder for enabling CJK.width_cjk
instead of width
from unicode_width
from lib.rs: char_width()
.As a Chinese, I have not seen "ambiguous characters" rendering wide for years. The width_cjk
thing might not be necessary. From the documentation of width_cjk
, I can refer to UAX#11, which says that modern practice tend to render ambiguous characters narrow. The UAX recommends that we only render them wide when we reliably know that the context wants them to be wide. At least from what I see, this does not happen a lot nowadays, especially when related to formal languages where an error could be pointed out.
But the hardcoded width of whitespace is a problem, for there is at least a wide space:
. The current approach renders it as a narrow space. I am not sure about why all whitespace converts to \x20
with width 1 in char_width
:
https://github.com/zesterer/ariadne/blob/06b17483d8ce1f1decf290935818f55ba6380065/src/lib.rs#L401-L413
Regarding this, I come up with two possible solutions.
write.rs
:
https://github.com/zesterer/ariadne/blob/06b17483d8ce1f1decf290935818f55ba6380065/src/write.rs#L585-L599
we do not render whitespaces multiple times; instead we could test if the original c
before calling char_width
was \t
, or we could repeat only \x20
instead of all whitespaces.\x20
, but calculate its width instead of filling in 1
:
c if c.is_whitespace() => (' ', c.width().unwrap_or(1)),
Both solutions should work well with
, but I believe a clarification about the point of converting whitespaces to \x20
is needed. I prefer the former, since it actually keeps that wide space. If the terminal shows, for example, a dot, where there is a space, the user could possibly see some differences.
I prefer the former, since it actually keeps that wide space. If the terminal shows, for example, a dot, where there is a space, the user could possibly see some differences.
Might just be useless trivia that won't ocur in practice, but another corner case/point in favor of doing that is U+1680 Ogham space mark, a character with a visible representation that is_whitespace(). https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cb4c89643c03d3557d336623b242912f
When the source code full-width characters (e.g. CJK characters), underlining is not proper. They might need too spaces.