zed-industries / zed

Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
https://zed.dev

Unicode Character Counts / Column Number #12024

Open ifsheldon opened 5 months ago

ifsheldon commented 5 months ago

Describe the feature

I'd like character counting to be based on Unicode characters rather than bytes. This would greatly help users who write in languages other than English get basic character counts right.

If applicable, add mockups / screenshots to help present your vision of the feature

For example:

hello
你好

This should give a character count of (5 + 1 + 2) = 8, instead of the (5 + 1 + 3*2) = 12 that Zed currently reports.

Python counts the characters correctly:

string = "hello\n你好"
print(len(string))  # 8

Rust offers two ways to get the length of a string:

let a = "hello\n你好";
assert_eq!(a.len(), 12); // length in bytes (UTF-8)
assert_eq!(a.chars().count(), 8); // number of Unicode scalar values (`char`s), not graphemes

JunkuiZhang commented 5 months ago

The change would require the underlying Rope structure to be grapheme-aware. In a quick test, that was approximately 3 times slower than the current implementation.
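
To make the trade-off concrete, here is a minimal sketch of how a rope-like structure could carry a character count alongside the byte count in each chunk's summary. `TextSummary` and `summarize` are hypothetical names for illustration, not Zed's actual Rope API, and the sketch counts Unicode scalar values rather than graphemes.

```rust
/// Per-chunk metadata, a hypothetical stand-in for what a rope node might cache.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct TextSummary {
    bytes: usize,
    chars: usize, // Unicode scalar values, not grapheme clusters
}

impl TextSummary {
    /// Scan one chunk once; a real rope would cache this result on the node.
    fn of(chunk: &str) -> Self {
        TextSummary {
            bytes: chunk.len(),
            chars: chunk.chars().count(),
        }
    }

    /// Summaries are additive because `&str` chunks always split on char boundaries.
    fn add(self, other: Self) -> Self {
        TextSummary {
            bytes: self.bytes + other.bytes,
            chars: self.chars + other.chars,
        }
    }
}

/// Combine the summaries of all chunks into a total for the whole text.
fn summarize(chunks: &[&str]) -> TextSummary {
    chunks
        .iter()
        .map(|c| TextSummary::of(c))
        .fold(TextSummary::default(), TextSummary::add)
}

fn main() {
    let total = summarize(&["hello\n", "你好"]);
    assert_eq!(total, TextSummary { bytes: 12, chars: 8 });
    println!("{:?}", total);
}
```

Counts of scalar values add up cleanly across chunks. Grapheme counts do not in general, because a chunk boundary can fall inside a grapheme cluster, which is one reason full grapheme awareness is harder to add.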

asdfer-1234 commented 5 months ago

If that is not possible, it would be nice to show "Bytes" instead of "Characters" to avoid confusion.

ifsheldon commented 5 months ago

I don't think byte counts are useful for users of a text editor.

ChaiTRex commented 2 months ago

I'm having a similar issue with the column number of the cursor, shown at the bottom of the screen. For example, moving the cursor past a single character that is three bytes long in UTF-8 changes the column number by three rather than one.
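
As a rough sketch of the column behavior I would expect, a byte offset within a line can be converted to a column counted in characters. `char_column` is a hypothetical helper, not an existing Zed function; it assumes 1-based columns and counts Unicode scalar values, not graphemes.

```rust
/// Convert a byte offset within `line` to a 1-based column counted in
/// Unicode scalar values. Returns `None` if the offset is out of range
/// or does not fall on a character boundary.
fn char_column(line: &str, byte_offset: usize) -> Option<usize> {
    line.get(..byte_offset)
        .map(|prefix| prefix.chars().count() + 1)
}

fn main() {
    let line = "你好, world";
    // The cursor sits after "你", which occupies bytes 0..3.
    assert_eq!(char_column(line, 3), Some(2));
    // The cursor sits after "你好", which occupies bytes 0..6.
    assert_eq!(char_column(line, 6), Some(3));
    // Byte 1 is in the middle of "你", so there is no valid column.
    assert_eq!(char_column(line, 1), None);
}
```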

Fixing this particular problem will involve more than counting graphemes. It also requires determining whether each grapheme is halfwidth or fullwidth, since a fullwidth grapheme takes up two columns visually and should count as two.
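
A simplified sketch of a width-aware column follows. The ranges in `is_wide` are a small hand-picked subset chosen for illustration, not the full East Asian Width data, and both function names are hypothetical. It also ignores zero-width combining marks and treats each scalar value separately instead of segmenting into graphemes.

```rust
/// Approximate test for characters that occupy two terminal columns.
/// Assumption: only a few common ranges are listed here; a real
/// implementation would consult the Unicode East Asian Width property.
fn is_wide(c: char) -> bool {
    matches!(c,
        '\u{3041}'..='\u{30FF}'   // Hiragana and Katakana
        | '\u{4E00}'..='\u{9FFF}' // CJK Unified Ideographs
        | '\u{AC00}'..='\u{D7A3}' // Hangul syllables
        | '\u{FF01}'..='\u{FF60}' // Fullwidth forms of ASCII
    )
}

/// 1-based visual column for a byte offset within `line`.
/// Returns `None` if the offset is not on a character boundary.
fn display_column(line: &str, byte_offset: usize) -> Option<usize> {
    let prefix = line.get(..byte_offset)?;
    let width: usize = prefix
        .chars()
        .map(|c| if is_wide(c) { 2 } else { 1 })
        .sum();
    Some(width + 1)
}

fn main() {
    // Two fullwidth ideographs take four columns, so the cursor is at column 5.
    assert_eq!(display_column("你好", 6), Some(5));
    // U+FF71 is a halfwidth katakana letter: three bytes, but one column.
    assert_eq!(display_column("\u{FF71}b", 3), Some(2));
}
```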