koka-lang / koka

Koka language compiler and interpreter
http://koka-lang.org

std/text/unicode width does not return column-width 2 for emojis #457

Open erf opened 5 months ago

erf commented 5 months ago

I expected the width function to return 2 for emojis, based on the EastAsianWidth.txt file.

  println(width("👾"))

This returns 1.

Is this method supposed to work similarly to the display_width method of ziglyph, or to this Python wcwidth specification? That is, should it give the rendered column width for modern terminal emulators using the latest Unicode standard?
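For reference, the East Asian Width class of this emoji can be checked against the Unicode database bundled with Python. This is only a sketch using the standard `unicodedata` module, not Koka's own tables, and the result depends on the Unicode version that ships with the Python build in use. The name `ALIEN_MONSTER` is introduced here for illustration.

```python
import unicodedata

ALIEN_MONSTER = "\U0001F47E"  # the emoji from the report above

# East Asian Width class of the code point. By convention, "W" (wide)
# and "F" (fullwidth) characters occupy two terminal columns.
eaw_class = unicodedata.east_asian_width(ALIEN_MONSTER)
is_wide = eaw_class in ("W", "F")

print(f"U+{ord(ALIEN_MONSTER):04X} class={eaw_class} wide={is_wide}")
```

If the class comes back as wide here but Koka's `width` returns 1, that would suggest the code point is missing from Koka's asian-wide list rather than a difference in the intended semantics.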

TimWhiting commented 5 months ago

This is what the standard library documentation says:

// Return the column-width of a unicode character.
// Equivalent to ``wcwidth``
pub fun char/width( c : char ) : int {
  if (zero-widths.force.contains(c.int)) then 0
  elif (asian-wide.force.contains(c.int)) then 2
  else 1
}

// Return the total column-width of a string.
pub fun string/width( s : string ) : int {
  var total := 0
  s.foreach( fn(c) {
    total := total + c.width
  })
  total
}

So yes, I believe the intent is to report widths for terminal emulators, as in the Python wcwidth spec. However, I'm not certain it is currently up to date (I'm not sure when Daan last updated the asian-wide list).
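For comparison, here is a rough Python sketch of the same three-way logic (zero-width, wide, otherwise 1). It is not the wcwidth package and not Koka's implementation: the choice of general categories treated as zero-width (`Mn`, `Me`, `Cf`) is an assumption made for this example, and the wide set comes from Python's bundled Unicode data.

```python
import unicodedata

# Assumption for this sketch: combining marks and format characters
# take no columns. Real wcwidth implementations use more detailed tables.
ZERO_WIDTH_CATEGORIES = ("Mn", "Me", "Cf")


def char_width(ch: str) -> int:
    """Return 0, 1, or 2 columns for a single code point."""
    if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def string_width(s: str) -> int:
    """Sum the per-code-point widths, like string/width above."""
    return sum(char_width(ch) for ch in s)


print(string_width("abc"), string_width("\U0001F47E"))
```

Because the wide check is driven entirely by the data table, an outdated table yields 1 for newer wide characters even when the logic itself is correct.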

Also, I would expect the following to print two UTF-16 code units, but it prints only one UTF-32 character:

  "👾".slice.foreach(fn(c) c.println)

I guess I'm less certain about the intended underlying representation for characters. I'll have to ask Daan.
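To make the distinction concrete, the sketch below shows how the same emoji breaks down into code points and UTF-16 code units. It illustrates Python's behavior only and says nothing about how Koka stores characters internally.

```python
ALIEN_MONSTER = "\U0001F47E"

# Iterating a Python str yields code points (comparable to UTF-32 units).
code_points = [ord(ch) for ch in ALIEN_MONSTER]

# Encode to UTF-16 (little-endian, no BOM) and split into 16-bit code units.
utf16_bytes = ALIEN_MONSTER.encode("utf-16-le")
utf16_units = [
    int.from_bytes(utf16_bytes[i:i + 2], "little")
    for i in range(0, len(utf16_bytes), 2)
]

print([hex(cp) for cp in code_points])
print([hex(u) for u in utf16_units])
```

A language whose `char` is a full code point yields one item when iterating this string, while one that iterates UTF-16 code units yields a surrogate pair.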

erf commented 5 months ago

I'll just link this article here. It's a good read with some valuable links.

https://mitchellh.com/writing/grapheme-clusters-in-terminals

TimWhiting commented 5 months ago

Thanks for the link!

TimWhiting commented 5 months ago

Great post. I'll have to look at the algorithm he references to improve Koka's clustering.
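As a small illustration of why clustering matters for width, the sketch below sums per-code-point widths for a zero-width-joiner emoji sequence. The width rules are the same simplified assumptions as a basic wcwidth-style lookup (not Koka's tables), and `FAMILY` is just an example sequence chosen here.

```python
import unicodedata


def code_point_widths(s: str) -> list[int]:
    """Width of each code point under a simplified wcwidth-style rule."""
    widths = []
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            widths.append(0)  # assumption: marks and format chars are zero-width
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            widths.append(2)
        else:
            widths.append(1)
    return widths


# MAN + ZWJ + WOMAN + ZWJ + GIRL: five code points forming one emoji sequence.
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

widths = code_point_widths(FAMILY)
naive_total = sum(widths)
print(widths, naive_total)
```

A per-code-point sum counts every emoji in the sequence separately, whereas a terminal that groups the sequence into a single grapheme cluster would be expected to draw it in far fewer columns. Closing that gap requires segmenting the string into clusters before measuring.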

erf commented 5 months ago

Yeah, I'm a beta tester on the Ghostty terminal (it's great!), and they have implemented Mode 2027 for proper Unicode handling.