racket / drracket

DrRacket, IDE for Racket
http://www.racket-lang.org/

Problem with diacritics #478

Open sorawee opened 3 years ago

sorawee commented 3 years ago

DrRacket can't correctly display diacritics in Thai (and probably in other languages with diacritics) in the code editor.

Screen Shot 2021-04-04 at 11 08 52 PM

Here's how it should be displayed:


(กำหนด ความกว้าง 500)

FWIW, Emacs is able to display it correctly.

Screen Shot 2021-04-04 at 11 13 36 PM

@mbutterick's quad used to have an issue with diacritics too (though it's a different problem), so let me @ you in case you have an idea what could go wrong.

rfindler commented 3 years ago

I am guessing this is an issue with either text% or perhaps the drawing libraries (accessed via, e.g., canvas-dc%), but maybe on a non-Mac platform? Or maybe a specific font? (It looks okay to me.)

Here's some code that might reproduce the issue outside of DrRacket (if it isn't a font-specific issue).

#lang racket/gui
(define s "กำหนด ความกว้าง")
(define t (new text%))
(define f (new frame% [label ""] [width 300] [height 300]))
(define ec (new editor-canvas% [parent f] [editor t]))
(send t insert s)
(send f show #t)

sorawee commented 3 years ago

Sorry, should have mentioned that I'm on Mac. The program that you provided above does reproduce the issue, though weirdly, "กำ" is now displayed correctly! "กว้าง" is still incorrect however.

Screen Shot 2021-04-05 at 6 18 40 AM

This is not a font-specific issue IIUC. Even with the font TH Sarabun New (the standard font for Thai script), the issue persists in DrRacket.

Screen Shot 2021-04-05 at 6 21 04 AM

Here's how it displays in word processing software.

Screen Shot 2021-04-05 at 6 21 44 AM

97jaz commented 3 years ago

I think the problem is more generally with Unicode combining characters:

#lang racket/base

(define chars '(#\e #\u0301))
(displayln chars)
(displayln (list->string chars))
(newline)

(define precomposed-chars
  ((compose string->list string-normalize-nfc list->string)
   chars))
(displayln precomposed-chars)
(displayln (list->string precomposed-chars))

97jaz commented 3 years ago

Related? https://github.com/racket/draw/issues/22

According to a comment in that issue, DrRacket always passes #f for the combine? argument to the draw-text method of dc<%>. And the code has this comment: https://github.com/racket/draw/blob/a4e156abe5119309783443495d671b9a7f3e434b/draw-lib/racket/draw/private/dc.rkt#L1493
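
To check the combine? theory outside DrRacket, here is a minimal sketch that draws and measures the same Thai string with combine? set to #t and to #f on an offscreen bitmap. The helper names render-text and text-width are made up for this example, and whether the two widths actually differ likely depends on platform and font, so treat it as a probe rather than a confirmed reproduction.

```racket
#lang racket/base
(require racket/class racket/draw)

;; กว้าง, written with escapes so the source file encoding doesn't matter
(define s "\u0E01\u0E27\u0E49\u0E32\u0E07")

;; Hypothetical helper: draw str onto a fresh bitmap with the given combine? flag.
(define (render-text str combine?)
  (define bm (make-bitmap 200 60))
  (define dc (new bitmap-dc% [bitmap bm]))
  (send dc draw-text str 10 10 combine?)
  bm)

;; Hypothetical helper: measured width of str with the given combine? flag.
(define (text-width str combine?)
  (define dc (new bitmap-dc% [bitmap (make-bitmap 1 1)]))
  (define-values (w h descent extra)
    (send dc get-text-extent str #f combine?))
  w)

(printf "width with combine? = #t: ~a\n" (text-width s #t))
(printf "width with combine? = #f: ~a\n" (text-width s #f))
```

To compare the renderings by eye, the bitmaps returned by render-text can be written out with (send bm save-file "out.png" 'png).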

sorawee commented 2 years ago

In the latest version of DrRacket, things are a bit flipped. Running @rfindler's program, we will get:

Screen Shot 2022-01-11 at 5 35 29 PM

where กำ, which consists of two characters, ก (U+0E01) and ำ (U+0E33), is displayed without the circle on top of ก. Note though that กว้าง is now displayed correctly.

It's somewhat weird, because this display problem only occurs when I choose not to "normalize" when pasting the code in. If I normalize, I do get the desired display, but now กำ becomes three characters: ก (U+0E01), ํ (U+0E4D), and า (U+0E32), which is incorrect in Thai. ำ is one character, and is not equivalent to ํ + า.

Screen Shot 2022-01-11 at 5 40 17 PM
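
The character counts can be checked without DrRacket. This is a minimal sketch, assuming the paste-time "normalize" step is a compatibility normalization such as NFKC (not confirmed in this thread); code-points is a helper made up for the example.

```racket
#lang racket/base

;; กำ as typed: KO KAI (U+0E01) followed by SARA AM (U+0E33)
(define kam "\u0E01\u0E33")

;; Hypothetical helper: the code points of a string, as integers.
(define (code-points s)
  (for/list ([c (in-string s)])
    (char->integer c)))

;; Canonical normalization: SARA AM has no canonical decomposition.
(define kam-nfc (string-normalize-nfc kam))

;; Compatibility normalization: SARA AM is split into
;; NIKHAHIT (U+0E4D) followed by SARA AA (U+0E32).
(define kam-nfkc (string-normalize-nfkc kam))

;; Print the length and hexadecimal code points of each form.
(for ([label '("original" "NFC" "NFKC")]
      [s (list kam kam-nfc kam-nfkc)])
  (printf "~a: ~a characters ~a\n"
          label
          (string-length s)
          (for/list ([cp (code-points s)])
            (number->string cp 16))))
```

NFC keeps the string at two characters, while NFKC yields three, which matches the split described above.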

sorawee commented 2 years ago

I wanted to try this again after the recent Unicode change, and just noticed a couple more issues (which already existed before the Unicode change).

Steps to reproduce:

mflatt commented 2 years ago

The problem with (ความกว้าง 500) should be fixed by the snip-lib commit.