Closed sorawee closed 2 years ago
I see the expected alignment:
Note that it's shown with a strikeout face, which is the default for racket-xp-unused-face
.
In your screenshot, it's a red wavy underline. This makes me think of flycheck-mode
. In your supplied details, I see you have flycheck-mode
enabled as well as racket-xp-mode
.
This makes me wonder if you're reporting behavior from flycheck and/or its raco
wrapper. However AFAICT that doesn't report unused definitions; at least I can't make it do so without trying too hard.
Maybe you just customized racket-xp-unused-face
to be a red wavy underline, and flycheck is a red herring. But. What happens if you try this after M-x flycheck-mode
to disable in the buffer, then make some edit to cause a refresh of racket-xp-mode
check-syntax?
Sorry for the confusion. I have
(set-face-attribute 'racket-xp-unused-face nil
:strike-through nil
:underline '(:color "orange" :style wave))
in my config file.
My speculation is that this might be due to the Racket version difference. My Racket is essentially at the HEAD, which has the grapheme clustering change recently added.
EDITED: I confirm that when I set my PATH to Racket 8.5 (and regenerate env in Emacs), the position is correct. (I haven't tried 8.6, but it does already suggest that this is related to Racket version)
@rfindler said he would be making changes, but my understanding was that it would be only up in the DrRacket application, not down at the level of the positions reported by drracket-tool-text-lib
(as used by Racket Mode's back end).
A quick scan of recent commits in the racket/drracket
repo seems to match my understanding; they seem to be solely for the DrRacket app or tests?
But I guess that must not be the case, something did recently change/break somehow.
It looks like I haven't rebuilt Racket from source since commit 1db3e1829d circa late May. I'll try to find time to do that, confirm it doesn't work for me either with that, see if I can help narrow down what changed/broke, and point it out to Robby.
Yes, what @greghendershott says is right. (There could be a bug of course :). Also, stuff is currently in flux and @mflatt has another approach to handling how things work with editors that will cause other changes and might break/unbreak something here.
But more generally, it should be the case that positions coming from the racket/drracket
to the Emacs mode code should be grapheme based (as the positions in a port are (when port-count-lines!
is turned on). So hopefully that works out on the Emacs side too.
Although maybe there are edge cases of which I've been blissfully ignorant, historically everything already has "just worked" between drracket/check-syntax
and Emacs. Assuming port-count-lines!
is used, the syntax-position
and -span
values reflected by check-syntax correspond to what Emacs thinks are character ("grapheme"?) positions. They even both agree that positions are numbered from 1 not 0. :smile:
So as we discussed I'm definitely a fan of "don't change that". I'm glad that's still the intent. Of course things might be flux on the main branches and there might be bugs, which seems to be the case here (?).
I'll see what I can figure out. Thanks for the links; although it seems like drracket
and racket/gui
would be "off the path" and N/A for dracket-tool-text-lib
<=> Emacs, I'll take a look anyway to hopefully better understand the motivation for the overall changes.
I'll see what I can figure out. Thanks for the links; although it seems like
drracket
andracket/gui
would be "off the path" and N/A fordracket-tool-text-lib
<=> Emacs, I'll take a look anyway to hopefully better understand the motivation for the overall changes.
Assuming that Emacs really does do graphemes = positions then I think nothing should have to change for you but it is fantastic that you're offering to help debug. (But that code can certainly break stuff if it is wrong :)
@sorawee Unsurprisingly I can reproduce this using Racket built from commit 0cd6f5631e plus the versions of drracket etc. that it pulls in.
Continuing to be stumped by the lack of apparent change in drracket-tool-text-lib
, I had a crazy idea that it might be an even lower level change.
TL;DR it seems that syntax-span
now returns a different value.
Consider this little program to exercise read-syntax
:
#lang racket/base
(require racket/path
syntax/modread
racket/match)
(define (string->syntax code-str)
(define path (build-path (current-directory) "foo.rkt"))
(define dir (path-only path))
(parameterize ([current-load-relative-directory dir]
[current-directory dir])
(with-module-reading-parameterization
(λ ()
(define in (open-input-string code-str path))
(port-count-lines! in)
(match (read-syntax path in)
[(? eof-object?) #'""]
[stx stx])))))
(define stx (string->syntax "\"☠️\""))
stx
(syntax-span stx)
When run by older Rackets (I tried 8.0 and 8.4 BC because I happened to have them handy):
#<syntax:foo.rkt:1:0 "\u200D☠️">
5
But run by Racket built from source as of today:
#<syntax:foo.rkt:1:0 "\u200D☠️">
3
In other words they both read the same string (which looks weird to my eyes but I'm a Unicode noob).
However syntax-span
for read-syntax
of "\"☠️\""
used to be 5 in older versions of Racket --- but is now 3.
Although this program doesn't exercise/show it, I'm guessing this also throws of the syntax-position
of subsequent pieces of read syntax? (And (waves hands) leads to the behavior seen via check-syntax.)
This seems like a change down in read-syntax
, so maybe @mflatt would know?
Yes, this is a (n intentional) change at the port-counting layer. Ports used to count in terms of unicode code points, but now it counts in terms of graphemes (assuming you call port-count-lines!
-- otherwise it is all bytes all the time). This program:
#lang racket
(define sp (open-input-string "☠️"))
(port-count-lines! sp)
(void (read-line sp))
(port-next-location sp)
produces 1, 2, 3 in older versions of racket and 1, 1, 2 in git head.
This change then gets into the syntax objects, and the spans and whatnot that you're seeing in the example above. And this is how we get grapheme-based counting in the new racket, as opposed to code-point-based counting of the older versions.
If there aren't any fancy emojis (or maybe other things) then there isn't a difference, but if there are, then hopefully life should be an improvement due to this change. And it means that, in Emacs, you'll want to use methods that accept positions in the buffer to be counting based on graphemes, not based on other things. I took a quick look at Emacs and it wasn't obvious to me if this is easy or hard ...
Hmm, could there be a way (or a parameter, or something) to convert back to codepoints? My linter/autograder scripts produces outputs in terms of syntax object spans, which are then fed to Codemirror to highlight, and Codemirror expects to work in terms of codepoints. I can't prevail on Codemirror to change its habits, and it already has some quirks in how it handles graphemes. But I might have a hope here... :)
(Also -- is this change in HEAD only, or in 8.6 as well?)
@blerner maybe we should open a separate issue for that one? I think the change has not yet been in a release.
Feel free to open it -- I'm not sure which repo would be best for it ;-) In the meantime, I'm relieved to hear it's not in 8.6 yet, so I can punt on dealing with it until next year :)
A parameter could work. I had earlier considered adding an option in port-count-lines!
, but propagating that choice through the make-input-port
and make-output-port
APIs seemed too painful to be worthwhile. In retrospect, a parameter is a good choice (and maybe I didn't think of it just because my reflex is to avoid parameters).
The parameter I have in mind would affect only built-in line-counting and any port implementation that opts to pay attention to the parameters. It would apply when port-count-lines!
is called, and the setting would stick to a port at the point where line counting is enabled. Probably it would make sense to have an environment variable that can initialize the parameter on startup, but the default still would be grapheme-based. I think DrRacket and text%
generally would not try to adapt to the parameter, though, so it wouldn't make any more sense to change the parameter in DrRacket than to change the text encoding.
The scenario I have is
(with-handlers ([exn:fail? (λ (x) #f)])
... (parameterize ([read-accept-reader #t]
[read-accept-lang #t])
(call-with-input-file source
(λ(port)
(port-count-lines! port)
...
(define stx-first (read-syntax source port))
... work with stx-first ...))))
so as long as the port from call-with-input-file
recognizes and accommodates this parameter, I should be fine. Does that match your thinking?
Does that match your thinking?
Yes, the parameter would apply in that case.
Probably it would make sense to have an environment variable that can initialize the parameter on startup, but the default still would be grapheme-based. I think DrRacket and
text%
generally would not try to adapt to the parameter, though, so it wouldn't make any more sense to change the parameter in DrRacket than to change the text encoding.
Would setting the environment variable potentially break programs running in DrRacket (or DrRacket itself)?
Would setting the environment variable potentially break programs running in DrRacket (or DrRacket itself)?
It could cause source locations to not be in sync with editor content, but I think that's no different than a #lang
that creates weird source locations.
About it being intentional: Yes, after posting, I had taken a break, come back, and discovered https://github.com/racket/racket/commit/48fda3e093affaa1b58b2123fda39c99c36c5e6f and adjacent commits.
The more I thought about that, the more discouraged I was feeling, wondering how even to translate from graphemes back to codepoints, which I'm pretty sure Emacs needs (and it looks like some other tools need, too).
It would be a huge relief to have a parameter to support the previous, codepoint-based positions. I could try to dynamic-require
that, and if it exists because it's a newer Racket, I could set it. All versions of Racket thereby supported smoothly.
Maybe there should be a parameter that affects only ports opened as part of require
(clearly something in a longer-named function, maybe current-load/use-compiled
or something) instead of all ports in some dynamic extent?
Maybe I don't have the right example in mind, but in an environment where non-grapheme counting is needed, it seems you probably want it pervasively.
I've added the parameter as port-count-graphemes-enabled
with environment variable PLT_PORT_COUNT_CHARS
, and we can see how those work it out. I also added use-char-as-grapheme
as part of the text%
changes.
My thought was a situation where someone wants to work with their program in Emacs but their program also opens and manipulates grapheme-containing files. So they do the Emacs equivalent of the Run button in DrRacket. If there is an error, we want the error message to come out in code-point-based positions, but if there is no error, we want the file that they open and manipulate to report positions in graphemes. (One example of this might be running DrRacket from inside Emacs, although that doesn't seem super common :)
I just tried an experiment and its results suggest that Emacs wants grapheme-based counting. This is a very naive experiment, however :). I created a file with these bytes: #"a\342\230\240b\357\270\217\n"
(that's an "a" followed by "☠️" followed by "c") and then opened it in Emacs. I then did M-: an evaluated (goto-char 1)
, (goto-char 2)
and (goto-char 3)
, which sent the insertion point to just before the "a", just before the "☠️", and just before the "c".
Is racket-mode using goto-char and related operations that accept positions like its to move the insertion point around and highlight things?
@mflatt Thanks!! Using that parameter fixes this problem, from a quick test. I do want to marinade in this a bit before merging.
@rfindler Racket Mode uses goto-char
etc. Also it uses Emacs "text properties", which are arbitrary key/values that can be associated with position spans, which is the mechanism being used here.
I'm not super confident in my understanding of all the intricacies of Unicode, and characters vs. codepoints vs. graphemes vs. glphys. I want to keep believing many falsehoods about text as long as possible. :smile: I doubt I can explain your experiment in a way where you'd give me a passing grade, much less an A. :smile:
But I think part of what's going on here is that @sorawee's original example "☠️"
is a grapheme cluster or combination (not sure the exact terminology)?
In an Emacs buffer it actually results in three navigable positions within the quotation marks. The skull and crossbones per se is sandwiched between two zero-width characters. With point at each one of the three, you can C-u C-x = to get a buffer showing details. These are:
1.
position: 45 of 66 (67%), column: 9
character: (displayed as ) (codepoint 8205, #o20015, #x200d)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x200D
script: symbol
syntax: . which means: punctuation
to input: type "C-x 8 RET 200d" or "C-x 8 RET ZERO WIDTH JOINER"
buffer code: #xE2 #x80 #x8D
file code: #xE2 #x80 #x8D (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-HL -Khmer OS-normal-normal-normal-*-20-*-*-*-*-0-iso10646-1 (#x2D1)
Character code properties: customize what to show
name: ZERO WIDTH JOINER
general-category: Cf (Other, Format)
decomposition: (8205) ('')
There are text properties here:
face font-lock-string-face
fontified t
2.
position: 46 of 66 (68%), column: 9
character: ☠ (displayed as ☠) (codepoint 9760, #o23040, #x2620)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0x2620
script: symbol
syntax: w which means: word
category: .:Base
to input: type "C-x 8 RET 2620" or "C-x 8 RET SKULL AND CROSSBONES"
buffer code: #xE2 #x98 #xA0
file code: #xE2 #x98 #xA0 (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-PfEd-DejaVu Sans Mono-normal-normal-normal-*-20-*-*-*-m-0-iso10646-1 (#xA77)
Character code properties: customize what to show
name: SKULL AND CROSSBONES
general-category: So (Symbol, Other)
decomposition: (9760) ('☠')
There are text properties here:
face font-lock-string-face
fontified t
3.
position: 47 of 66 (70%), column: 10
character: ️ (displayed as ️) (codepoint 65039, #o177017, #xfe0f)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0xFE0F
syntax: w which means: word
category: ^:Combining
to input: type "C-x 8 RET fe0f" or "C-x 8 RET VARIATION SELECTOR-16"
buffer code: #xEF #xB8 #x8F
file code: #xEF #xB8 #x8F (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-SIL -Abyssinica SIL-normal-normal-normal-*-20-*-*-*-*-0-iso10646-1 (#x3AE)
Character code properties: customize what to show
name: VARIATION SELECTOR-16
general-category: Mn (Mark, Nonspacing)
decomposition: (65039) ('️')
There are text properties here:
face font-lock-string-face
fontified t
And this representation in Emacs has aligned (AFAIK) with Racket's codepoint-based port counting.
Thanks, @greghendershott ! I also definitely do not know what's going on, so instead of using @sorawee 's example, I went back to the pirate flag example that I've been using in the framework and drr test suites. Here's what I see: if you put these bytes into a file:
#"a\360\237\217\264\342\200\215\342\230\240\357\270\217b\n"
then you should get a file that has four places you can navigate to on the first line (before the "a", before the pirate flag, before the "b" and after the "b"). The file has three grapheme thingies in it's first line, but it has more unicode code points than that (I think the pirate flag is five unicode code points? I'm not sure).
So: here's a program to demonstrate that:
#lang racket
(define bts #"a\360\237\217\264\342\200\215\342\230\240\357\270\217b\n")
(define p (open-input-bytes bts))
(define-values (_1 _2 start-pos) (port-next-location p))
(port-count-lines! p)
(read-line p)
(define-values (_3 _4 end-pos) (port-next-location p))
end-pos
In git-head racket, I get 5 back which makes sense to me. We start at position 1 (that's what start-pos
is) then get the "a", the flag, the "b", and then the newline, for a total of 4 more spots so end-pos comes out to be 5. This also matches what I see in emacs v28.1 (which is one that this blog post reports as the one where grapheme support got added).
In racket v8.6, with the code above, I see 8, and that is about what the codepoint number seems to be. Furthermore in Emacs v27.1, I also see a similar kind of behavior, where I have some empty space surrounding the pirate flag and the flag looks wrong (I'm attaching the two emacs screenshots both opening the same file.)
So: could I dare hope that racket-mode could check to see if someone is using racket v8.7 and Emacs 28.1 and then not set the world-killing flag [*] that Matthew just added? And if someone isn't using this combination and reports grapheme problems, we ask them to upgrade? Does that seem remotely plausible?
[*] the one that disables grapheme counting for port-count-lines!
that I wish we wouldn't have to support in the future .... but I'm probably just being lazy with this thought :)
Using this program to create a file with your bytes:
#lang racket/base
(define rfindler-bytes #"a\360\237\217\264\342\200\215\342\230\240\357\270\217b\n")
(bytes->string/utf-8 rfindler-bytes)
(call-with-output-file* "issue-638-3.rkt" #:mode 'binary #:exists 'replace
(λ (out) (display rfindler-bytes out)))
I open it in the oldest Emacs supported by Racket Mode, as well as the newest (Emacs built from source recently).
In both cases, I don't get the pirate flag because I lack a font supplying that on my Ubuntu system.
In both cases, I get the same buffer contents.
Note that the letter "b" is at position 6 of 7 in both cases -- that is, Emacs didn't change the position units from codepoints to graphemes.
Keep in mind that in Emacs, there is not necessarily a 1:1 correspondence between characters in the buffer and columns on the screen:
There is a 'display
text property that can cause any span of buffer text to be displayed as any other text of any size. This doesn't change the positions of the characters in the buffer.
Furthermore, IIUC some composition may automatically be done for speed down at the display engine level. (But again that doesn't impact the buffer position units: In my previous example above notice that positions 45, 46, 47 are all reported as "column 9".)
Thank you @greghendershott ! I see the same thing as you when I put the insertion point on the "b" and do C-u C-x =.
My confusion stemmed from the assumption that the right-arrow key would move forward only one position, but that doesn't seem to be the case when I type it interactively. That is, when I type the right arrow key three times, I move from the "a" to the flag to the "b". My version 28.1 emacs reports that "\<right> runs the command right-char" (via C-h c) but, unfortunately, when I do M-: (right-char)
three times from the start of the buffer I end up in the middle of the flag, as you and the position=codepoint understanding would have predicted. Similarly, goto-char sends me to the "b" only when I pass 6 to it and not any smaller numbers.
I see what you mean about the display property, too. When I do C-u C-x = on the pirate flag, I get this response:
position: 2 of 7 (14%), column: 1
character: 🏴 (displayed as 🏴) (codepoint 127988, #o371764, #x1f3f4)
charset: unicode (Unicode (ISO10646))
code point in charset: 0x1F3F4
script: emoji
syntax: w which means: word
category: .:Base
to input: type "C-x 8 RET 1f3f4" or "C-x 8 RET WAVING BLACK FLAG"
buffer code: #xF0 #x9F #x8F #xB4
file code: #xF0 #x9F #x8F #xB4 (encoded by coding system utf-8-unix)
display: composed to form "🏴☠️" (see below)
Composed with the following character(s) "☠️" using this font:
mac-ct:-*-Apple Color Emoji-normal-normal-normal-*-12-*-*-*-p-0-iso10646-1
by these glyphs:
[0 3 0 912 16 -1 16 12 3 nil]
with these character(s):
(#x200d) ZERO WIDTH JOINER
☠ (#x2620) SKULL AND CROSSBONES
️ (#xfe0f) VARIATION SELECTOR-16
Character code properties: customize what to show
name: WAVING BLACK FLAG
general-category: So (Symbol, Other)
decomposition: (127988) ('🏴')
There are text properties here:
fontified t
Overall, I am left wondering if there is a way to ask the buffer to go to the nth grapheme instead of the nth position, or to somehow convert between grapheme counts and position counts. FWIW, that was the state we were in before Matthew's most recent big change -- positions in a text% were still in terms of code points but there were conversion functions you could call (position-grapheme
and grapheme-position
which are still there but are useful in different situations now).
Perhaps there is a way to coax Emacs to take these display properties into account when using positions? Functions like the text%
methods position-grapheme
and grapheme-position
but using Emacs's 'display
property? I tried using object-intervals
but the display property doesn't show up there. I was trying to use these docs, which are pointed to from this explanation of the 'display property doing things like this: (get-text-property 3 'display (find-buffer-visiting "issue-638-3.rkt"))
also with variations that have numbers other than 3 in there and having no luck.
It is also kind of a mystery why typing <right>
works differently than calling the function bound to <right>
.
Here's another idea we could try: we could add something to the drracket/syncheck
library so that you could ask for the positions that come back from it to be in code-point numbers instead of in grapheme numbers. It could call the text%
position-grapheme
functions on the numbers before returning them. This would introduce a dependency on racket/gui/base if done in a simple way, but I guess there is a more complex way where we take the code that's been added to text%
to build the position<->grapheme mapping and reuse it. What do you think of that approach?
Selfish observation: So far this whole change (i.e. not just using the parameter to disable the change) seems like a solution to a problem I don't have.
In fact it feels like it would (if I didn't use the parameter) create new problems I would need to solve just to keep things working the same as before. To avoid breaking, as opposed to improving.
General observations:
I'm kind of unclear on the motivation wrt to text%
and DrRacket, i.e. what problems this change is trying to solve.
I think (waves hands) it is about having compositions like pirate flags be counted as 1 thing not multiple things. To move up to a higher level of abstraction.
I do wonder about the case where someone (e.g. me in reality, above) lacks a font with a glyph for the pirate flag. Emacs displays the two components, a don't-have-it box followed by the skull+crossbones glpyh. What would DrRacket do? If it needs to show 2 items, what happens to grapheme counting and that nice 1:1 correspondence?
Similarly, I get the impression that manual (de)composition is part of how people sometimes handle some languages (e.g. diacritics). So sometimes the composed "single" thing on the screen can be split into 2 or more, at least temporarily. What if someone is building a text%
app and for some reason needs port counting to correspond to that? (Maybe this is unlikely beyond DrRacket. Maybe it's N/A because at any given moment that something like a check-syntax would run, the composed-or-not state would be consistent.)
Idea: Maybe Emacs works this way not just because it's old and hairy and too concerned with backward compatibility (although Emacs definitely can be all those things! :smile:). Maybe it really does make sense for there to exist a layer where positions are non-composed, and then also a display layer where they might be composed (when possible)? To be clear I'm not arguing this for sure -- sometimes Emacs is just "bad old" not "good old", and anyway I don't know the character encoding and display representation space well enough to have a valuable opinion. I'm throwing it out just as a question.
Your observations and questions are all good ones and I don't have answers for them. Maybe someone wiser or more knowledgeable than me will be able to offer some insight.
From where I sit, the problem to be solved is that changing the parameter can break programs. That is, I believe that some racket programs will be written on the assumption that port-count-lines!
does one specific thing (whether this is implicit by looking at the spans of its own syntax objects (in a test suite say) or explicit by working with ports directly). When that program is run in racket-mode it will do a different thing than when the program is run in drracket (or command-line racket). That is the situation I'm trying to avoid -- although I do agree that there will not be many programs like that and perhaps I am wasting my time! Especially because this situation comes up only for folks using grapheme clusters.
One way to avoid that problem is for the drracket/syncheck
library to offer a mode where it reports positions in a way that matches what racket-mode wants. I do realize (now), tho, that this introduces more pain for you as you'd need to call into the drracket/syncheck
library in (yet another) version-specific way. Are you using show-content
? Maybe I could make it default to code-point based positions and have an option to turn them into grapheme-based ones? Then you would need to change nothing (I think)?
Ok, it looks like I've mostly just wasted everyone's time here.
The goal was to bring racket source-location reporting up-to-date by paying attention to graphemes, particularly in the context of Rhombus where column counting affects parsing. But, in fact, existing editors that handle graphemes don't do that. I didn't pay enough attention to the column information reported in other editors that I tried (VSCode, Xcode, and now Emacs 28.1), where even though a right-arrow key will move past a whole grapheme, the column count increments by the width in Unicode characters. So the change made Racket inconsistent with editors, the opposite of the intent.
It looks like the way forward is
text%
back to always working in character-position modeposition-grapheme
and grapheme-position
(but those are not needed for things like syntax coloring or error highlighting)I just tried sublime text and it also jumps over the grapheme cluster with a single right-arrow keystroke, but increments the column count in a way that matches codepoints by more than one. 😞
EDIT: Sublime, Emacs, and XCode don't agree, however. With the bytes in the file as defined by rfindler-bytes
above, as I move from the start of the file to the right, I see the column counts: 0, 1, 4, 5 in Emacs and 1, 2, 6, 7 in Sublime Text and 1, 2, 7, 8 in XCode. Three different answers?!? Maybe your way is in their future, @mflatt .
@rfindler Thanks for the correction and clarification. So, it turns out that Emacs and Xcode don't actually count by code points, and whatever those do, Racket source locations are just not going to be compatible for a stream that has certain characters. As one extra data point, VSCode behaves like Sublime and counts code points.
I’m still inclined to revert the Racket backward-incompatible changes to counting and text%
, which overall will make DrRacket count the way Sublime and VSCode do when "Count columns numbers from one" is enabled.
In a file with this:
a🏴☠️b
1234
where, when rendered with a duospace font, the b
at the end of the first line aligns with the 4
at the end of the second line, I would expect to see the column counts 0, 1, 3, 4. More than the number of Grapheme clusters, but less than the number of Unicode scalars.
So, it turns out that Emacs and Xcode don't actually count by code points, and whatever those do, Racket source locations are just not going to be compatible for a stream that has certain characters.
Are you sure? What's an example of that?
In all the examples so far: The code-points correspond to Emacs "characters" actually existing in the buffer -- independent of what's displayed (don't rely on that or motion commands, use C-u C-x =
to inspect the real position values). The Emacs position values correspond exactly to the old Racket syntax-position
values.
There is a lot of variety and uncertainty related to how things get displayed:
The code-points might compose into different displayable graphemes that would look like just 1 character in 1 column.
I lack the pirate flag glyph in any installed font, so I get two graphemes in two columns.
forward-char
never, ever skips more than one character for me, although you report it does for you.
Despite all that variety, if there's an error at syntax-position
42 with a syntax-span
of 5, and I tell Emacs to highlight that -- it highlights the correct stuff on the screen. Emacs knows how the underlying positions map onto the displayed presentation.
Maybe you know some counter-example that I don't, but that's why I'm a fan of the old status quo.
In fact, the more I think about it, the more I feel like code-point units are better. They are more precise, avoid varying ideas about "columns" and motion, and avoid the problem that not every user system can display every glyph.
So, it turns out that Emacs and Xcode don't actually count by code points
I should have been more clear that I was talking about "columns" as shown, for example, by column-number-mode
and with the values that Robby reported.
I understand that positions correspond to the internal thing in Emacs, and so finding things by position and span will work. That's great, and we're back to the status quo.
But if a human (or a tool that parses error messages) is trying to navigate to code based on a printed source location in line—column format, then that's the thing that isn't going to always work. Probably not a big deal.
results in:
Notice the "no bound occurrences" underline is in the incorrect position.
Note that DrRacket doesn't have this issue (this is essentially a test in the DrRacket repo, so I think there might be an issue before, and it got fixed)