Closed 9999years closed 9 months ago
omg thank you, this has been bugging me for months!
I just grabbed your overlay and tested on kitty
(on x86_64-linux
) and it appears to be working.
edit: well at least the alignment is fixed. The fact that I'm seeing the ⏳ emoji is maybe not WAI given that it's using VS15 which should be forcing text display instead of emoji iiuc?
before | after |
---|---|
I might be confused, but I think VS15/16 actually are supposed to go after the character they affect according to these docs
fyi @999eagle
@P1n3appl3 Oh, thanks for catching that! Reverted that part of the patch.
I’ll be honest. I have no clue what wcwidth does. I hoped it would magically do the right thing. (TM)
Thank you very much for working on this. This is likely noms most annoying user-facing issue, and I’d love to see it gone. I’d prefer, though, to really understand the underlying issue here, which might take a bit.
So first: How are you testing wcwidth? I somehow thought it was vaguely environment-dependent (which sounds a bit crazy for a pure function). The problem is this: noms table looks completely fine in, e.g., my terminal. That is if the output of wcwidth is a pure function of the character I assume any change to it is likely to break my terminal in favor of other terminals. What we need to understand is: Do other terminals print the characters with different widths or does wcwidth give different outputs in another terminal or both?
I have done some testing on my system in different terminals. wcwidth seems always to be giving correct results: i.e. on my system the changes you are proposing here do not do anything. I have an alignment issue with kitty because kitty prints the clock with width 1.
So first: How are you testing wcwidth? I somehow thought it was vaguely environment-dependent (which sounds a bit crazy for a pure function).
It was an extremely naïve approach, like this:
widthFold (x, False) c =
-- `wcwidth` sometimes returns -1.
-- See: https://github.com/maralorn/nix-output-monitor/issues/78
let width = wcwidth c
in seq (unsafePerformIO $ do
if width /= 1
then putStrLn $ "width of '" <> [c] <> "' is " <> show width
else pure ()
) (x + max width 1, False)
What we need to understand is: Do other terminals print the characters with different widths or does wcwidth give different outputs in another terminal or both?
Unfortunately, Unicode doesn't really care about terminals, so the closest UNIDATA comes to providing information about the widths of codepoints is in the EastAsianWidth.txt
data file. wcwidth
just gives a best guess, and its man page isn't super detailed:
The
wcwidth()
function returns the number of columns needed to represent the wide characterc
. Ifc
is a printable wide character, the value is at least 0. Ifc
is null wide character (L'\0'
), the value is 0. Otherwise, -1 is returned.
The man page does note that "the behavior of wcwidth()
depends on the LC_CTYPE
category of the current locale," but what that implies is anyone's guess.
Here's the source for the macOS wcwidth
implementation. (I seriously hope this is an up-to-date link, but it seems to line up with what I've observed.)
https://opensource.apple.com/source/gdb/gdb-365/src/readline/support/wcwidth.c.auto.html
Anyways, a bunch of newer terminals have some support for emoji. Usually this means displaying emoji characters as two cells, but sometimes it means displaying them as one cell. Adding to the difficulty, nobody can really agree on which characters are emoji. (Historically Samsung has shipped phones that render a lot of non-emoji characters as emoji.) These days we do have emoji-data.txt
, which we can see does specify that U+23F3 HOURGLASS NOT DONE (⏳) is an emoji:
$ curl https://unicode.org/Public/UNIDATA/emoji/emoji-data.txt | rg -i 23f3
23F3 ; Emoji # E0.6 [1] (⏳) hourglass not done
23F3 ; Emoji_Presentation # E0.6 [1] (⏳) hourglass not done
23F3 ; Extended_Pictographic# E0.6 [1] (⏳) hourglass not done
This all to say: every terminal will probably be a little bit different. Perhaps the best way to solve this is to have some "emoji-are-double-width" flag or environment variable that users can set.
Yeah, thank you for looking into this. I have also been doing research these last hours.
So we have two problems:
Re 2. To be more precise: Most terminals will print characters which are listed in https://www.unicode.org/Public/14.0.0/ucd/emoji/emoji-data.txt as Emoji_Presentation with width 2 and other normal characters with width 1. They do not change this behavior when using the presentation switch code, which nom does. (Edit: Correction: At least foot does actually print characters which normally have width 1 with width 2 when emoji representation gets activated. Apparently deactivating emoji presentation does not change the width but activating does.) kitty and other supposedly law-abiding terminals switch the width depending when using representation codes.
The consequence for us is: Only use emojis in their default presentation.
Edit: My conclusion regarding emoji: I will replace ⏳ 0x23f3 "hourglass not done, with ⏸ 0x23f8 "pause". That way nom will not use any emoji in emoji presentation. Maybe later we can add alternative emojis for the various symbols adapt foldWidth accordingly and then offer a config option to use "emoji mode" which means even more colorful output in terminals which support it.
@9999years Can you give an example for a character for which wcwidth returns -1 on your system?
I think actually at this point we can just drop wcwidth completely. I introduced this solution because I didn’t know how emojis and character widths work. But if we avoid double-width emojis and don’t use zero-width modifiers I think we can just assume that every character has width one. (In theory weird logs might hurt that assumption, but that won’t affect the rendering of the table.)
Yeah, it's in the original PR description:
wcwidth
thinks that⏱
,⏳
, and⏵
are all -1 cells wide
I've tested this patch now and at least in kitty
it works. It breaks things on qmlkonsole
though. (EDIT: Actually it's broken in both now with this patch. Had to restart kitty
to see the issue, most likely because of my font settings.) The biggest issue here is what I've already outlined in https://github.com/maralorn/nix-output-monitor/issues/78#issuecomment-1628828943 and what has been discussed in the linked kitty issue: wcwidth
in general can never be correct as characters like VS15 and VS16 may change the width of the previous character (depending on how the actual terminal emulator renders them...), so determining widths character by character like wcwidth
is being used is not correct. Only the width of an entire string can be determined (somewhat) correctly.
Personally, while I think kitty
has the correct approach of rendering U+23F3 U+FE0E
(that is Hourglass + VS15) as just one cell wide and qmlkonsole
is doing the wrong thing by rendering the same sequence two cells wide, I'd say the best option to avoid the issues with character/string width and special casing the terminal emulator in use is to do what @maralorn suggested anyway and not use VS15/VS16 at all and only use characters in their default presentation.
Yeah, it's in the original PR description:
Oh, sorry.
Okay, I think I learned, what I needed to learn. Let’s switch away from the hourglass and the VS* selectors, rip out the wcwidth dependency, and simply replace wcwidth with the constant 1 function.
By default, I would just implement this myself, but I will leave it to you, @9999years, to adapt this PR accordingly if you want to. Either way, this PR and this discussion have been very helpful.
@999eagle (btw. what’s up with you people and all the 9s?)
I have just tried out qmlkonsole and nom looks utterly broken in it. Apparently, the cursor movement control codes do not work as intended or something? Do you observe the same? It was so broken that I could not even inspect anything about hourglasses.
Yes, qmlkonsole is utterly broken for me as well but it's the terminal emulator I use on my phone. At least the table in nom's output is somewhat reasonably formatted there.
To clarify, @9999years: I would gladly and quickly implement the changes I outlined in the discussion. I just didn’t want to steal your thunder, and you are welcome to adapt the PR if you want to, just let me know how you would like to proceed.
Updated the patchset but I haven't tested it yet. Having a little trouble with the build, for whatever reason:
Resolving dependencies...
Error: cabal: Could not resolve dependencies:
[__0] trying: nix-output-monitor-2.0.0.7 (user goal)
[__1] trying: streamly-core-0.1.0 (dependency of nix-output-monitor)
[__2] next goal: template-haskell (dependency of streamly-core)
[__2] rejecting: template-haskell-2.20.0.0/installed-2.20.0.0 (conflict:
streamly-core => template-haskell>=2.14 && <2.20)
EDIT: Got it, needed a combination of --allow-newer
and removing an orphan instance. Maybe just GHC 9.6 compat.
EDIT: Tested, looks good.
Excellent!
Just 2 tiny things:
Would you mind keeping around the comments with hex codes of the symbols? Doesn’t need to use the showCode eval lines. Also you can leave those comments out where you don’t have the hexCodes available. I am sure I will bikeshed those icons more in the future and quickly being able to look them up will be helpful.
I get one warning for the withFold function:
nix-output-monitor> lib/NOM/Print/Table.hs:77:22: warning: [-Wunused-matches (in -Wextra)]
nix-output-monitor> Defined but not used: ‘c’
nix-output-monitor> |
nix-output-monitor> 77 | widthFold (x, False) c = (x + 1, False)
I will merge when we have that sorted out.
Anyone else in this thread is welcome to test this, but I am quite confident of this change and will shortly cut a release with this.
Done. The formatter wanted to add more space once I added the comments (I guess the old ones weren't Haddocks), hope that's OK.
Thank you again for working on this.
I have a few other improvements which I am trying to bundle into the next release. Annoyingly I am experiencing some test errors on main and want to fix them before I cut a new release.
This is at least a partial fix for #78. It works on my setup (macOS, iTerm2, tmux) but I'm curious if it performs well on
kitty
and others.wcwidth
thinks that⏱
,⏳
, and⏵
are all -1 cells wide)wcwidth
thinks they're both 0 cells wide)VS15 and VS16 go before the codepoint they apply to, not afterI had this backwards, thanks @P1n3appl3 for pointing me to the docs.wcwidth
thinks it's 1 cell wide)Before:
After:
If you'd like to use this patch yourself, here's a nixpkgs overlay to apply it: