maralorn / nix-output-monitor

Pipe your nix-build output through the nix-output-monitor a.k.a nom to get additional information while building.
GNU Affero General Public License v3.0
835 stars 24 forks source link

Fix table alignment #121

Closed 9999years closed 9 months ago

9999years commented 10 months ago

This is at least a partial fix for #78. It works on my setup (macOS, iTerm2, tmux) but I'm curious if it performs well on kitty and others.

Before:

Screenshot of nom showing a misaligned table

After:

Screenshot of nom showing a correctly aligned table

If you'd like to use this patch yourself, here's a nixpkgs overlay to apply it:

final: prev: {
  nix-output-monitor = prev.nix-output-monitor.overrideAttrs (old: {
    patches =
      (old.patches or [])
      ++ [
        (final.fetchpatch {
          url = "https://github.com/maralorn/nix-output-monitor/pull/121.diff";
          hash = "sha256-zASfg0Kp+zKhaJnnlGJ32HcX2K/NrBWAmvPVMm7/jCw=";
        })
      ];
  });
}
P1n3appl3 commented 10 months ago

omg thank you, this has been bugging me for months!

I just grabbed your overlay and tested on kitty (on x86_64-linux) and it appears to be working.

edit: well at least the alignment is fixed. The fact that I'm seeing the ⏳ emoji is maybe not WAI given that it's using VS15 which should be forcing text display instead of emoji iiuc?

before after
before after
P1n3appl3 commented 10 months ago

I might be confused, but I think VS15/16 actually are supposed to go after the character they affect according to these docs

SuperSandro2000 commented 10 months ago

fyi @999eagle

9999years commented 10 months ago

@P1n3appl3 Oh, thanks for catching that! Reverted that part of the patch.

maralorn commented 10 months ago

I’ll be honest. I have no clue what wcwidth does. I hoped it would magically do the right thing. (TM)

Thank you very much for working on this. This is likely noms most annoying user-facing issue, and I’d love to see it gone. I’d prefer, though, to really understand the underlying issue here, which might take a bit.

So first: How are you testing wcwidth? I somehow thought it was vaguely environment-dependent (which sounds a bit crazy for a pure function). The problem is this: noms table looks completely fine in, e.g., my terminal. That is if the output of wcwidth is a pure function of the character I assume any change to it is likely to break my terminal in favor of other terminals. What we need to understand is: Do other terminals print the characters with different widths or does wcwidth give different outputs in another terminal or both?

maralorn commented 10 months ago

I have done some testing on my system in different terminals. wcwidth seems always to be giving correct results: i.e. on my system the changes you are proposing here do not do anything. I have an alignment issue with kitty because kitty prints the clock with width 1.

9999years commented 10 months ago

So first: How are you testing wcwidth? I somehow thought it was vaguely environment-dependent (which sounds a bit crazy for a pure function).

It was an extremely naïve approach, like this:

widthFold (x, False) c =
  -- `wcwidth` sometimes returns -1.
  -- See: https://github.com/maralorn/nix-output-monitor/issues/78
  let width = wcwidth c
   in seq (unsafePerformIO $ do
       if width /= 1
          then putStrLn $ "width of '" <> [c] <> "' is " <> show width
          else pure ()
       ) (x + max width 1, False)

What we need to understand is: Do other terminals print the characters with different widths or does wcwidth give different outputs in another terminal or both?

Unfortunately, Unicode doesn't really care about terminals, so the closest UNIDATA comes to providing information about the widths of codepoints is in the EastAsianWidth.txt data file. wcwidth just gives a best guess, and its man page isn't super detailed:

The wcwidth() function returns the number of columns needed to represent the wide character c. If c is a printable wide character, the value is at least 0. If c is null wide character (L'\0'), the value is 0. Otherwise, -1 is returned.

The man page does note that "the behavior of wcwidth() depends on the LC_CTYPE category of the current locale," but what that implies is anyone's guess.

Here's the source for the macOS wcwidth implementation. (I seriously hope this is an up-to-date link, but it seems to line up with what I've observed.) https://opensource.apple.com/source/gdb/gdb-365/src/readline/support/wcwidth.c.auto.html

Anyways, a bunch of newer terminals have some support for emoji. Usually this means displaying emoji characters as two cells, but sometimes it means displaying them as one cell. Adding to the difficulty, nobody can really agree on which characters are emoji. (Historically Samsung has shipped phones that render a lot of non-emoji characters as emoji.) These days we do have emoji-data.txt, which we can see does specify that U+23F3 HOURGLASS NOT DONE (⏳) is an emoji:

$ curl https://unicode.org/Public/UNIDATA/emoji/emoji-data.txt | rg -i 23f3
23F3          ; Emoji                # E0.6   [1] (⏳)       hourglass not done
23F3          ; Emoji_Presentation   # E0.6   [1] (⏳)       hourglass not done
23F3          ; Extended_Pictographic# E0.6   [1] (⏳)       hourglass not done

This all to say: every terminal will probably be a little bit different. Perhaps the best way to solve this is to have some "emoji-are-double-width" flag or environment variable that users can set.

maralorn commented 10 months ago

Yeah, thank you for looking into this. I have also been doing research these last hours.

So we have two problems:

  1. wcwidth only works correctly in some environments but not in others. This is the bigger and more annoying issue but also the easier one because we just need to find a working implementation of wcwidth. In the worst case, by writing our own. I think that is basically the scope of this PR.
  2. Terminals cannot agree on the width of emoji characters. This is hard to solve because different behaving terminals means in theory that we need to change the behaviour depending on the TERM variable. I have annoyingly not found a unicode character search online which directly shows whether a character is an emoji or not, but I think this only affects the hourglas. I am seriously considering dodging this issue by not using any characters which count as emojis.
maralorn commented 10 months ago

Re 2. To be more precise: Most terminals will print characters which are listed in https://www.unicode.org/Public/14.0.0/ucd/emoji/emoji-data.txt as Emoji_Presentation with width 2 and other normal characters with width 1. They do not change this behavior when using the presentation switch code, which nom does. (Edit: Correction: At least foot does actually print characters which normally have width 1 with width 2 when emoji representation gets activated. Apparently deactivating emoji presentation does not change the width but activating does.) kitty and other supposedly law-abiding terminals switch the width depending when using representation codes.

The consequence for us is: Only use emojis in their default presentation.

Edit: My conclusion regarding emoji: I will replace ⏳ 0x23f3 "hourglass not done, with ⏸ 0x23f8 "pause". That way nom will not use any emoji in emoji presentation. Maybe later we can add alternative emojis for the various symbols adapt foldWidth accordingly and then offer a config option to use "emoji mode" which means even more colorful output in terminals which support it.

maralorn commented 10 months ago

@9999years Can you give an example for a character for which wcwidth returns -1 on your system?

I think actually at this point we can just drop wcwidth completely. I introduced this solution because I didn’t know how emojis and character widths work. But if we avoid double-width emojis and don’t use zero-width modifiers I think we can just assume that every character has width one. (In theory weird logs might hurt that assumption, but that won’t affect the rendering of the table.)

9999years commented 10 months ago

Yeah, it's in the original PR description:

wcwidth thinks that , , and are all -1 cells wide

999eagle commented 10 months ago

I've tested this patch now and at least in kitty it works. It breaks things on qmlkonsole though. (EDIT: Actually it's broken in both now with this patch. Had to restart kitty to see the issue, most likely because of my font settings.) The biggest issue here is what I've already outlined in https://github.com/maralorn/nix-output-monitor/issues/78#issuecomment-1628828943 and what has been discussed in the linked kitty issue: wcwidth in general can never be correct as characters like VS15 and VS16 may change the width of the previous character (depending on how the actual terminal emulator renders them...), so determining widths character by character like wcwidth is being used is not correct. Only the width of an entire string can be determined (somewhat) correctly.

Personally, while I think kitty has the correct approach of rendering U+23F3 U+FE0E (that is Hourglass + VS15) as just one cell wide and qmlkonsole is doing the wrong thing by rendering the same sequence two cells wide, I'd say the best option to avoid the issues with character/string width and special casing the terminal emulator in use is to do what @maralorn suggested anyway and not use VS15/VS16 at all and only use characters in their default presentation.

maralorn commented 10 months ago

Yeah, it's in the original PR description:

Oh, sorry.

Okay, I think I learned, what I needed to learn. Let’s switch away from the hourglass and the VS* selectors, rip out the wcwidth dependency, and simply replace wcwidth with the constant 1 function.

By default, I would just implement this myself, but I will leave it to you, @9999years, to adapt this PR accordingly if you want to. Either way, this PR and this discussion have been very helpful.

maralorn commented 10 months ago

@999eagle (btw. what’s up with you people and all the 9s?)

I have just tried out qmlkonsole and nom looks utterly broken in it. Apparently, the cursor movement control codes do not work as intended or something? Do you observe the same? It was so broken that I could not even inspect anything about hourglasses.

999eagle commented 10 months ago

Yes, qmlkonsole is utterly broken for me as well but it's the terminal emulator I use on my phone. At least the table in nom's output is somewhat reasonably formatted there.

maralorn commented 10 months ago

To clarify, @9999years: I would gladly and quickly implement the changes I outlined in the discussion. I just didn’t want to steal your thunder, and you are welcome to adapt the PR if you want to, just let me know how you would like to proceed.

9999years commented 9 months ago

Updated the patchset but I haven't tested it yet. Having a little trouble with the build, for whatever reason:

Resolving dependencies...
Error: cabal: Could not resolve dependencies:
[__0] trying: nix-output-monitor-2.0.0.7 (user goal)
[__1] trying: streamly-core-0.1.0 (dependency of nix-output-monitor)
[__2] next goal: template-haskell (dependency of streamly-core)
[__2] rejecting: template-haskell-2.20.0.0/installed-2.20.0.0 (conflict:
streamly-core => template-haskell>=2.14 && <2.20)

EDIT: Got it, needed a combination of --allow-newer and removing an orphan instance. Maybe just GHC 9.6 compat.

EDIT: Tested, looks good.

image
maralorn commented 9 months ago

Excellent!

Just 2 tiny things:

  1. Would you mind keeping around the comments with hex codes of the symbols? Doesn’t need to use the showCode eval lines. Also you can leave those comments out where you don’t have the hexCodes available. I am sure I will bikeshed those icons more in the future and quickly being able to look them up will be helpful.

  2. I get one warning for the withFold function:

nix-output-monitor> lib/NOM/Print/Table.hs:77:22: warning: [-Wunused-matches (in -Wextra)]
nix-output-monitor>     Defined but not used: ‘c’
nix-output-monitor>    |
nix-output-monitor> 77 | widthFold (x, False) c = (x + 1, False)

I will merge when we have that sorted out.

Anyone else in this thread is welcome to test this, but I am quite confident of this change and will shortly cut a release with this.

9999years commented 9 months ago

Done. The formatter wanted to add more space once I added the comments (I guess the old ones weren't Haddocks), hope that's OK.

maralorn commented 9 months ago

Thank you again for working on this.

I have a few other improvements which I am trying to bundle into the next release. Annoyingly I am experiencing some test errors on main and want to fix them before I cut a new release.