microsoft / WSL

UTF-8 rendering woes #75

Closed tycho closed 6 years ago

tycho commented 8 years ago

Examples below use the UTF-8 demo file.

Some of the rendering issues could be attributed to the font (Consolas), but some cannot.

Here's Consolas with MinTTY (Cygwin): [image: Consolas on MinTTY]

And here's Consolas with "Bash on Windows": [image: Consolas on Bash]

Consolas simply doesn't do well on the box drawing tests.

One of the best monospace fonts I've found is DejaVu Sans Mono. But cmd.exe's properties page doesn't allow me to select that font when it's installed. It has a static list of fonts that appear in the Windows Registry. In order to use fonts other than Lucida Console, Consolas, or raster fonts, I need to replace one of the fonts listed in the registry under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont. In my case, I replaced Consolas with DejaVu Sans Mono for another test:

DejaVu Sans Mono with MinTTY (Cygwin): [image: DejaVu Sans Mono on MinTTY]

DejaVu Sans Mono with "Bash on Windows": [image: DejaVu Sans Mono on Bash]

Now the box drawing tests are fine, but there are numerous UTF-8 glyphs that are unavailable for use.
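The static font list mentioned above can be inspected without changing it. The following is a minimal, read-only sketch that assumes a Windows build of Python (the standard-library winreg module is not available inside WSL); the helper name list_console_fonts is hypothetical, and only the registry path comes from this thread.

```python
import sys

# Registry path quoted in the comment above (under HKEY_LOCAL_MACHINE).
TRUETYPE_KEY = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont"


def list_console_fonts() -> dict:
    """Return {value name: font face} from the console font list.

    Returns an empty dict when not running on Windows.
    """
    if sys.platform != "win32":
        return {}
    import winreg  # Windows-only standard-library module

    fonts = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TRUETYPE_KEY) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # number of values under the key
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            fonts[name] = data
    return fonts


for name, face in sorted(list_console_fonts().items()):
    print(f"{name!r}: {face}")
```

This only reads the key. Replacing an entry, as described above, requires administrator rights and is not shown here.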

So the problems are:

mobluse commented 8 years ago

Thanks, I also added DejaVu Sans Mono to cmd.exe and Cygwin64 Terminal. Now links http://www.fileformat.info/info/unicode/block/arrows/utf8test.htm also works with UTF-8 (links being the text-mode browser; press Esc to get a menu).

mobluse commented 8 years ago

The example below works better in WSL Terminal (i.e. WSLtty) than in Ubuntu from Store for WSL (using Cmd.exe). One can run it in two windows and compare. Some characters are missing in Cmd.exe.

sudo apt-get install toilet

# Render "Orbin" with every font in /usr/share/figlet and every toilet filter.
ls /usr/share/figlet/ | sed 's/\..*//' | while read -r font; do
  toilet -F list | sed -n '/"/ {s/"\(.*\)".*/\1/;p}' | while read -r filter; do
    toilet -f "$font" -F "$filter" Orbin 2> /dev/null
  done
done

BobFrankston commented 8 years ago

I too am seeing a problem with Unicode. I have a similar problem with Unicode with SSH but when I use Putty everything works fine.

There is one strange thing in this example. The string 方思腾.香港 rendered fine until I cursored over it, and then it didn't recover. This image shows the original version and the version after the cursor moved over it. Running Emacs in Putty on an Ubuntu system does not have the problem. I tried two different fonts and had the same problem. (This also raises the question of why the command prompt doesn't do Unicode by default, but that's a different topic.)

[image: untitled problem]

zerocool4u2 commented 8 years ago

@BobFrGit that's because those characters aren't monospaced and cmd doesn't really support them. (By the way, a font can be flagged as monospace without its characters actually all being the same width; it's just a font property.) I have similar problems with glyphs. Try to find a font with monospaced glyphs for those Unicode chars, or you could try processing the font with some Python script; there is a project called powerline patched fonts (if I recall correctly) that has scripts that could help you.

BobFrankston commented 8 years ago

The monospace assumption is interesting. I use Epsilon on Windows and when I go down it goes to the same nth character on a line. But using Emacs in Ubuntu, when I go down it goes to the character visually below. The question then is why Emacs in Putty does it "right" or, at least, doesn't get confused, while using it in bash fails. If I use Emacs with SSH I get a different result -- substitution characters. (Same for DigitalOcean's own access tool.)

In exploring Unicode I found that it can be far more complicated so I'm not trying to solve the general case -- just observing that PUTTY is an existence proof of a better approach.

KindDragon commented 8 years ago

Using ConEmu terminal can also help with this

BobFrankston commented 8 years ago

Thanks. For now epsilon and Putty work sufficiently well for me. Just wanted to flag the problem for now.

zerocool4u2 commented 8 years ago

@BobFrGit I mean the kanji, or whatever they are. You can see that they are double-width, so when you move the cursor over them they get split in half, and you see the part that would match if they were 1 column wide each. For example, 1 2 3 456 would show as 1 243 456 if you put the cursor next to the 2 and 1, 2 and 3 were double-width, because next to 2 is the fourth position for monospace types.
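The mismatch between character index and screen column described here can be made concrete. This is a rough sketch using Python's standard unicodedata module; the helper names display_width and column_of are hypothetical, and it treats only East Asian Wide/Fullwidth characters as two columns, which is a simplification of what a real terminal does.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal columns: wide/fullwidth chars take 2, combining marks 0."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def column_of(text: str, index: int) -> int:
    """Screen column where the character at `index` starts."""
    return display_width(text[:index])


sample = "方思腾.香港"
print(len(sample), display_width(sample))  # 6 characters, 11 columns
print(column_of(sample, 3))                # the "." starts at column 6, not 3
```

A renderer that assumes one column per character would place the cursor at column 3 for the fourth character, landing in the middle of the second hanzi, which matches the "split in half" effect described above.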

BobFrankston commented 8 years ago

(Actually they are hanzi 汉字 but don't worry about it.) As I mentioned above both Emacs on Ubuntu via Putty and Epsilon on the PC don't have the problem though they take different approaches in dealing with the fact that those characters are not monospace.

This is not a big deal for me now -- just wanted to flag it. One feature is that I found that if I change the CMD font the dir listing will show 汉字 file names properly.

mrmckeb commented 7 years ago

I'm not sure if this is related, but it seems to be - I'm finding a lot of characters/symbols aren't rendering as expected in Bash on Windows 10. Although not necessary, a lot of build tools use special characters to show status of tests, etc.

I understand emoji is a whole different issue... so not raising that here.

BobFrankston commented 7 years ago

Yeah -- been playing with the 32-bit Unicode and that's a challenge in its own right. As an FYI Word seems to do a pretty good job on emoji and I discovered that Alt-x lets me enter them. (At least some -- when I tried to enter ancient Chinese rod numerals I didn't find a font that had them.)

whisust commented 7 years ago

Hey @mrmckeb same here, unable to use Unicode emoji / symbols... I had a personalized PS1 prompt with git, using up and down arrows. They are only squares now u_u

mrmckeb commented 7 years ago

@antlatrille Similar to what I've seen. Hopefully we can get more support for this in future releases!

Karasuni commented 7 years ago

Still encountering this issue using Bash on Windows over 1 year since the initial report. Is there any fix?

bitcrazed commented 7 years ago

Hey all. It's important to note that Console not being able to display a given symbol or set of symbols is a many-sided-blade! ;)

Alas, because the Console's text renderer is GDI-based, we're unable to support features like font-fallback, which would allow us to support fonts that contain a specific set of symbols (e.g. Emoji, Klingon) but gradually fall back on more expansive font sets for other chars.

We have a goal to replace our renderer with a more modern DirectWrite renderer at some point in the (increasingly near) future.

When we do, we'll be able to do A LOT of very cool, modern, fancy things with text that we're simply unable to do right now.

Bear with us ;)

ronindesign commented 7 years ago

Thanks for the update on this.

fcharlie commented 7 years ago

@bitcrazed Will you use Direct2D to rewrite the Console? Please add D2D1_DRAW_TEXT_OPTIONS_ENABLE_COLOR_FONT to enable color fonts, thanks!

BobFrankston commented 7 years ago

A side effect of revisiting this thread is that I realized I can use escape sequences in NodeJS console.log. I presume the new capabilities will be available via escape sequences so they can be used without needing to update libraries to take advantage of the new features.

dernyn commented 7 years ago

It's not just bash, it's a Windows problem it seems... I just tried the same fonts with Notepad and WordPad. It's the edit control, inherent to GDI+, which has a problem with monospaced font rendering all over Windows. Third-party components that don't depend on the edit control mechanism work fine: the Firefox, Chrome and Mozilla rendering engines work perfectly and so do the Scintilla-based editors. mintty comes from Putty and it works fine there too.

hwaldstein commented 6 years ago

It appears we've recently passed nine months since the last collaborator update, and this issue is still unresolved. Or, at least, I'm experiencing the same issues described above. Is there any news of progress on fixing this, or a more clear definition of what "(increasingly near) future" means? Any update would be greatly appreciated.

jacoby commented 6 years ago

I'm in agreement with @hwaldstein, but I have seen that unicode characters work using Hyper as the terminal for WSL instead of the default.

I'm not as happy with its ANSI colors, but that's on Hyper, not WSL. Is there a better repo for this issue than WSL?

BobFrankston commented 6 years ago

There are rumors of a new command processor and/or shell. If so, does it moot this and instead shift the focus to feature requests and betas?

bitcrazed commented 6 years ago

@fcharlie - you can count on that :)

bitcrazed commented 6 years ago

@BobFrGit Yes, our guidance (we'll be publishing some in the coming weeks) is to call SetConsoleMode, enabling ENABLE_VIRTUAL_TERMINAL_INPUT, and to use VT/ANSI escape sequences moving forward.
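As a rough illustration of that guidance, the sketch below enables VT processing on the output handle and then emits an SGR color sequence. It is an assumption-laden example, not official guidance: the comment above names ENABLE_VIRTUAL_TERMINAL_INPUT (the input-side flag), while this sketch uses the output-side flag ENABLE_VIRTUAL_TERMINAL_PROCESSING (0x0004) because it only writes text. The helper names enable_vt_output and sgr are hypothetical.

```python
import ctypes
import os

STD_OUTPUT_HANDLE = -11
ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004  # output-side flag of the Win32 Console API


def enable_vt_output() -> bool:
    """Try to turn on VT sequence processing for stdout; no-op outside Windows."""
    if os.name != "nt":
        return True  # Linux/macOS terminals interpret VT sequences already
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
    kernel32.GetConsoleMode.restype = wintypes.BOOL
    kernel32.SetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.SetConsoleMode.restype = wintypes.BOOL

    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    mode = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False  # stdout is probably redirected to a file or pipe
    new_mode = mode.value | ENABLE_VIRTUAL_TERMINAL_PROCESSING
    return bool(kernel32.SetConsoleMode(handle, new_mode))


def sgr(text: str, *codes: int) -> str:
    """Wrap text in an SGR (Select Graphic Rendition) sequence, then reset."""
    params = ";".join(str(code) for code in codes)
    return f"\x1b[{params}m{text}\x1b[0m"


if enable_vt_output():
    print(sgr("tests passed", 1, 32))  # bold green where VT sequences are honored
```

Because the styling lives in the byte stream rather than in API calls, the same sgr output works unchanged over SSH or inside WSL.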

bitcrazed commented 6 years ago

@dernyn - as I pointed out above, GDI based display tech struggles with several mechanisms (esp. font-fallback) that are essential for displaying complex modern glyphs, including ninjacat emoji 🐱‍👤.

In the future, we plan on replacing the Console's current GDI based renderer with a renderer that uses DirectWrite (directly or indirectly) which will eliminate almost all our rendering, and many of our internationalization issues in one fell swoop!
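To see why a glyph like the ninjacat is harder than a single symbol, it helps to look at what the text actually contains. This sketch assumes the sequence in the comment above is the usual cat face + zero width joiner + bust in silhouette; the helper name describe is hypothetical.

```python
import unicodedata

# Assumed ninjacat sequence: CAT FACE, ZERO WIDTH JOINER, BUST IN SILHOUETTE.
NINJA_CAT = "\U0001F431\u200D\U0001F464"


def describe(text: str) -> list:
    """List each code point in `text` with its Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


for code_point, name in describe(NINJA_CAT):
    print(code_point, name)

print(len(NINJA_CAT))                           # 3 code points
print(len(NINJA_CAT.encode("utf-8")))           # 11 UTF-8 bytes
print(len(NINJA_CAT.encode("utf-16-le")) // 2)  # 5 UTF-16 code units
```

One visible glyph is three code points, five UTF-16 code units, or eleven UTF-8 bytes, so a renderer that maps one cell to one code unit cannot draw it as a single image.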

bitcrazed commented 6 years ago

Hey @hwaldstein - thanks for your continued patience. While it may appear that we've been rather quiet over the last year or so, we've actually been cranking away, modernizing and overhauling much of the Console's internals, paving the way for us to start delivering user-visible improvements in future releases.

The 18H2 (2018, 2nd half) release that we're currently working on will deliver some pretty cool improvements, esp. for anyone building 3rd party terminals, and command-line shells, tools, and apps.

We have a long list of Console features queued up for subsequent OS releases too.

bitcrazed commented 6 years ago

@jacoby - thanks for your patience; I also refer you to my reply to @hwaldstein above.

Re. repo choices: We'll be moving many of these Console related issues over to the new Console issues repo in the coming months - feel free to post new issues over there from now onwards, though.

bitcrazed commented 6 years ago

@BobFrGit - I am not aware of any new shell being created at Microsoft. We already have Cmd and PowerShell, and of course bash/zsh/fish/etc. in your favorite Linux distro(s) running atop WSL.

fcharlie commented 6 years ago

@bitcrazed I'm glad to see your decision, and I'm looking forward to the new console.

jacoby commented 6 years ago

@bitcrazed And of course the Cygwin-based Bash that is used in Git4Win, etc.

Can hardly wait for summer and the new Console. I like all about Hyper except the lag.

bitcrazed commented 6 years ago

@jacoby - yes, but Cygwin isn't a Microsoft shell.

And to be clear, we're not shipping a "new" Console this summer - it's the same Console, with significantly improved internals, and several bug fixes and improvements.

jacoby commented 6 years ago

Gotcha.

bardware commented 6 years ago

I have a setting in my .vimrc file in msys2 that displays every TAB as ➪. I don't see that character in WSL/Ubuntu (from store).

[two images attached]

bitcrazed commented 6 years ago

@bardware VERY likely that code-point isn't included in your console's currently selected font. As mentioned above/elsewhere, Console renders using GDI which cannot perform font-fallback, so if your font doesn't contain the glyph for ➪ then we can only display the unprintable char glyph.
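The difference between rendering with and without font-fallback can be sketched as a simple lookup. Everything here is illustrative: the font names and coverage tables are made up, and this is not how GDI or DirectWrite is implemented.

```python
from typing import Mapping, Optional, Sequence, Set

ARROW = "➪"  # the TAB marker from the comment above

# Hypothetical coverage tables: the code points each font has glyphs for.
COVERAGE: Mapping[str, Set[int]] = {
    "PrimaryMono": set(range(0x20, 0x7F)),  # printable ASCII only
    "SymbolFallback": {ord(ARROW)},
}


def pick_font(ch: str, fonts: Sequence[str],
              coverage: Mapping[str, Set[int]] = COVERAGE) -> Optional[str]:
    """Return the first font in `fonts` that covers `ch`, or None if none does."""
    for name in fonts:
        if ord(ch) in coverage[name]:
            return name
    return None  # the renderer would draw the "unprintable char" box here


# Without fallback: only the selected font is consulted.
print(pick_font(ARROW, ["PrimaryMono"]))                    # None
# With fallback: later fonts in the chain get a chance.
print(pick_font(ARROW, ["PrimaryMono", "SymbolFallback"]))  # SymbolFallback
```

In this model the current Console behaves like a chain of length one, which is why a missing glyph shows up as a box instead of being borrowed from another font.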

bardware commented 6 years ago

> in your console's currently selected font

I played around a bit and tried some fonts already, but I'll keep looking. Thanks for your reply.

therealkenc commented 6 years ago

> I played around a bit and tried some fonts already, but I'll keep looking.

Quoth from the top:

> rendering issues could be attributed to the font (Consolas), but some cannot.

The ➪ glyph is in the "kinda not" category. So, don't burn too much time downloading every fixed width font you can find on the Interwebs. It isn't going to help (or call me 😮 if it does). As Rich alluded a bunch of posts back, getting from a given Unicode sequence to a particular glyph is "a process". I'm sure all will be golden with the new engine. But in this instance, not likely with a different font; which one could reasonably misinterpret "currently selected" in the previous post as implying. Bonne chance.

stereokai commented 6 years ago

Can you please share with us - because you were rather vague 3 weeks ago - will "the same Console, with significantly improved internals, and several bug fixes and improvements" support UTF-8? Or will you only start working on it after "18H2 (2018, 2nd half)", meaning we should gather more patience? Thank you very much, tons of kudos for your work!

bitcrazed commented 6 years ago

All I can share right now is that we're working hard to make all the changes necessary to support UTF-8 which then enables us to work on adding rendering support for emoji, complex scripts, etc.

Not going to put dates on things until we're confident that a) things are working, b) we understand which releases our stuff lines up for.

It's a complex process, but bear with us - we're on it.

BobFrankston commented 6 years ago

My sympathy -- Unicode can get amazingly complex.

stereokai commented 6 years ago

Thanks a lot @bitcrazed

bitcrazed commented 6 years ago

@BobFrGit .. and people wonder why I've got so much more gray hair these days ;)

[image: getting old]

@stereokai Thanks 😀

bitcrazed commented 6 years ago

Hey all. Thanks for the discussion re. this issue. We're right in the middle of a ton of Console internals re-engineering that'll allow the Console to accurately support Unicode & UTF-8 text.

Closing this issue since:

  1. This work is underway
  2. This is the WSL issues repo, but this is an issue in Console which has its own Console GitHub Repo
  3. GitHub doesn't yet allow issues to be moved between repos, preserving posters' identity :(

If you have further asks/issues, please file new issues on our Console GitHub Repo.