Table column width miscalculated when value contains "accent" Unicode character

taurit commented 1 month ago

Information

OS: Windows
Version: 0.49.1
Terminal: Windows Terminal

Describe the bug

When I display strings containing Unicode accent characters in a table:

the width of the table seems miscalculated
there seems to be whitespace rendered after some characters

I tried troubleshooting (e.g., I went through Best Practices -> Configuring the Windows Terminal For Unicode and Emoji Support), but it made no difference. Is this a limitation of the terminal, a bug, or am I misusing the library in a way that a configuration change could fix?

To Reproduce

using Spectre.Console;

class Program
{
    static void Main(string[] args)
    {
        var table = new Table();
        table.Border = TableBorder.Rounded;

        table.AddColumn("Field");
        table.AddColumn("Value");

        table.AddRow("Row 1", "1");
        table.AddRow("Row 2", "2 ąśłćż");
        table.AddRow("Row 3", "3 єшерти");
        table.AddRow("Row 4", "4 áb́ćd́"); // \u0301 (COMBINING ACUTE ACCENT) is the issue here

        AnsiConsole.Write(table);
    }
}

Expected behavior

A table is rendered with the same column width in all rows.

Screenshots

[image]

Additional context

Windows Terminal uses the "Cascadia Code" font, as recommended in Best Practices.
The "Use Unicode UTF-8 for worldwide language support" checkbox is selected in my OS.

Thanks for your work and a great library! :)


patriksvensson commented 1 month ago

This seems to be a bug in the wcwidth library. I will look into it.

elgonzo commented 1 month ago

there seems to be a whitespace rendered after the characters

The whitespace "holes" in the 4 á b́ ć d́ output look very much like ye olde Windows Console behavior.

Your "4 áb́ćd́" string uses combining marks. A combining mark occupies its own character cell in the console output buffer, separate from the character it combines with, hence the empty space you see. You can't solve this by moving the cursor one position to the left after writing a combining mark to reclaim that empty cell: the next character would then overwrite the combining mark (the diacritic) in the console output buffer, and you would no longer see it. (Been there, done that...)
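To illustrate the mismatch: wcwidth-style width rules treat a combining mark as zero columns, while a console that gives every mark its own cell renders it one column wide. Below is a simplified C# sketch. EstimateColumns is a hypothetical helper written for this illustration, not a Spectre.Console or .NET API. It counts each code point as one column, skips combining marks, and deliberately ignores wide characters such as CJK and emoji.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, NOT a Spectre.Console or .NET API.
// Estimates terminal columns using a wcwidth-like rule for combining marks:
//   - non-spacing (Mn) and enclosing (Me) marks count as 0 columns
//   - every other code point counts as 1 column
// Simplification: East Asian wide characters and emoji (usually 2 columns)
// are not handled.
static int EstimateColumns(string s)
{
    int columns = 0;
    // EnumerateRunes walks code points, so a surrogate pair counts once.
    foreach (Rune rune in s.EnumerateRunes())
    {
        UnicodeCategory category = Rune.GetUnicodeCategory(rune);
        if (category == UnicodeCategory.NonSpacingMark ||
            category == UnicodeCategory.EnclosingMark)
        {
            continue; // combining mark: meant to be drawn over the previous character
        }
        columns++;
    }
    return columns;
}

// a, b, c, d, each followed by U+0301 COMBINING ACUTE ACCENT
string sample = "a\u0301b\u0301c\u0301d\u0301";
Console.WriteLine($"UTF-16 code units: {sample.Length}");           // 8
Console.WriteLine($"Estimated columns: {EstimateColumns(sample)}"); // 4
```

Under the zero-width rule this string needs 4 columns. A console that assigns a cell to each of the 4 marks would use 8 cells, which is the kind of mismatch that leaves holes in the text and pushes table borders out of alignment.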

In many scenarios, string.Normalize() can be used to convert a combining mark and its preceding character into a single precomposed character. However, this is not 100% bullet-proof and may still leave you with combining marks (and thus with "holes" in the console output), because Unicode does not define a precomposed character for every possible combination of a base character and a combining mark (b́ is one example).
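A minimal C# sketch of that limitation: under NormalizationForm.FormC, "a" + U+0301 composes to the precomposed U+00E1 (á), whereas "b" + U+0301 has no precomposed form and keeps its combining mark. CountCombiningMarks is a hypothetical helper written only for this illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper: counts UTF-16 code units that are non-spacing marks (Mn).
static int CountCombiningMarks(string s) =>
    s.Count(c => CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark);

// "a" + U+0301 has a precomposed equivalent, U+00E1 (á)...
string aAcute = "a\u0301".Normalize(NormalizationForm.FormC);
// ...but "b" + U+0301 does not, so FormC leaves the combining mark in place.
string bAcute = "b\u0301".Normalize(NormalizationForm.FormC);

Console.WriteLine($"a + U+0301 -> length {aAcute.Length}, marks left: {CountCombiningMarks(aAcute)}"); // a + U+0301 -> length 1, marks left: 0
Console.WriteLine($"b + U+0301 -> length {bAcute.Length}, marks left: {CountCombiningMarks(bAcute)}"); // b + U+0301 -> length 2, marks left: 1
```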

taurit commented 1 month ago

@elgonzo Thank you for the explanation!

It does indeed look like an issue with specific terminals, such as cmd.exe, rather than with the library. I think there is nothing more to do on the Spectre.Console side.

As an additional test, I pasted a simple echo "áb́ćd́éf́" into Windows Terminal and saw the same erroneous whitespace.

Workaround

I'll paste the workaround I ended up with in case someone with a similar problem finds this thread.

1) First, I use string.Normalize() to replace combining sequences with precomposed characters, which have wider terminal support, where possible.
2) Then, I remove the combining accent marks that remain in the string. I lose some accent marks in the console output, but that is an acceptable tradeoff for me to keep the output readable.

Screenshot

[image: workaround run in Windows Terminal]

Code

using System;
using System.Globalization;
using System.Linq;
using Spectre.Console;

// Variant 1: strings as-is
// (accent marks were the only problematic characters for Windows Terminal that I found)
Console.WriteLine("Strings without normalization:");
var table = new Table();
table.AddColumns("Field", "Value");
table.AddRow("Row 1", "ABCDEF");
table.AddRow("Row 2", "ąęúłśż");
table.AddRow("Row 3", "áéúíüñ");
table.AddRow("Row 4", "абвцде");
table.AddRow("Row 5", "а\u0301б\u0301в\u0301ц\u0301д\u0301е\u0301");
table.AddRow("Row 6", "a\u0301b\u0301c\u0301d\u0301e\u0301f\u0301");
table.AddRow("Row 7", "👍👎👌👏👋👊");
table.AddRow("Row 8", "你好嗎？我很");
table.AddRow("Row 9", "🇵🇱🇧🇷🇨🇦🇺🇸🇬🇧🇦🇺");
table.AddRow("Row 10", "أبجد ه");
AnsiConsole.Write(table);

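// Variant 2: compose characters where a precomposed form exists
// (string.Normalize() without an argument uses NormalizationForm.FormC)
Console.WriteLine("Normalized with `string.Normalize(NormalizationForm.FormC)`:");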
var table2 = new Table();
table2.AddColumns("Field", "Value");
table2.AddRow("Row 1", "ABCDEF".Normalize());
table2.AddRow("Row 5", "а\u0301б\u0301в\u0301ц\u0301д\u0301е\u0301".Normalize());
table2.AddRow("Row 6", "a\u0301b\u0301c\u0301d\u0301e\u0301f\u0301".Normalize());
AnsiConsole.Write(table2);

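// Variant 3: normalize, then strip combining marks that have no precomposed form
Console.WriteLine("Normalized with `string.Normalize(NormalizationForm.FormC)`, remaining accent characters removed:");
// Drops every UTF-16 code unit whose category is NonSpacingMark (Mn), e.g. U+0301.
string RemoveAccentMarks(string input) =>
    string.Concat(input.Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));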
var table3 = new Table();
table3.AddColumns("Field", "Value");
table3.AddRow("Row 1", "ABCDEF".Normalize());
table3.AddRow("Row 5", RemoveAccentMarks("а\u0301б\u0301в\u0301ц\u0301д\u0301е\u0301".Normalize()));
table3.AddRow("Row 6", RemoveAccentMarks("a\u0301b\u0301c\u0301d\u0301e\u0301f\u0301".Normalize()));
AnsiConsole.Write(table3);

Thanks again for the support!
