`Cell` needs much deeper thought to truly support Unicode

tig commented 1 year ago

I hate the design I came up with for this for v2:

/// <summary>
/// Represents a single row/column in a Terminal.Gui rendering surface
/// (e.g. <see cref="LineCanvas"/> and <see cref="ConsoleDriver"/>).
/// </summary>
public class Cell {
    /// <summary>
    /// The list of Runes to draw in this cell. If the list is empty, the cell is blank. If the list contains
    /// more than one Rune, the cell is a combining sequence.
    /// (See #2616 - Support combining sequences that don't normalize)
    /// </summary>
    public List<Rune> Runes { get; set; } = new List<Rune> ();

    /// <summary>
    /// The attributes to use when drawing the Glyph.
    /// </summary>
    public Attribute? Attribute { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this <see cref="T:Terminal.Gui.Cell"/> has
    /// been modified since the last time it was drawn.
    /// </summary>
    public bool IsDirty { get; set; }
}

It forces simple code to do cell.Runes[0] which makes code hard to read.

I propose changing this to:

/// <summary>
/// Represents a single row/column in a Terminal.Gui rendering surface
/// (e.g. <see cref="LineCanvas"/> and <see cref="ConsoleDriver"/>).
/// </summary>
public class Cell {
    /// <summary>
    /// The character to display. If <see cref="Rune"/> is <see langword="null"/>, then <see cref="Rune"/> is ignored.
    /// </summary>
    public Rune Rune { get; set; }

    /// <summary>
    /// The combining mark for <see cref="Rune"/> that when combined makes this Cell a combining sequence that could
    /// not be normalized to a single Rune.
    /// If <see cref="CombiningMark"/> is <see langword="null"/>, then <see cref="CombiningMark"/> is ignored.
    /// </summary>
    /// <remarks>
    /// Only valid in the rare case where <see cref="Rune"/> is a combining sequence that could not be normalized to a single Rune.
    /// </remarks>
    internal Rune CombiningMark { get; set; }

    /// <summary>
    /// The attributes to use when drawing the Glyph.
    /// </summary>
    public Attribute? Attribute { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this <see cref="T:Terminal.Gui.Cell"/> has
    /// been modified since the last time it was drawn.
    /// </summary>
    public bool IsDirty { get; set; }
}

When so, all code that currently does cell.Runes[0] will become cell.Rune.

Only ConsoleDriver will need to use CombiningMark (see #2932)

See https://github.com/gui-cs/Terminal.Gui/discussions/2939

BDisp commented 1 year ago

It forces simple code to do cell.Runes[0] which makes code hard to read.

I agree.

There will never be more than 2 Runes in a Cell.

I've doubt about because I think that can have more than 2 runes. So I suggest to make this as follow in case of doubt. public List<Rune> CombiningMark { get; set; } = new List<Rune> ();

tig commented 1 year ago

It forces simple code to do cell.Runes[0] which makes code hard to read.

I agree.

There will never be more than 2 Runes in a Cell.

I've doubt about because I think that can have more than 2 runes. So I suggest to make this as follow in case of doubt. public List<Rune> CombiningMark { get; set; } = new List<Rune> ();

I've researched this. There are no known examples of unicode codepoints where there will be more than one combining mark.

Regardless, in the current PR CombingMark is commented out, so feel free to change this to the following in #2932

public List<Rune> CombiningMarks { get; set; } = new List<Rune> ();

BDisp commented 1 year ago

@tig thinking better this change will have complicate more than before because when you replace the previous column with the normalized rune list, now you have to deal with the new property CombiningMarks. In my opinion the previous code was better.

BDisp commented 1 year ago

Without say you broke all my idea about this. You could wait to merge first and do the change later.

tig commented 1 year ago

@tig thinking better this change will have complicate more than before because when you replace the previous column with the normalized rune list, now you have to deal with the new property CombiningMarks. In my opinion the previous code was better.

Yes, this will complicate ConsoleDriver and potentailly ConsoleDriver implementations, but it absolutely simplifies normal app developer's lives.

If you have an alternative solution, I'm all ears, but only if it does not make library user's code more complex.

a-usr commented 1 year ago

Why don't we (or you?) somewhat combine both of your proposals/ideas?

the result would look something like this:

/// <summary>
/// Represents a single row/column in a Terminal.Gui rendering surface
/// (e.g. <see cref="LineCanvas"/> and <see cref="ConsoleDriver"/>).
/// </summary>
public class Cell
  {
    /// <summary>
    /// The character to display. If <see cref="Rune"/> is <see langword="null"/>, then <see cref="Rune"/> is ignored.
    /// </summary>
    /// <remarks>
    /// Stores and reads from <see cref="Runes"/>[0]
    /// </remarks>
    public Rune Rune { get; set; } // { get => Runes[0]; set => {Runes[0] = value}; }

    /// <summary>
    /// The list where the Runes are stored. 
    /// </summary>
    /// <remarks>
    /// Could be made internal, but keeping this public prevents older apps from breaking as long as they operate within the lists size limit.
    /// </remarks>
    public List<Rune> Runes { get; set; } // = new List<Rune>(2){null, null}  // Limits List size to two Items. Can be mmade larger if required, but may help detect faulty code that else would result in improperly drawn cells

    /// <summary>
    /// The combining mark for <see cref="Rune"/> that when combined makes this Cell a combining sequence that could
    /// not be normalized to a single Rune.
    /// If <see cref="CombiningMark"/> is <see langword="null"/>, then <see cref="CombiningMark"/> is ignored. 
    /// </summary>
    /// <remarks>
    /// Only valid in the rare case where <see cref="Rune"/> is a combining sequence that could not be normalized to a single Rune. Stores and reads from <see cref="Runes"/>[1]
    /// </remarks>
    public Rune CombiningMark { get; set; } // { get => Runes[1]; set => {Runes[1] = value}; }

    /// <summary>
    /// The attributes to use when drawing the Glyph.
    /// </summary>
    public Attribute? Attribute { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether this <see cref="T:Terminal.Gui.Cell"/> has
    /// been modified since the last time it was drawn.
    /// </summary>
    public bool IsDirty { get; set; }
  }

}

Effectively, Rune and CombiningMark point to the objects positioned at the index 0 and 1 respectively inside of the list Runes. This both enables the use of the newer Properties Rune and CombiningMark, but also iterating through Runes for the ConsoleDrivers (wich i assume was @BDisp's idea wich he mentioned) and prevents old apps from breaking. Combine Runes with the Obsolete Attribute and both the compiler and intellisense issue a warning to not use this!

BDisp commented 1 year ago

@tig already did the PR #2934, which is well done. The obsolete Attribute is what only work for now with old terminal but I think they will exist for a long time, yet. The MS guys gave up to add more features for older terminals because increase the bugs and conflicts. The only leverage we can do with them is allowing render non-bmp code points when the font support them.

BDisp commented 1 year ago

👨+ \u200D +👩+ \u200D +👧 will result in 👨‍👩‍👧 (\U0001F468\u200D\U0001F469\u200D\U0001F467) \u200D isn't a combining mark but is a format Unicode category.

conhost:

WT force 16 colors:

WT true color:

BDisp commented 1 year ago

@tig I did these changes on my PR https://github.com/tig/Terminal.Gui/pull/16.

a-usr commented 1 year ago

Isn't Rune per definition Unicode, just like char is ascii?

BDisp commented 1 year ago

Isn't Rune per definition Unicode, just like char is ascii?

Right.

a-usr commented 1 year ago

Isn't Rune per definition Unicode, just like char is ascii?

Right.

But rune doesn't support combining unicode characters?

BDisp commented 1 year ago

But rune doesn't support combining unicode characters?

Yes, support. The reason for puts them on another property is because they should not occupy another column but the same as the Rune.

Edit: If the rune has a width of two column then the next column should be null '\0' to not be outputted.

a-usr commented 12 months ago

But rune doesn't support combining unicode characters?

Yes, support. The reason for puts them on another property is because they should not occupy another column but the same as the Rune.

Edit: If the rune has a width of two column then the next column should be null '\0' to not be outputted.

That makes sense. How exactly are you printing unicode? Are you parsing the unicode codepoint into an ansi escape sequence? Or how else do you do it?

BDisp commented 12 months ago

How exactly are you printing unicode? Are you parsing the unicode codepoint into an ansi escape sequence? Or how else do you do it?

On the v1 that is done on the override AddRune of each driver. Before the Contents only existed on the NetDriver. Then for unit tests was also implemented on the FakeDriver. With the Border implementation and for use the 3D effect, the Contents field was included on the ConsoleDriver abstract class and overridden on all drivers with duplicated code. On v2 @tig did a wonderfully job by remove the abstract keyword from the AddRune and the Contents for doing the common code in one place. The overridden UpdateScreen of each driver is responsible to collect all the information from the Contents field and translate to the respective buffer that each driver understand. By using the StringBuilder class some driver can recognize non-bmp as well, but the older WriteConsoleOutput only recognize char type and the cursor position cannot be handled. The only way to it support non-bmp is considering the non-bmp as a wide character with two columns. It will print them but if a non-bmp has only a width of one column it will add a space, because each buffer column contain a char (bmp). The CursesDriver is the only one than cant print non-bmp and I still don't understand why.

tig commented 12 months ago

The CursesDriver is the only one than cant print non-bmp and I still don't understand why.

This is the relevant code:

I started down the path of experimenting with the other Curses APIs for wide-chars to try to figure out what the heck Curses is doing, but decided to focus on Windows first. I'm still experimenting with how to do all this right on Windows.

I think a key to getting Curses working right is to use APIs other than mvaddch/mvaddstr like

https://linux.die.net/man/3/curs_add_wch

I really do not like the current architecture of the Curses driver. It was clearly hacked together by Miguel back in the day and has some really suspect code. I hope to significantly simplify and modernize it and I'd rather not spend any more energy trying to hack the current code to work. For example, this is just nuts:

Note TrueColor doesn't work yet either.

tig commented 9 months ago

This is super-relevant and super-cool:

A new Unicode Terminal Complex Script Support, or TCSS proposal

tig commented 9 months ago

And this:

https://github.com/contour-terminal/terminal-unicode-core

gui-cs / Terminal.Gui

`Cell` needs much deeper thought to truly support Unicode #2933