Open PerBothner opened 1 year ago
Looks good overall. Is it possible to split this into smaller chunks of work that can be reviewed and merged separately? The smaller a PR is the faster it is to understand/test/merge.
A _text string that contains all the characters in the string.
If we keep the text as raw text I have concerns that the JS engine may not be able to keep some optimizations compared to an Uint8Array
, also I'm not sure how long the engines normally hold on to unused immutatable strings. Maybe all is fine, but we'll want to measure this stuff.
This representation may also be more difficult to share memory between workers when we pursue https://github.com/xtermjs/xterm.js/issues/1515
The _data array is a combination of cell runs (represented as lengths in the _text string), and bg/fg/attribute values. Each element is a 4-bit "kind" (for example SKIP_COLUMNS for "null" columns), and 28 bits of data (such as a color value and the length of a text run).
Something I'm not clear on is how columns get mapped to indexes in _text
. For example currently for wide characters the cell's code is marked as null.
The prototype has an AbstractBufferLine class that implements IBufferLine and is the parent of BufferLine (and BufferOverflowLine).
It would be much easier to merge/iterate on this if we can have both implementations side-by-side where you can enable the new one via options.bufferVersion
or something.
Rendering does not require random access: Currently each renderer reads each line in sequential order, so using a CellData as an iterator is straightforward and efficient.
FYI we had some performance problems with iterators in the decoration code and ended up reverting to callbacks instead of iterators for hot code paths.
"Something I'm not clear on is how columns get mapped to indexes in
_text
Indexes in _text
are part of the CellData
state. You can set a CellData
from a column number using the scanNext
function or the scanMove
convenience function, though as scanNext
is relatively expensive you normally try to incrementaly update the CellData
state while moving sequentially forwards in a line. Moving forward one element in the _data
array increments (in the CellData
) the current position in the _text
string, depending on the _data
element kind and the remaining 28 bits of the value.
It would be much easier to merge/iterate on this if we can have both implementations side-by-side where you can enable the new one via
options.bufferVersion
or something.
If you don't care about performance during a transition phase, it is probably feasible to implement the new data structure while still using the old API. It should also be possible to use two different implementation classes - old and new. However, things that are currently O(1) will be much slower if using the new implementation and the old API, due to the need for many functions to allocate a local CellData
instance and call scanNext
to find the correct position in _data
and _text
.
One could mitigate this somewhat with conditional logic, at least in the most performance-critical places:
if (usingNewBuffer) { use new API; } else { use old API; }
Supporting both data structures and both APIs will of course have some cost (performance and complexity), if nothing else testing usingNewBuffer
. At some flag day you'd rip out the old data structure and API code.
@PerBothner we do care about performance during transition, if it's just a few if statements that shouldn't impact much. Ease of merging is a big part but probably the biggest is reducing risk when shipping this to vscode's users. If there are separate implementations we can toggle between, we can make the new one opt-in and when we think it's ready turn it opt-out in Insiders only (the beta build) to get a lot of low risk testing.
I have really big concerns regarding the _text
being a JS string and the perf implications. Its one thing that I tested very throroughly when doing the buffer rewrite in 2018 - strings are much much slower in almost every aspect due to their immutable nature imposing mem copies underneath. While this gets somewhat mitigated by Ropes, it still has to do the full string realisation at the first index access. Imho this def. needs perf testing and/or a typed array alternative.
(I really dont get why JS has no easy string <--> typed array interface, but due to the lack of that, we have to go with one in the end internally, and I think strings are not the better way here for the reason above)
Seems like the fundamental ideas can remain, just with _test transformed into a typed array managed by us where each character can take up multiple bytes.
I edited the proposal to add a new "Different visible and logical row width" section. This isn't directly part of the proposal, but it's an example of a useful improvement enabled by the proposal. I also fixed a few minor typos I noticed.
Seems like the fundamental ideas can remain, just with _test transformed into a typed array managed by us where each character can take up multiple bytes.
One possibility is to include the code units of the text directly in the _data
array. This would be like the current implementation, but extended to composite characters. Plus we'd skip the null content words.
I haven't really thought about how this would change code complexity or performance.
If we remove the _text
string field and include codepoints in the _data
array then the DataKind
bits could be:
const enum DataKind { // 4 bits
FG = 1, // lower 26 bits is RGB foreground color and CM_MASK
BG = 2, // lower 26 bits is RGB background color and CM_MASK
STYLE_FLAGS = 3, // lower 28 bits is StyleFlags
SKIP_COLUMNS = 7, // empty ("null") columns (28 bit count)
// The following have a 21-bit codepoint value in the low-order bits
CHAR_w1 = 8, // single-non-compound, 1 column wide
CHAR_w2 = 9, // single-non-compound, 2 columns wide
CLUSTER_START_w1 = 10, // start of non-trivial cluster, 1 column wide
CLUSTER_START_w2 = 11, // start of non-trivial cluster, 2 columns wide
CLUSTER_CONTINUATION = 12, // continuation of cluster
}
@PerBothner wouldn't including the text data in _data
complicate changing text and attributes in the middle much more complicated as it's much more likely to need to shift data around?
wouldn't including the text data in _data complicate changing text and attributes in the middle much more complicated as it's much more likely to need to shift data around?
We already (in my prototype) have to shift things around if text is changed in the middle of a line. The simple case where you replace some characters by some other characters (like updating an existing field in a form) can often be done in-place without shifting. The more complex cases (such as editing a Fish command line with syntax coloring) will require shifting, but it is not a performance-critical operation - and you're usually dealing with insertions or deletions anyway.
An idea that seems to make sense for at least the transition period and perhaps permanently:
Instead of extensively changing the API, we still use the "column number" as the key parameter to most of the functions, as opposed to using a CellData as a cursor/iterator. To avoid quadratic behavior (due to a linear search for every cell in a line) we keep a small (1-entry) cache in each BufferLine
that maps column number to the corresponding index in the _data
array. The cache would also need to save the corresponding fg/bg/attributes at (the start of) the corresponding position. This can be done with 2 or 3 53-bit integers.
When the most recent position is in the middle of a wide character or a SKIP_COLUMNS
section (see comment above), the cached column number is that corresponding to the start of the wide character or the SKIP_COLUMNS
section.
I've started re-vamping the BufferLine re-implementation based on these comments. I just checked in the first batch of changes.
_data
array. I got rid of the _text
string. (This change is incomplete.) Hopefully this will assuage @jerch's concerns.loadCell
has been re-written to make use of the cache. This should allow using the old API with at least reasonable performance. loadCell
. (webgl and canvas renderer still to do.)Update and status of my fork:
Both the old BufferLine
implementation (renamed to OldBufferLine
) and the new the implementation (renamed to NewBufferLine
) are available in the source. Instead of new BufferLine
use BufferLine.make
which creates one or the other based on the value of USE_NewBufferLine
. The latter should be tied to a terminal option, but isn't yet.
I ripped out the code for using "cursors". Instead I reimplemented the old API using a cache in each LineBuffer
. This may not ultimately be the best way to do things (a per-Buffer
cache might be cheaper), but it keeps things more compatible for now.
For performance, I added an insertText
method. This mainly reduces some of the per-codepoint overhead. The downside is insertText
in BufferLine
needs to look up the unicode properties, so there is less clean separation. I suspect the performance gain justifies the slightly uglier design, but I don't have any numbers. (Easy enough to test - just change the print
method in InputHandler
to always call printOld
.)
The biggest remaining piece is how to deal with line-wrapping. I would prefer to do that by storing "logical lines" (without wrapping) using a _data
array that is shared between wrapped rows, though that has some complications. It might be plausible to save this for a follow-up change.
Some simple tests (output with wide characters, clusters, fg/bg changes) and applications (such as mc
and emacs
) work tolerable well. However, there are still lot of corner cases and less common escape sequences (in addition to line-wrapping) to deal with.
@PerBothner Sorry for being absent for a couple of weeks. Gonna try to catch up by tomorrow hopefully.
It might make sense to update the the proposal to match the current implementation. On the other hand, it might be helpful to get some feedback that the current approach is reasonable before I spend too much time re-writing the proposal. Have you (@Tyriar or @jerch) been able to take a recent look at the implementation?
I've been doing a lot of thinking about how to handle line-wrapping, without a firm conclusion. I'm pretty sure the goal should be that lines that are isWrapped
should share the _data
array and the _extendedAttrs
table with the main (non-wrapped line). Basically, the wrapped lines ("rows") should indirect to the main "logical" lines. I think that simplifies a number of things, including some new features I won't go into here. However, that requires some non-trivial non-local changes: You can't just set or clear isWrapped
. As a start, I'd like to change:
buffer.lines.get(i).isWrapped = should_wrap;
to
buffer.setWrapped(i, should_wrap)
This would improve flexibility.
@PerBothner Yes I started today looking into your branch by comparing to xtermjs/master, but its kinda a lot to digest, so it will take a bit longer. Also the diff view is quite long due to ongoing changes on master not yet reflected in your branch. Would it be possible for you to merge in those non-related changes from master? Or alternatively, could you give me a commit hash of the main development, which you consider to be the main branch point regarding the buffer rewrite?
Edit:
Regarding the setWrapped
idea - thats fine by me. And as long as it does not end up in public API, it doesnt even need a major release.
I did a merge from upstream/master.
git diff master
appears to show something sane ....
I did just notice there is still some old crud in CellData: _stateA
and related fields that are no longer used. I'll clean that up.
I committed a cleanup of Celldata.ts
.
This (or the logical equivalent) should work:
git diff b01f82717fd539c2fa49884e242fa3b2e4927cc2 810cc1d47e0a43996c04bb52929b1dcaad229108
Thx, will use that one. :+1:
I've started work on the new line-wrapping approach. There are new two concrete classes:
LogicalBufferLine
is contains the actual _data
array and is also used for the initial (non-wrapped) line.WrappedBufferLine
presentes a wrapped (part of a) line (duh) and indirects to the preceding LogicalBufferLine
.Buggy and incomplete but should indicate what I'm aiming for.
If you have a check-out of my fork, you can do:
git diff master buffer-cell-cursor
Things are coming together, and it may soon be time to create an actual Pull Request (even though there is still quite a bit more to be done).
Right now I'm trying to clean up yarn lint
warnings, and I'm puzzled by how TypeScript handles protected
. Consider this simplified example:
export abstract class BufferLine {
protected abstract _cachedBg(): number;
};
export class LogicalBufferLine extends BufferLine {
protected _cache2: number = 0;
protected override _cachedBg(): number { return this._cache2; }
}
export class WrappedBufferLine extends BufferLine {
_logicalLine: LogicalBufferLine;
protected override _cachedBg(): number { return this._logicalLine._cachedBg(); }
}
However, tsc
complains:
Lines.ts:12:69 - error TS2445: Property '_cachedBg' is protected and only accessible within class 'LogicalBufferLine' and its subclasses.
This seems wrong to me (as an old Java programmer): '_cachedBg' should be accessible in BufferLine
and it's subclasses. Explicitly casting this._logicalLine
to a BufferLine
doesn't help:
Lines.ts:12:87 - error TS2446: Property '_cachedBg' is protected and only accessible through an instance of class 'WrappedBufferLine'. This is an instance of class 'BufferLine'.
I'm reluctant to whine "compiler bug" but this seems overly strict. Is this really how TypeScript is supposed to work?
I can work around the problem by making _cachedBg
public but that seems unfortunate.
See this branch for a prototype of this proposal. The prototype is not usable: a lot of things work; a lot don't.
The BufferLine data structure contains a
_data
field, which is aUint32Array
with 3 elements per column. This makes for fast O(1) mapping from column number to character/cell information, but it has anumber of disadvantages:It is not memory-effecient.
Convoluted encoding due to squeezing attribute values into available bits in the foreground/background elements.
Tied to a terminal model with an integral number of fixed-width cells. Supports double-width characters and grapheme clusters (somewhat clumsily), but no variable-width fonts, or any glyphs whose width isn't an integral number of cells. Many languages don't work well with fixed-width characters. For other languages being forced to use fixed-width character is unaestheic or unfriendly. (This includes to some extent English.) (Support for variable-width fonts is not part of the current proposal, but I do have some ideas for what can be done.)
Limited extensibility: There aren't a lot of available bits left, and there is no room for properties that require more than a few bits. Anything that doesn't fit has to be added the the extended-attribute object, which is more expensive.
Clumsy/expensive reflow when window width changes.
Proposal
Summary
The main fields of a
BufferLine
become:A
_text
string that contains all the characters in the line. This is the concatenation of all the content code (for simple characters), and the_combined
elements, and subsumes both.The
_data
array is a combination of cell runs (represented as lengths in the_text
string), and bg/fg/attribute values. Each element is a 4-bit "kind" (for exampleSKIP_COLUMNS
for "null" columns), and 28 bits of data (such as a color value or the length of a text run). The_dataLength
field contains the "current" (active) length of the_data
array, to allow for future insertions. The_data
array only needs as many elements as actually used, though pre-allocating extra space for "growth" is typically worthwhile.In the prototype, a "run" is either one or more non-composed BMP characters of the same width; a run of "null" columns; or a single "other" glyph (a cluster or a non-BMP character). If the number columns spanned by the
_data
elements before_dataLength
, an implicitSKIP_COLUMNS
represents the rest of the line.A
CellData
contains the current index into the_data
and_text
arrays. Since a single_data
element can represent a run of multiple columns (when all characters are BMP and the same width) the CellData also maintains a column-offset relative to the start of the current_data
element.The
CellData
also contains the bg/fg/attribute state at the current position.Space efficiency
The
_data
array contains elements for changes in bh/fg/attribute value, and for "runs" of text. This is much more efficient than the current_data
array.Each non-null character is just one or 2 16-bit code units in the
_text
string, which is as efficient as you can get.InputHandler efficiency
In the prototype, most editing operations make use of a
CellData
object that represents the current position in theBufferLine
to edit. Given an appropriate CellData, the actual editing operation is comparable in complexity to the existing implementation: The logic is sometimes slightly more complicated, but the amount of work is comparable: either fixed, or proportional to the number of following runs (if elements have to be inserted or deleted). Bulk operations (which are most performance-critical) tend to be at the end of a line, so usually you're modifying the last element or two in the_data
array or appending to the end of it.However, setting the fields of a
CellData
to the correct values representing an arbitrary column position is no longer O(1), but is proportional to the number of "runs" between the start of the line and the desired position. This is an obvious potential performance regression.Luckily, most output is sequential, not random-access. This is especially the case non-interactive "bulk" output.
The
InputHandler
maintains a_workCell
that when valid represents the current state of the current position. In the current implementation, the_workCell
state is initialized at the start of eachprint
call. A natural optimization is to assume if the_workCell
is "valid" it can be used without initialization. Escape sequences that move the cursor will usually need to invalidate the_workCell
, but plain text and attribute changes do not.Another possible performance concern is updating the
_text
string. In the prototype is this done separately for each character, but it can obviously be "chunked" to larger units.Rendering efficiency
Rendering does not require random access: Currently each renderer reads each line in sequential order, so using a CellData as an iterator is straightforward and efficient.
Efficiency of selection, serialization and other operations
Other operations generally work with with sequential access, or they are not performance-critical.
Possible refinements
The bg/fg/attribute could be the xor'd values of the corresponding previous values. This would make backwards traversal efficient.
While using a
_text
array to contain both simple characters and clusters is very compact, there is some costs in terms of mapping codeunits to strings, and appending the new strings to the_text
string. (On the other hand, operations where string values are needed may be faster than currently, especially where runs of characters are desired (as in the dom renderer) because extracting a substring from the_text
string is probably relatively inexpensive.) An alternative is to store the codeunits in the_data
array, as in the current implementation.Line overflow and reflow
Terminology: A (visible) "row" is the text/data on a single row in the terminal. A (logical) "line" is one or more rows which "wrap" into each other.
A tempting follow-up change is for all the rows belonging to the same line to share the
_data
array and the_text
array. This means neither_data
or_text
change when text is re-flowed. We just add or remove row objects.A possible data structure is to use a plain
BufferLine
for a line (including the initial rows), and a sub-classBufferOverflowLine
for each wrapped (non-initial) row. EachBufferOverflowLine
points back to the parentBufferLine
. TheBufferOverflowLine
also contains the state needed to efficiently initialize aCellData
at the start of that row.A reflow operations needs to add or remove
BufferOverflowLine
children of a parentBufferLine
.A bonus is cleaner separation between the content "model" (a table of lines with
_data
and_text
properties) versus the "view" in a specific viewport (a table of rows).Different visible and logical row width
Consider a REPL: While typing an input line, you change the line width, either by resizing the viewport or changing the zoom. The terminal send the new line width to the application, which sends a sequence to re-draw the input line with the new width. But by the time the terminal can update itself, there may be a further re-sizing. This can result in a garbled display.
A solution: The terminal does not send the window-resize request to the application. Instead it does a local reflow, just as it would do with old output lines. The tricky part is that the terminal must interpret escape sequences from the application using the old width (since that is what the application believes to be the current width).
The implementation isn't terribly difficult: create a
BufferOverflowLine
set based on the actual (visible) line width, and a different set based on the logical (application) line width. Basically you add a flag "use this BufferOverFlowLine during rendering but ignore it during input-handling" and a converse flag "use this during input-handling but ignore it during rendering".This behavior can be controlled by shell-integration escape sequences: A prompt-end escape sequence turns off window-size change reporting and enables tracking separate visible and logical line width. An input-end (command-start) sequence returns to normal.
The same logic can be used to support variable-width fonts in prompts and command input: The application assumes normal fixed-width characters, and works with the logical width, while the renderer displays a user-preferred variable-width font, wrapping lines based on the actual width. This behavior should be opt-in: It can be enabled by a flag in the shell-integrate escape. This can be set in a user configuration file, but does not require modifying the actual application.
AbstractBufferLine
The prototype has an
AbstractBufferLine
class that implementsIBufferLine
and is the parent ofBufferLine
(andBufferOverflowLine
). The intention is that we might add new sub-classes for "sections" that aren't normal lines. For example, images or raw (but safety-scrubbed) HTML<div>
elements. We may also support lines that have different heights.Esoteric line types are not part of part of this proposal, though they motivate some design decisions.
Rich output: HTML, SVG, and images
As an example the
gnuplot
graphing program has an interactive mode where the output of each command can display a graph in various formats, including SVG. If the gnuplot terminal type is "domterm", it generates SVG, wraps it an some simple escape sequences, and DomTerm displays below the input command.Another example is a math program emitting MathML which is displayed in the same REPL console as terminal output.
Note this is similar to ipython/Jupyter, but directly in the terminal. This you can mix rich output with full terminal support.
CellData is concrete
Conceptually, the CellData implementation depends on the AbstractBufferLine implementation. However, we would like to pre-allocate CellData helper objects, and re-use the CellData instance for many lines, regardless of whether the line is a regular BufferLine or something else. Hence CellData has various fields with unspecified meaning:
_stateM
,_stateN
,_stateA
,_stateB
. These are available for any AbstractBufferLine implementation to use as it sees fit. For example_stateM
and_stateN
are the current indexes in the_data
and_text
arrays,