Closed Tyriar closed 5 years ago
Another approach that could be mixed in: use indexeddb, websql or filesystem api to page out inactive scrollback entries to disk π€
Great proposal. I agree with that 3. is the best way to go for now as it lets us save memory while supporting true color as well.
If we reach there and things continue to go well, we can then optimize as proposed in 5. or in any other way that comes in our minds at that time and makes sense.
3. is great π.
@mofux, while there is definitely a use case for using disk-storage-backed techniques to reduce the memory footprint, this can degrade the user experience of the library in browser environments that ask the user for permission for using disk storage.
Regarding Supporting the future:
The more I think about it, the more the idea of having a WebWorker
that does all the heavy work of parsing the tty data, maintaining the line buffers, matching links, matching search tokens and such appeals to me. Basically doing the heavy work in a separate background thread without blocking the ui. But I think this should be part of a separate discussion maybe towards a 4.0 release π
+100 about WebWorker in the future, but I think we need change list versions of browsers which we are supporting because not all off them can use it...
When I say Int32Array
, this will be a regular array if it is not supported by the environment.
@mofux good thinking with WebWorker
in the future π
@AndrienkoAleksandr yeah if we wanted to use WebWorker
we would need to still support the alternative as well via feature detection.
Wow nice list :)
I also tend to lean towards 3. since it promises a big cut in memory consumption for over 90% of the typical terminal usage. Imho memory optimisation should be the main goal at this stage. Further optimization for specific use cases might be applicable on top of this (what comes to my mind: "canvas like apps" like ncurses and such will use tons of single cell updates and kinda degrade the [start, end]
list over time).
@AndrienkoAleksandr yeah I like the webworker idea too since it could lift some burden from the mainthread. Problem here is (beside the fact that it might not be supported by all wanted target systems) is the some - the JS part is not such a big deal anymore with all the optimizations, xterm.js has seen over the time. Real issue performance-wise is the layouting/rendering of the browser...
@mofux The paging thing to some "foreign memory" is a good idea, though it should be part of some higher abstraction and not of the "gimme an interactive terminal widget thing" that xterm.js is. This could be achieved by an addon imho.
Offtopic: Did some tests with arrays vs. typedarrays vs. asm.js. All I can say - OMG, it is like 1 : 1,5 : 10
for simple variable loads and sets (on FF even more). If pure JS speed really starts to hurt, "use asm" might be there for the rescue. But I would see this as a last resort since it would imply fundamental changes. And webassembly is not yet ready to ship.
Offtopic: Did some tests with arrays vs. typedarrays vs. asm.js. All I can say - OMG, it is like 1 : 1,5 : 10 for simple variable loads and sets (on FF even more)
@jerch to clarify, is that arrays vs typedarrays is 1:1 to 1:5?
Woops nice catch with the comma - i meant 10:15:100
speed wise. But only on FF typed arrays were slightly faster than normal arrays. asm is at least 10 times faster than js arrays on all browsers - tested with FF, webkit (Safari), blink/V8 (Chrome, Opera).
@jerch cool, a 50% speed up from typedarrays in addition to better memory would definitely be worth investing in for now.
Idea for memory saving - maybe we could get rid of the width
for every character. Gonna try to implement a less expensive wcwidth version.
@jerch we need to access it quite a bit, and we can't lazy load it or anything because when reflow comes we will need the width of every character in the buffer. Even if it was fast we might still want to keep it around.
Might be better to make it optional, assuming 1 if it's not specified:
type CharData = [string, number?]; // not sure if this is valid syntax
[
// 'a'
['a'],
// 'ζ'
['ζ', 2],
// after wide
['', 0],
...
]
@Tyriar Yeah - well since Ive already written it, please have a look at it in PR #798 Speedup is 10 to 15 times on my computer for the cost of 16k bytes for the lookup table. Maybe a combination of both is possible if still needed.
Some more flags we'll support in the future: https://github.com/sourcelair/xterm.js/issues/580
Another thought: Only the bottom portion of the terminal (Terminal.ybase
to Terminal.ybase + Terminal.rows
) is dynamic. The scrollback which makes up the bulk of the data is completely static, perhaps we can leverage this. I didn't know this until recently, but even things like delete lines (DL, CSI Ps M) don't bring the scrollback back down but rather insert another line. Similarly, scroll up (SU, CSI Ps S) deletes the item at Terminal.scrollTop
and inserts an item at Terminal.scrollBottom
.
Managing the bottom dynamic portion of the terminal independently and pushing to scrollback when the line is pushed out could lead to some significant gains. For example, the bottom portion could be more verbose to favor modifying of attributes, faster access, etc. whereas the scrollback can be more of an archival format as proposed in the above.
Another thought: it's probably a better idea to restrict CharAttributeEntry
to rows as that's how most applications seem to work. Also if the terminal is resized then "blank" padding is added to the right which doesn't share the same styles.
eg:
To the right of the red/green diffs are unstyled "blank" cells.
@Tyriar Any chance to get this issue back on the agenda? At least for output intensive programs a different way of holding the terminal data might save a lot of memory and time. Some hybrid of 2/3/4 will give a huge throughput boost, if we could avoid splitting and saving single chars of the input string. Also saving the attributes only once they changed will help to save memory.
Example: With the new parser we could save a bunch of input characters without messing around with the attributes, since we know that they will not change in the middle of that string. The attributes of that string could be saved in some other data structure or attribute along with wcwidths (yeah we still need those to find they line break) and line breaks and stops. This would basically give up the cell model when data is incoming. Problem arises if something steps in and wants to have a drill-down representation of the terminal data (e.g. the renderer or some escape sequence/user wants to move the cursor). We still have to do the cell calculations if that happens, but it should be sufficient to do this only for the content within the terminal cols and rows. (Not sure yet about the scrolled out content, that might be even further cachable and cheap to redraw.)
@jerch I'll be meeting @mofux one day in Prague in a couple of weeks and we were going to do/start some internal improvements of how text attributes are handled which covers this π
From https://github.com/xtermjs/xterm.js/pull/1460#issuecomment-390500944
The algo is kinda expensive since every char needs to be evaluate twice
@jerch if you have any ideas on faster access of text from the buffer, let us know. Currently most of it is just a single character as you know, but it could be an ArrayBuffer
, a string, etc. I've been thinking we should think about taking more advantage of scrollback being immutable somehow.
Well I have experimented alot with ArrayBuffers in the past:
Array
regarding runtime for the typical methods (maybe still less optimized by engine vendors)new UintXXArray
is far worse than literal array creation with []
TextEncoder
though)My findings about ArrayBuffer
suggest not to use them for string data due to the conversion penalty. In theory the terminal could use ArrayBuffer
from node-pty up to the terminal data (this would save several conversions on the way to the frontend), not sure if the rendering can be done that way, I think to render stuff it always needs a final uint16_t
to string
conversion. But even that one last string creation will eat most of the runtime saved - and furthermore, would turn the terminal internally into an ugly C-ish beast. Therefore I gave up this approach.
TL;DR ArrayBuffer
is superior if you can prealloc and reuse the data structure. For everything else normal arrays are better. Strings are not worth to be squeezed into ArrayBuffers.
A new idea I came up with tries to lower string creation as much as possible, esp. tries to avoid the nasty splits and joins. It is kinda based on your 2nd idea above with the new InputHandler.print
method, wcwidth and line stops in mind:
print
now gets whole strings up to several terminal lineswcwidth(string) % cols
\n
(hard line break): advance cursor by one line, mark position in pointer list as hard break\r
: load last line content (from current cursor position to last line break) into some line buffer to get overwritten\r
case no cell abstraction nor string splitting is neededcols
x rows
representation (they just change the attr flag that gets saved along with the whole string)Btw the wcwidths are a subset of the grapheme algo, so this might be interchangeable in the future.
Now the hazardous part 1 - someone wants to move the cursor within the cols
x rows
:
cols
backwards in line breaks - the beginning of the current terminal contentNow the hazardous part 2 - the renderer wants to draw something:
Pros:
InputHandler
method - print
Cons:
InputHandler
method will be hazardous in the sense of interrupting this flow model and the need of some intermediate cell abstractionWell this is a rough draft of the idea, far from being useable atm since many details are not covered yet. Esp. the "hazardous" parts could get nasty with many performance problems (such as degrading the buffer with even worse gc behavior etc pp)
@jerch
Strings are not worth to be squeezed into ArrayBuffers.
I think Monaco stores its buffer in ArrayBuffer
s and is pretty high performance. I haven't looked too deeply into the implementation yet.
esp. tries to avoid the nasty splits and joins
Which ones?
I've been thinking we should think about taking more advantage of scrollback being immutable somehow.
One idea was to separate scrollback from the viewport section. Once a line goes to scrollback it gets pushed into the scrollback data structure. You could imagine 2 CircularList
objects, one whose lines are optimized for never changing, one for the opposite.
@Tyriar About the scrollback - yes since this is never reachable by the cursor it might save some memory to just drop the cell abstraction for scrolled out lines.
@Tyriar It makes sense to store strings in ArrayBuffer if we can limit the conversion down to one (maybe the final one for render output). That is slightly better than string handling all over the place. This would be doable since node-pty can give raw data as well (and also the websocket can give us raw data).
esp. tries to avoid the nasty splits and joins
Which ones?
The whole approach is to avoid minimize splits at all. If noone requests cursor jumps into the buffered data, strings would never be split and could go right to the renderer (if supported). No cell splits and later joins at all.
@jerch well it can if the viewport is expanded, I think we may also pull in the scrollback when a line is deleted? Not 100% sure on that or even if it's correct behavior.
@Tyriar Ah right. Not sure about the latter either, I think native xterm allows this only for real mouse or scrollbar scrolling. Even SD/SU does not move scrollbuffer content back into the "active" terminal viewport.
Could you point me to the source of the monaco editor, where the ArrayBuffer is used? Seems I cant find it myself :blush:
Hmm just reread the TextEncoder/Decoder spec, with ArrayBuffers from node-pty up to the frontend we are basically stuck with utf-8, unless we translate it the hard way at some point. Making xterm.js utf-8 aware? Idk, this would involve many intermediate codepoint calculations for the higher unicode chars. Proside - it would save memory for ascii chars.
@rebornix could you give us some pointers to where monaco stores the buffer?
Here are some numbers for typed arrays and the new parser (was easier to adopt):
print
action jumps from 190 MB/s to 290 MB/sprint
action jumps from 190 MB/s to 320 MB/sOverall UTF-16 performs much better, but that was expected since the parser is optimized for that. UTF-8 suffers from the intermediate codepoint calculation.
The string to typed array conversion eats ~4% JS runtime of my benchmark ls -lR /usr/lib
(always far below 100 ms, done via a loop in InputHandler.parse
). I did not test the reverse conversion (this is implicitly done atm in InputHandller.print
on the cell by cell level). The overall runtime is slightly worse than with strings (the saved time in the parser does not compensate the conversion time). This might change when other parts are also typed array aware.
And the corresponding screenshots (tested with ls -lR /usr/lib
):
with strings:
with Uint16Array:
Note difference for EscapeSequenceParser.parse
, which can profit from a typed array (~30% faster). The InputHandler.parse
does the conversion, thus it is worse for the typed array version. Also GC Minor has more to do for typed array (since I throw the array away).
Edit: Another aspect can be seen in the screenshots - the GC becomes relevant with ~20% runtime, the long running frames (red flagged) are all GC related.
Just another somewhat radical idea:
int8
to int16
to int32
types are possible. The allocator returns a free index on the Uint8Array
, this pointer can be converted to a Uint16Array
or Uint32Array
position by a simple bit shift.uint16_t
type for UTF-16.InputHandler
with pointers to this memory instead of string slices.struct Cell {
uint32_t *char_start; // start pointer of cell content (JS with pointers hurray!)
uint8_t length; // length of content (8 bit here is sufficient)
uint32_t attr; // text attributes (might grow to hold true color someday)
uint8_t width; // wcwidth (maybe merge with other member, always < 4)
..... // some other cell based stuff
}
Pros:
malloc
and free
costs (depends on cleverness of the allocator/deallocator)Cons:
:smile:
hardfun to implement, changes kinda everything π
This is closer to how Monaco works, I remembered this blog post which discusses the strategy for storing character metadata https://code.visualstudio.com/blogs/2017/02/08/syntax-highlighting-optimizations
Yup thats basically the same idea.
Hope my answer to where monaco stores the buffer is not too late.
Alex and I are in favor of Array Buffer and most of the time it gives us good performance. Some places we use ArrayBuffer:
We use simple strings for text buffer instead of Array Buffer as V8 string is easier to manipulate
Pinned @rebornix's links so they don't break when commits are added π
The following list is just a quick summary of interesting concepts I stumbled over that might help to lower memory usage and/or runtime:
To get sumthing rolling here, here is a quick hack how we could "merge" text attributes.
The code is mainly driven by the idea to save memory for the buffer data (runtime will suffer, not tested yet how much). Esp. the text attributes with RGB for foreground and background (once supported) will make xterm.js eat tons of memory by the current cell by cell layout. The code tries to circumvent this by the usage of a resizable ref-counting atlas for attributes. This is imho an option since a single terminal will hardly hold more than 1M cells, which would grow the atlas to 1M * entry_size
if all cells differ.
The cell itself just needs to hold the index of the attribute atlas. On cell changes the old index needs to be unref'd and the new one ref'd. The atlas index would replace the attribute attribute of the terminal object and will be changed itself in SGR.
The atlas currently only addresses text attributes, but could be extended to all cell attributes if needed. While the current terminal buffer holds 2 32bit numbers for attribute data (4 with RGB in the current buffer design) the atlas would reduce it to only one 32bit number. The atlas entries can be packed further too.
interface TextAttributes {
flags: number;
foreground: number;
background: number;
}
const enum AtlasEntry {
FLAGS = 1,
FOREGROUND = 2,
BACKGROUND = 3
}
class TextAttributeAtlas {
/** data storage */
private data: Uint32Array;
/** flag lookup tree, not happy with that yet */
private flagTree: any = {};
/** holds freed slots */
private freedSlots: number[] = [];
/** tracks biggest idx to shortcut new slot assignment */
private biggestIdx: number = 0;
constructor(size: number) {
this.data = new Uint32Array(size * 4);
}
private setData(idx: number, attributes: TextAttributes): void {
this.data[idx] = 0;
this.data[idx + AtlasEntry.FLAGS] = attributes.flags;
this.data[idx + AtlasEntry.FOREGROUND] = attributes.foreground;
this.data[idx + AtlasEntry.BACKGROUND] = attributes.background;
if (!this.flagTree[attributes.flags])
this.flagTree[attributes.flags] = [];
if (this.flagTree[attributes.flags].indexOf(idx) === -1)
this.flagTree[attributes.flags].push(idx);
}
/**
* convenient method to inspect attributes at slot `idx`.
* For better performance atlas idx and AtlasEntry
* should be used directly to avoid number conversions.
* @param {number} idx
* @return {TextAttributes}
*/
getAttributes(idx: number): TextAttributes {
return {
flags: this.data[idx + AtlasEntry.FLAGS],
foreground: this.data[idx + AtlasEntry.FOREGROUND],
background: this.data[idx + AtlasEntry.BACKGROUND]
};
}
/**
* Returns a slot index in the atlas for the given text attributes.
* To be called upon attributes changes, e.g. by SGR.
* NOTE: The ref counter is set to 0 for a new slot index, thus
* values will get overwritten if not referenced in between.
* @param {TextAttributes} attributes
* @return {number}
*/
getSlot(attributes: TextAttributes): number {
// find matching attributes slot
const sameFlag = this.flagTree[attributes.flags];
if (sameFlag) {
for (let i = 0; i < sameFlag.length; ++i) {
let idx = sameFlag[i];
if (this.data[idx + AtlasEntry.FOREGROUND] === attributes.foreground
&& this.data[idx + AtlasEntry.BACKGROUND] === attributes.background) {
return idx;
}
}
}
// try to insert into a previously freed slot
const freed = this.freedSlots.pop();
if (freed) {
this.setData(freed, attributes);
return freed;
}
// else assign new slot
for (let i = this.biggestIdx; i < this.data.length; i += 4) {
if (!this.data[i]) {
this.setData(i, attributes);
if (i > this.biggestIdx)
this.biggestIdx = i;
return i;
}
}
// could not find a valid slot --> resize storage
const data = new Uint32Array(this.data.length * 2);
for (let i = 0; i < this.data.length; ++i)
data[i] = this.data[i];
const idx = this.data.length;
this.data = data;
this.setData(idx, attributes);
return idx;
}
/**
* Increment ref counter.
* To be called for every terminal cell, that holds `idx` as text attributes.
* @param {number} idx
*/
ref(idx: number): void {
this.data[idx]++;
}
/**
* Decrement ref counter. Once dropped to 0 the slot will be reused.
* To be called for every cell that gets removed or reused with another value.
* @param {number} idx
*/
unref(idx: number): void {
this.data[idx]--;
if (!this.data[idx]) {
let treePart = this.flagTree[this.data[idx + AtlasEntry.FLAGS]];
treePart.splice(treePart.indexOf(this.data[idx]), 1);
}
}
}
let atlas = new TextAttributeAtlas(2);
let a1 = atlas.getSlot({flags: 12, foreground: 13, background: 14});
atlas.ref(a1);
// atlas.unref(a1);
let a2 = atlas.getSlot({flags: 12, foreground: 13, background: 15});
atlas.ref(a2);
let a3 = atlas.getSlot({flags: 13, foreground: 13, background: 16});
atlas.ref(a3);
let a4 = atlas.getSlot({flags: 13, foreground: 13, background: 16});
console.log(atlas);
console.log(a1, a2, a3, a4);
console.log('a1', atlas.getAttributes(a1));
console.log('a2', atlas.getAttributes(a2));
console.log('a3', atlas.getAttributes(a3));
console.log('a4', atlas.getAttributes(a4));
Edit:
The runtime penalty is almost zero, for my benchmark with ls -lR /usr/lib
it adds less than 1 msec to the total runtime of ~2.3 s. Interesting side note - the command sets less than 64 different text attribute slots for the output of 5 MB of data and will save more than 20 MB once fully implemented.
Made some prototype PRs to test some changes to the buffer (see https://github.com/xtermjs/xterm.js/pull/1528#issue-196949371 for the general idea behind the changes):
@jerch it might be a good idea to stay away from the word atlas for this so that "atlas" always means "texture atlas". Something like store or cache would probably be better?
oh ok, "cache" is fine.
Guess I am done with the testbed PRs. Please also have a look at the PR comments to get the background of the following rough summary.
Proposal:
AttributeCache
to hold everything needed to style a single terminal cell. See #1528 for an early ref counting version that can also hold true color specs. The cache can also be shared between different terminal instances if needed to saved further memory in multiple terminal apps.StringStorage
to hold short terminal content data strings. The version in #1530 even avoids storing single char strings by "overloading" the pointer meaning. wcwidth
should be moved here.CharData
from [number, string, number, number]
to [number, number]
, where the numbers are pointers (index numbers) to:
AttributeCache
entryStringStorage
entryThe attributes are unlikely to change alot, so a single 32bit number will save much memory over time. The StringStorage
pointer is a real unicode codepoint for single chars, therefore can be used as the code
entry of CharData
. The actual string can be accessed by StringStorage.getString(idx)
. The fourth field wcwidth
of CharData
could be accessed by StringStorage.wcwidth(idx)
(not yet implemented). There is almost no runtime penalty from getting rid of code
and wcwidth
in CharData
(tested in #1529).
CharData
into a dense Int32Array
based buffer implementation. Also tested in #1530 with a stub class (far from fully functional), the final benefits are likely to be:
ls -lR /usr/lib
dropped to 1.3s (master is at 2.1s) while the old buffer is still active for cursor handling, once removed I expect the runtime to drop below 1sDownside - step 4 is quite alot work since it will need some rework on the buffer interface. But hey - for saving 80% of the RAM and still gaining runtime performance it is no biggy, is it? :smile:
There is another issue I stumbled over - the current empty cell representation. Imho a cell can have 3 states:
blankLine
and eraseChar
, but with a space as content.Problem I see here is that an empty cell is not distinguishable from a normal cell with a space inserted, both look the same at buffer level (same content, same width). I did not write any of the renderer/output code but I expect this to lead to awkward situations at the output front. Esp. the handling of the right end of a line might get cumbersome. A terminal with 15 cols, first some string output, that got wrapped around:
1: 'H', 'e', 'l', 'l', 'o', ' ', 't', 'e', 'r', 'm', 'i', 'n', 'a', 'l', ' '
2: 'w', 'o', 'r', 'l', 'd', '!', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '
versus a folder listing with ls
:
1: 'R', 'e', 'a', 'd', 'm', 'e', '.', 'm', 'd', ' ', ' ', ' ', ' ', ' ', ' '
2: 'f', 'i', 'l', 'e', 'A', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' '
The first example contains a real space after the word 'terminal', the second example never touched the cells after 'Readme.md'. The way it is represented at buffer level makes perfect sense for the standard case to print the stuff as terminal output to the screen (the room needs to be taken anyways), but for tools that try to deal with the content strings like a mouse selection or a reflow manager it is not clear anymore, where the spaces come from.
More or less this leads to the next question - how to determine the actual content length in a line (amount of cells that contain something from left side)? A simple approach would count the empty cells from the right side, again the double meaning from above makes this hard to determine.
Proposal: Imho this is easy fixable by using some other placeholder for empty cells, e.g. a control char or the empty string and replace those in the render process if needed. Maybe the screen renderer can also benefit from this, since it might not have to handle those cells at all (depends on the way the output is generated).
Btw, for the wrapped string above this also leads to the isWrapped
problem, that is essential for a reflowing resize or a correct copy&paste selection handling. Imho we cannot remove that, but need to integrate it better than it is atm.
@jerch impressive work! :smiley:
1 Build an AttributeCache to hold everything needed to style a single terminal cell. See #1528 for an early ref counting version that can also hold true color specs. The cache can also be shared between different terminal instances if needed to saved further memory in multiple terminal apps.
Made some comments on #1528.
2 Build a StringStorage to hold short terminal content data strings. The version in #1530 even avoids storing single char strings by "overloading" the pointer meaning. wcwidth should be moved here.
Made some comments on #1530.
4 Move the shrunk CharData into a dense Int32Array based buffer implementation. Also tested in #1530 with a stub class (far from fully functional), the final benefits are likely to be:
Not totally sold on this idea yet, I think it will bite us hard when we implement reflow. It does look like each of these steps can pretty much be done in order so we can see how things go and see if it makes sense to do this once we get 3 done.
There is another issue I stumbled over - the current empty cell representation. Imho a cell can have 3 states
Here's an example of a bug that came out of this https://github.com/xtermjs/xterm.js/issues/1286, :+1: to differentiating whitespace cells and "empty" cells
Btw, for the wrapped string above this also leads to the isWrapped problem, that is essential for a reflowing resize or a correct copy&paste selection handling. Imho we cannot remove that, but need to integrate it better than it is atm.
I see isWrapped
going away when we tackle https://github.com/xtermjs/xterm.js/issues/622 as CircularList will only contain unwrapped rows.
Not totally sold on this idea yet, I think it will bite us hard when we implement reflow. It does look like each of these steps can pretty much be done in order so we can see how things go and see if it makes sense to do this once we get 3 done.
Yes I am with you (still fun to play around with this totally different approach). 1 and 2 could be cherrypicked, 3 can be applied depending on 1 or 2. 4 is optional, we could just stick to the current buffer layout. The memory savings are like this:
CircularList
: saves 50% (~2.8MB of ~5.5 MB)1. is very easy to implement, the memory behavior with bigger scrollBack will still show the bad scaling as shown here https://github.com/xtermjs/xterm.js/pull/1530#issuecomment-403542479 but on a less toxic level
2. Slightly harder to implement (some more indirections at line level needed), but will make it possible to keep the higher API of Buffer
intact. Imho the option to go for - big mem save and still easy to integrate.
3. 5% more mem save than option 2, hard to implement, will change touch all API and thus literally the whole code base. Imho more of academical interest or for boring rainy days to be implemented lol.
@Tyriar I did some further tests with rust for webassembly usage and rewrote the parser. Note that my rust skills are abit "rusty" since I did not get yet deeper into it, therefore the following might be the result of weak rust code. Results:
Unless we want to rewrite the whole core libs in rust (or any other wasm capable language) we cannot gain anything from moving to a wasm lang imho. A plus of nowadays wasm langs is the fact that most support explicit memory handling (could help us with the buffer problem), downsides are the introduction of a totally different language into a primarily TS/JS focused project (a high barrier for code additions) and the translation costs between wasm and JS land.
TL;DR xterm.js is to broadly into general JS stuff like DOM and events to gain anything from webassembly even for a rewrite of the core parts.
@jerch nice investigation :smiley:
Calls from JS into wasm create some overhead and eats all the benefits from above. In fact it was ~20% slower.
This was [the major problem for monaco going native](this https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation#_why-not-native) as well which has primarily informed my stance on (though that was with native node module, not wasm). I believe working with ArrayBuffer
s where ever possible should give us the best balance between perf and simplicity (easy of impl, barrier to entry).
@Tyriar Gonna try to come up with an AttributeStorage to hold the RGB data. Not sure yet about the BST, for the typical use case with only a few color settings in a terminal session this will be worse in runtime, maybe this should be a runtime drop-in once the colors surpass a given theshold. Also memory consumption will raise alot again, though it will still save memory since the attributes are stored only once and not along with every single cell (worst case scenario with every cell holding different attributes will suffer though).
Do you know why the current fg
and bg
256 colors value is 9 bit based instead of 8 bits? What is the additional bit used for? Here: https://github.com/xtermjs/xterm.js/blob/6691f809069a549b4808cd2e055398d2da15db37/src/InputHandler.ts#L1596
Could you give me the current bit layout of attr
? I think a similar approach like the "double meaning" for the StringStorage pointer can further save memory but that would require the MSB of attr
to be reserved for the pointer distinction and not used for any other purpose. This might limit the possibility to support further attribute flags later on (because FLAGS
already uses 7 bits), are we still missing some fundamental flags that are likely to come?
A 32 bit attr
number in the term buffer could be packed like this:
# 256 indexed colors
32: 0 (no RGB color)
31..25: flags (7 bits)
24..17: fg (8 bits, see question above)
16..9: bg
8..1: unused
# RGB colors
32: 1 (RGB color)
31..25: flags (7 bits)
24..1: pointer to RGB data (address space is 2^24, which should be sufficient)
This way the storage only needs to hold the RGB data in two 32 bit numbers while the flags can stay in the attr
number.
@jerch by the way I sent you an email, probably got eaten by the spam filter again π
Do you know why the current fg and bg 256 colors value is 9 bit based instead of 8 bits? What is the additional bit used for?
I think it's used for the default fg/bg color (which could be dark or light), so it's actually 257 colors.
https://github.com/xtermjs/xterm.js/pull/756/files
Could you give me the current bit layout of attr?
I think it's this:
19+: flags (see `FLAGS` enum)
18..18: default fg flag
17..10: 256 fg
9..9: default bg flag
8..1: 256 bg
You can see what I landed on for truecolor in the old PR https://github.com/xtermjs/xterm.js/pull/756/files:
/**
* Character data, the array's format is:
* - string: The character.
* - number: The width of the character.
* - number: Flags that decorate the character.
*
* truecolor fg
* | inverse
* | | underline
* | | |
* 0b 0 0 0 0 0 0 0
* | | | |
* | | | bold
* | | blink
* | invisible
* truecolor bg
*
* - number: Foreground color. If default bit flag is set, color is the default
* (inherited from the DOM parent). If truecolor fg flag is true, this
* is a 24-bit color of the form 0xxRRGGBB, if not it's an xterm color
* code ranging from 0-255.
*
* red
* | blue
* 0x 0 R R G G B B
* | |
* | green
* default color bit
*
* - number: Background color. The same as foreground color.
*/
export type CharData = [string, number, number, number, number];
So in this I had 2 flags; one for the default color (whether to ignore all color bits) and one for truecolor (whether to do 256 or 16 mil color).
This might limit the possibility to support further attribute flags later on (because FLAGS already uses 7 bits), are we still missing some fundamental flags that are likely to come?
Yes we want some room for additional flags, for example https://github.com/xtermjs/xterm.js/issues/580, https://github.com/xtermjs/xterm.js/issues/1145, I'd say at least leave > 3 bits where possible.
Instead of pointer data inside the attr itself, there could be another map that holds references to the the rgb data? mapAttrIdxToRgb: { [idx: number]: RgbData
@Tyriar Sorry, was a few days not online and I fear the email got eaten by the spam filter. Could you resend it please? :blush:
Played abit with more clever lookup data structures for the attrs storage. Most promising regarding space and search/insert runtime are trees and a skiplist as a cheaper alternative. In theory lol. In practice neither can outperform my simple array search which seems very weird to me (bug in the code somewhere?) I uploaded a test file here https://gist.github.com/jerch/ff65f3fb4414ff8ac84a947b3a1eec58 with array vs. a left-leaning redβblack tree, that tests up to 10M entries (which is almost the complete addressing space). Still the array is far ahead compared to the LLRB, I suspect the break-even to be around 10M though. Tested on my 7ys old laptop, maybe someone can test it as well and even better - point me to some bugs in the impl/tests.
Here are a some results (with running numbers):
prefilled time for inserting 1000 * 1000 (summed up, ms)
items array LLRB
100-10000 3.5 - 5 ~13
100000 ~12 ~15
1000000 8 ~18
10000000 20-25 21-28
What really surprises me is the fact that the linear array search does not show any growing in the lower regions at all, it is up to 10k entries stable at ~4ms (might be cache related). The 10M test shows for both a worse runtime than expected, maybe due to mem paging whatsoever. Maybe JS is to far away from the machine with the JIT and all the opts/deopts happening, still I think they cant eliminate a complexity step (though the LLRB seems to be heavy on a single n thus moving the break even point for O(n) vs. O(logn) upwards)
Btw with random data the difference is even worse.
Problem
Memory
Right now our buffer is taking up too much memory, particularly for an application that launches multiple terminals with large scrollbacks set. For example, the demo using a 160x24 terminal with 5000 scrollback filled takes around 34mb memory (see https://github.com/Microsoft/vscode/issues/29840#issuecomment-314539964), remember that's just a single terminal and 1080p monitors would likely use wider terminals. Also, in order to support truecolor (https://github.com/sourcelair/xterm.js/issues/484), each character will need to store 2 additional
number
types which will almost double the current memory consumption of the buffer.Slow fetching of a row's text
There is the other problem of needing to fetch the actual text of a line swiftly. The reason this is slow is due to the way that the data is laid out; a line contains an array of characters, each having a single character string. So we will construct the string and then it will be up for garbage collection immediately afterwards. Previously we didn't need to do this at all because the text is pulled from the line buffer (in order) and rendered to the DOM. However, this is becoming an increasingly useful thing to do though as we improve xterm.js further, features like the selection and links both pull this data. Again using the 160x24/5000 scrollback example, it takes 30-60ms to copy the entire buffer on a Mid-2014 Macbook Pro.
Supporting the future
Another potential problem in the future is when we look at introducing some view model which may need to duplicate some or all of the data in the buffer, this sort of thing will be needed to implement reflow (https://github.com/sourcelair/xterm.js/issues/622) properly (https://github.com/sourcelair/xterm.js/pull/644#issuecomment-298058556) and maybe also needed to properly support screen readers (https://github.com/sourcelair/xterm.js/issues/731). It would certainly be good to have some wiggle room when it comes to memory.
This discussion started in https://github.com/sourcelair/xterm.js/issues/484, this goes into more detail and proposes some additional solution.
I'm leaning towards solution 3 and moving towards solution 5 if there is time and it shows a marked improvement. Would love any feedback! /cc @jerch, @mofux, @rauchg, @parisk
1. Simple solution
This is basically what we're doing now, just with truecolor fg and bg added.
Pros
Cons
2. Pull text out of CharData
This would store the string against the line rather than the line, this would probably see very large gains in selection and linkifying and would be more useful as time goes on having quick access to a line's entire string.
Pros
Int32Array
Cons
3. Store attributes in ranges
Pulling the attributes out and associating them with a range. Since there can never be overlapping attributes, this can be laid out sequentially.
Pros
.flags
instead of[0]
)Cons
4. Put attributes in a cache
The idea here is to leverage the fact that there generally aren't that many styles in any one terminal session, so we should not create as few as necessary and reuse them.
Pros
.flags
instead of[0]
)Cons
5. Hybrid of 3 & 4
Pros
.flags
instead of[0]
)Cons
CharAttributes
per block?CharAttributeEntry
object6. Hybrid of 2 & 3
This takes the solution of 3 but also adds in a lazily evaluates text string for fast access to the line text. Since we're also storing the characters in
CharData
we can lazily evaluate it.Pros
.flags
instead of[0]
)Cons
Solutions that won't work
Int32Array
will not work as it takes far to long to convert the int back to a character.