Benchmarks, and 'interface' traits?

I've been beavering away learning rust by working on omnikron13/braid, another generic rope implementation.

I'm wondering what benchmarks you might have that would be worth stealing? I already lifted more or less any of the ones from cessen/ropey that I was able to apply to my codebase. =P

Although useful, they don't seem to blend in much unicode, and they're definitely pretty artificial, being composed of Lorem Ipsum text (v.s. for example source code, presumably a primary target for a rope implementation). I'd be very interested, and much appreciative, for extra and perhaps more 'realistic' benchmarks you might be able to propose?

On an only partially related note... Do you think it would be worth attempting to abstract out any 'interface' traits related to accessing these string+ data types? I've been thinking of maybe trying some other 'text editor buffer' type string implementations too at some point, and looking at the few ropes that exist already and such, I wonder if trying to standardise something for plug-n-play use, and benchmarking efforts, might work? Idk, just an idea, lemme know your thoughts.

Although useful, they don't seem to blend in much unicode [..] being composed of Lorem Ipsum text

When benchmarking it really doesn't matter what the contents of the text you're editing are, you're usually only interested in figuring out how long each rope.insert or rope.delete takes. The main variable is how long the text is, not what it is. A wall of Lorem Ipsums is just as good of a data set as a bunch of emojis.

they're definitely pretty artificial

Yes, but that's because of the editing pattern. The positions of the insertions and deletions in those benchmarks are usually randomly generated: you insert a character here, then delete some there, and so on.

This is not how humans actually edit text. We usually insert or delete entire runs of characters before moving the cursor to another place in the document. We might jump around occasionally, but most of the edits are localized.

If a rope is designed with this assumption in mind (and both crop and Jumprope are) it's nice to have benchmarks that don't make you ping-pong around.

Luckily the author of Automerge (a CRDT library) and the author of Jumprope recorded several character-by-character editing traces of them typing out a document in real time. You can find those traces here.

I wonder if trying to standardise something for plug-n-play use, and benchmarking efforts, might work?

I already did something along these lines in this repo. There you can find a bunch of benchmarks, including those editing traces I mentioned. You can clone that repo, add your rope to the dependencies, implement the Rope trait on it and add it to this list. Then just run cargo run --release -- --bench <your_rope>.

Although useful, they don't seem to blend in much unicode [..] being composed of Lorem Ipsum text

When benchmarking it really doesn't matter what the contents of the text you're editing are, you're usually only interested in figuring out how long each rope.insert or rope.delete takes. The main variable is how long the text is, not what it is. A wall of Lorem Ipsums is just as good of a data set as a bunch of emojis.

Is that strictly true, given that rust strings are stored as vectors of bytes/u8, encoded as UTF-8? A string entirely composed of ASCII characters could be indexed just by the byte position in O(1) time, but a string composed of a completely random mix of characters would require checking every character width to compute the indices and result in O(n) time, surely?

I've personally been playing around with indexing various things about the character composition of the strings (character widths, positions of newlines, etc.) to improve performance of certain operations, but this then does mean the distribution affects the actual times.

It's in testing how successful strategies I'm experimenting with here that purely ASCII benchmarks (or ones with a non-representative distribution of newlines) aren't likely to give me particularly useful results.

they're definitely pretty artificial

Yes, but that's because of the editing pattern. The positions of the insertions and deletions in those benchmarks are usually randomly generated: you insert a character here, then delete some there, and so on.

This is not how humans actually edit text. We usually insert or delete entire runs of characters before moving the cursor to another place in the document. We might jump around occasionally, but most of the edits are localized.

If a rope is designed with this assumption in mind (and both crop and Jumprope are) it's nice to have benchmarks that don't make you ping-pong around.

Hm, might this assumption be missing some things? I've not thought too deeply on this yet, but alongside adding or removing a line (for example), operations like find/replace which could be operating across an entire document aren't all that uncommon, just as an example.

I'm not sure how best to create benchmarks/test data to cover different types of edits, so that's just something I'm pondering on...

Luckily the author of Automerge (a CRDT library) and the author of Jumprope recorded several character-by-character editing traces of them typing out a document in real time. You can find those traces here.

I wonder if trying to standardise something for plug-n-play use, and benchmarking efforts, might work?

I already did something along these lines in this repo. There you can find a bunch of benchmarks, including those editing traces I mentioned. You can clone that repo, add your rope to the dependencies, implement the Rope trait on it and add it to this list. Then just run cargo run --release -- --bench <your_rope>.

Ah, I think these are exactly the sort of thing I'm thinking about, thanks. =)

I may try modifying this to use some rough unicode test files too, actually...

A string entirely composed of ASCII characters could be indexed just by the byte position in O(1) time, but a string composed of a completely random mix of characters would require checking every character width to compute the indices and result in O(n) time

Yes, but that time complexity is set at "design" time, not at runtime. If you design your rope to work with byte offsets then indexing is O(1), Unicode or not. It's the user's responsibility to make sure that the byte offsets they provide are valid. And if you instead design it around char offsets then it's always O(n), ASCII or not.

As an aside, I would advise against choosing codepoints as the indexing metric. Like you said it brings a performance penalty over bytes, and it doesn't provide any advantages over it. The fundamental unit of text in a real editor is not a Unicode code point, it's much closer to an extended grapheme cluster. Imo bytes and ECGs are the only two sensible metrics to choose from, and (for reasons that I won't go over here) using graphemes doesn't work.

Hm, might this assumption be missing some things? I've not thought too deeply on this yet, but alongside adding or removing a line (for example), operations like find/replace which could be operating across an entire document aren't all that uncommon, just as an example.

Well it's a general rule-of-thumb, not a theorem ;) There are definitely cases where it breaks down, e.g. find&replace and multiple cursors, but it's still a useful insight. As a piece of anecdotal evidence supporting this, after reimplementing crop's leaves as gap buffers instead of simple Strings I saw performance improvements of 10-20% on the editing traces.

I'm not sure how best to create benchmarks/test data to cover different types of edits

The traces already contain find&replace and all other kinds of edits, so I wouldn't worry about that. Of course you could record your own trace if you really wanted to. I think there's a vscode plugin for it but I'm not sure.

nomad / crop

Benchmarks, and 'interface' traits? #10