Closed cessen closed 3 years ago
Thanks for the extensive writeup! I think quite a few of these are desirable:
All files have infinite, persistent undo. Even across closing the file or closing the entire editor.
This is something I'd really like to see, it's mentioned in my old TODO.md
file. I've had persistent undo on in neovim for a long time.
Files are always auto-saved.
This might be a good experiment after persistent undo, since you can always just undo the saved changes even if you accidentally closed the editor. I don't have a fully formed opinion here.
Opening a new file in a view simply closes the file that was there before.
This is similar to how I interact with helix: the view splits are there simply to hold a layout, and then I swap new files into them as I edit (it's also how I think vim should be used, but a lot of newer users fall into the anti-pattern of using a tab bar + tabs to simulate separate file tabs).
We don't close the file when swapping, but it gets put on the buffers list accessible on <space>b. I use this primarily so I'm able to view a subset of buffers -- I guess we'd be able to merge the <space>b view with <space>f by giving a higher score to already open files, or by displaying them if there's no search query entered. That way there's no need to switch from b to f if the file wasn't already open.
Full disclosure: I wrote Led, and it's where Ropey originally came from. It's a super bare-bones editor, and is mostly useless (it doesn't even have search functionality!).
I actually saw it when researching other rust-based editors! I think it's great that you developed Ropey as a separate library. I was initially looking at xi-rope, but it seemed a lot more complex to deal with because of its focus on CRDTs.
However, one thing it's really good at is handling huge and weird files. For example, it can open a multi-gigabyte file that's all on one line, with word-wrapping on, and navigate and edit it buttery-smooth.
I wondered how that could be done with Ropey. I remember vim had a tree of blocks abstraction that was sort of like a rope, and it would avoid reading the entire file into memory, instead paging in blocks as needed. This complicates things a little, so I've just gone with the simplest approach initially to get things rolling; that's also why there's a lack of support for other encodings and CRLF. I could definitely use some help here.
Speaking of CRLF, one common pattern in Helix is finding the end of a specific line. We use line_to_char(line + 1), but since that returns an index after the line terminator, we .saturating_sub(1) so we're positioned before \n. This obviously breaks for \r\n, and I've also run into edge cases where the file wasn't terminated by a newline. I wasn't sure if there's a better way to do it.
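To make the failure modes concrete, here's a stdlib-only sketch of that pattern (a hypothetical helper operating on &str for illustration; the real code works on RopeSlices):

```rust
// Hypothetical stdlib-only illustration of the pattern described above.
// Given one line *including* its terminator, return the char index just
// past the line's content (i.e. positioned before the line ending).
fn line_content_end(line: &str) -> usize {
    let chars = line.chars().count();
    if line.ends_with("\r\n") {
        chars - 2
    } else if line.ends_with('\n') {
        chars - 1
    } else {
        chars // final line with no terminator: don't step back at all
    }
}

fn main() {
    // The naive "length minus one" approach only handles the "\n" case:
    assert_eq!(line_content_end("hello\n"), 5);
    // For CRLF we have to step back two chars, not one:
    assert_eq!(line_content_end("hello\r\n"), 5);
    // An unterminated final line mustn't step back at all:
    assert_eq!(line_content_end("hello"), 5);
    println!("ok");
}
```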
Are you interested in using Helix?
(by the way, I spotted you on Matrix but not sure if you're active there, feel free to join us at #helix-editor:matrix.org)
(Sorry for the novel. As Mark Twain said, if I'd had more time I would have written a shorter letter. In any case, please feel free to skim, skip, and just generally read and respond only to what you want. I won't be offended!)
Are you interested in using Helix?
Tentatively, yes. It depends on the direction it ultimately goes in. That's certainly part of why I filed this issue: if a workflow similar to the one I described regarding Zed is eventually supported, I would jump on Helix very quickly. I've been seriously missing that ever since Zed went defunct.
But I was also serious when I said that I don't want to push things! This is a really cool project, and there are a lot of great ways to design and build an editor. I don't want to interfere too much just based on my own personal preferences, and there are surely other people who want other things out of Helix.
This might be a good experiment after persistent undo, since you can always just undo the saved changes even if you accidentally closed the editor. I don't have a fully formed opinion here.
Realistically, I think auto-saving is something that should be a configuration option if it's done at all. Not everyone will want it, and there are corner cases that are always hard to get right. But it really is one of the keys to the kind of low-cognitive-overhead workflow I described, so if that's something you think you'd like to support, that would be awesome.
This is similar to how I interact with helix: the view splits are there simply to hold a layout, and then I swap new files into them as I edit
Yeah, I'm totally with you there! I currently have Sublime Text configured to work as close to that as you can get it (+autosave). It doesn't quite work right, but works well enough to be worth it over the default setup.
Anyway... it seems like the list of things needed to make Helix configurable to work the way I want isn't actually that long:
<space>b and <space>f merged as you described, with the further enhancement that the buffer list would (optionally?) be sorted by most-recently-viewed.

Despite the shortness of the list, that's obviously a lot of work. But in terms of things you'd like to eventually support in Helix (and might accept PRs for), do those seem like they would fit? If so, I think I'm just about ready to jump onboard with Helix! :-)
I wondered how that could be done with Ropey. I remember vim had a tree of blocks abstraction that was sort of like a rope, and it would avoid reading the entire file into memory, instead paging in blocks as needed.
Ah! I think I gave the wrong impression. I wasn't talking about opening larger-than-ram files. (For better or worse, Ropey doesn't support that.) Just very large files that do fit in ram, but have odd properties that cause most editors to choke. You don't even have to get that big, honestly.
An example is this file: 10mb_one_line.txt.zip
Unzipped, it's 10 MB of text all on one line. Led handles it with no trouble at all, but it's essentially uneditable (and frequently un-navigatable) in nearly every other editor I've tested it in. (Sublime Text might be an exception, in that it can barely scrape by with great unpleasantness, but it takes forever to load the file and can't handle much more anyway.)
This is definitely niche, though, and not actually a shortcoming of other editors given their target use-case. But it's a nice feeling when you know that your editor can handle any (fits-in-memory) file you can throw at it.
Speaking of CRLF, one common pattern in Helix is searching for the end of a specific line. We use line_to_char(line + 1), but that returns an index after the line terminator, we .saturating_sub(1) so we're positioned before \n. This obviously breaks for \r\n, and I've also ran into edge cases where the file wasn't terminated by a new line. Wasn't sure if there's a better way to do it.
A couple of notes that might be useful:

The last line of a Rope/RopeSlice will always lack a line ending. E.g. "Hello\nworld" is two lines ("Hello\n" and "world") and "Hello\nworld\n" is three lines ("Hello\n", "world\n", and ""). You could potentially take advantage of this for a cheap special case, simplifying the remaining code a little.

Here's probably the most efficient way to do it, taking advantage of the last-line special case and checking raw byte values:
(EDIT: disclaimer, I haven't actually tested any of these, so they probably have bugs and maybe even syntax errors. For illustrative purposes only, ha ha.)
fn end_index(line: &RopeSlice, is_last: bool) -> usize {
    if is_last {
        line.len_chars()
    } else if line.byte(line.len_bytes() - 1) == b'\n'
        && line.len_bytes() > 1
        && line.byte(line.len_bytes() - 2) == b'\r'
    {
        line.len_chars() - 2
    } else {
        line.len_chars() - 1
    }
}
The above code takes the line as a slice and a flag saying if it's the last line or not, and returns the char index of the line's end minus the line ending itself. (You could make the API different, of course, but that's the basic idea.)
If you don't want to special-case the last line, then it's a little less wieldy (and also slower), but certainly doable:
fn end_index(line: &RopeSlice) -> usize {
    let len = line.len_chars();
    if len > 0 {
        match line.char(len - 1) {
            '\n' => {
                if len > 1 && line.char(len - 2) == '\r' {
                    len - 2
                } else {
                    len - 1
                }
            }
            '\u{000B}' // (Vertical Tab)
            | '\u{000C}' // (Form Feed)
            | '\u{000D}' // (Carriage Return)
            | '\u{0085}' // (Next Line)
            | '\u{2028}' // (Line Separator)
            | '\u{2029}' // (Paragraph Separator)
            => len - 1,
            _ => len,
        }
    } else {
        0
    }
}
This definitely isn't obvious code, though, and I think Ropey could support this a lot better. For example, with non-panicking APIs and a char_is_line_ending utility function, things could be nicer:
fn end_index(line: &RopeSlice) -> usize {
    let i1 = line.len_chars().saturating_sub(1);
    let i2 = line.len_chars().saturating_sub(2);
    match (line.get_char(i2), line.get_char(i1)) {
        (Some('\r'), Some('\n')) => i2,
        (_, Some(c)) if char_is_line_ending(c) => i1,
        _ => line.len_chars(),
    }
}
This is still slower than the last-line-special-case version, but the code is (I think?) clearer, and certainly shorter.
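For what it's worth, the char_is_line_ending utility mentioned above could be as simple as matching the same set of Unicode line endings listed in the earlier match (a sketch of the proposed helper; Ropey doesn't currently provide it):

```rust
// Sketch of the proposed `char_is_line_ending` utility (hypothetical;
// covers the same Unicode line endings listed in the match above).
fn char_is_line_ending(c: char) -> bool {
    matches!(
        c,
        '\n'         // Line Feed
        | '\u{000B}' // Vertical Tab
        | '\u{000C}' // Form Feed
        | '\u{000D}' // Carriage Return
        | '\u{0085}' // Next Line
        | '\u{2028}' // Line Separator
        | '\u{2029}' // Paragraph Separator
    )
}

fn main() {
    assert!(char_is_line_ending('\n'));
    assert!(char_is_line_ending('\u{2028}'));
    assert!(!char_is_line_ending('a'));
    println!("ok");
}
```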
(by the way, I spotted you on Matrix but not sure if you're active there, feel free to join us at #helix-editor:matrix.org)
Ah, yeah! I'll hop on. Thanks!
Ah! I think I gave the wrong impression. I wasn't talking about opening larger-than-ram files. (For better or worse, Ropey doesn't support that.) Just very large files that do fit in ram, but have odd properties that cause most editors to choke. You don't even have to get that big, honestly.
What we could do is create a similar abstraction over Ropey and internally use a buffer of Ropes to keep track of edits. At the very minimum, there is always one Rope in the buffer. When edits are made in that Rope, we would just push it into the buffer and page the next block in as a new Rope; otherwise the old Rope would be dropped. If two edited Ropes are right next to each other, we could probably just merge them into one through append(). We would also need to keep track of the start and end character indices in relation to the file. I think this is the simplest form, but there are definitely more complex yet more performant forms that we might want to investigate if we go down this route.

One alternative form I can already think of is keeping track of the character indices for the first and last edits to a Rope, and then splitting it based on that. This would be more efficient in terms of memory, but possibly not in terms of time. This is something that would definitely need to be benchmarked, though.

A best-of-both-worlds approach might be to find the ideal capacity for each Rope, e.g. 10mb, such that memory isn't likely to be a problem in most cases, and then proceed with the steps in the first form I proposed. Then we would just create enough Ropes to fulfill our overall "buffer capacity" -- which is not the actual buffer, but a cache, I guess? Not sure what to call it.
Thoughts?
struct Node {
    rope: Rope,
    start: usize, // first char index (relative to the whole file) covered by this Rope
    end: usize,   // last char index (relative to the whole file) covered by this Rope
}

struct DocumentBuffer {
    buffer: Vec<Node>,
    // Other smart pointer stuff too
}
What we could do is create a similar abstraction over Ropey and internally use a buffer of Ropes to keep track of edits.
I think if we go that route, it would make more sense to switch to something other than Ropey. Probably something like a piece table, which is designed for that kind of thing. I've actually been kind of wanting to try my hand at writing a good piece-table implementation that can support memory mapping, etc., so I certainly wouldn't be opposed in terms of the work it would take.
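To illustrate what I mean by a piece table, here's a toy sketch (names and structure are made up for illustration; a real implementation would be quite different): the document is a sequence of "pieces", each pointing into either a read-only original buffer (which could be memory-mapped) or an append-only add buffer, so edits never touch the original file data.

```rust
// Toy piece-table sketch (illustrative only; names are made up).
// `original` never changes after load -- in a real editor it could be a
// memory-mapped file -- and all inserted text goes into `add`.
#[derive(Clone, Copy)]
enum Src {
    Original,
    Add,
}

#[derive(Clone, Copy)]
struct Piece {
    src: Src,
    start: usize, // byte offset into the source buffer
    len: usize,   // length in bytes
}

struct PieceTable {
    original: String,
    add: String,
    pieces: Vec<Piece>,
}

impl PieceTable {
    fn new(text: &str) -> Self {
        PieceTable {
            original: text.to_string(),
            add: String::new(),
            pieces: vec![Piece { src: Src::Original, start: 0, len: text.len() }],
        }
    }

    // Insert `s` at byte offset `at` (must land on a char boundary) by
    // splitting the piece that covers `at` around a new Add piece.
    fn insert(&mut self, at: usize, s: &str) {
        let new = Piece { src: Src::Add, start: self.add.len(), len: s.len() };
        self.add.push_str(s);
        let mut offset = 0;
        for i in 0..self.pieces.len() {
            let p = self.pieces[i];
            if at <= offset + p.len {
                let split = at - offset;
                let mut repl = Vec::new();
                if split > 0 {
                    repl.push(Piece { src: p.src, start: p.start, len: split });
                }
                repl.push(new);
                if split < p.len {
                    repl.push(Piece { src: p.src, start: p.start + split, len: p.len - split });
                }
                self.pieces.splice(i..=i, repl);
                return;
            }
            offset += p.len;
        }
    }

    // Materialize the current text by walking the pieces in order.
    fn text(&self) -> String {
        self.pieces
            .iter()
            .map(|p| {
                let buf = match p.src {
                    Src::Original => &self.original,
                    Src::Add => &self.add,
                };
                &buf[p.start..p.start + p.len]
            })
            .collect()
    }
}

fn main() {
    let mut pt = PieceTable::new("hello world");
    pt.insert(5, ",");
    assert_eq!(pt.text(), "hello, world");
    pt.insert(0, ">> ");
    assert_eq!(pt.text(), ">> hello, world");
    println!("ok");
}
```

The point of the design is that the original buffer is never mutated, which is what makes memory mapping the on-disk file viable.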
However, there are a lot of complications that you run into when trying to support that kind of thing in a real editor, many of which aren't immediately obvious (and likely many of which I'm also not aware, having never actually done it myself). But off the top of my head:
None of these things are necessarily insurmountable. But my point is that there are a lot of knock-on effects beyond just the technical challenge of mapping on-disk data into memory. If anything, the memory mapping etc. is the easy part.
My personal feeling is that larger-than-memory text editing is kind of a niche problem that requires niche software. So IMHO Helix probably shouldn't bother with it. (But it would be really interesting to start another project focused on exactly that, ha ha.)
Another thing to consider is that the memory available on modern computers is (as ever) also continuing to increase. Developer systems with 32+ GB of ram aren't uncommon, and they're only going to become more common in the coming years. So there's also the question of is it worth the added technical complexity to address an already-niche problem that may well be a non-issue within the next half-decade or so.
I don't mean to be a wet towel, though. The technical challenges behind doing something like that are really, really cool and interesting. I'm just trying to think practically whether it's worth the time and technical burden to implement and maintain.
To "counter" your argument, I think there are many common situations where it's desirable to prefer streaming + chunking text rather than loading it all at once. For example, JSON, XML, and log files are especially huge. With Helix, it takes ~100mb of memory to open a 5mb C file, and >3 gigabytes to open a 180mb JSON file, though I ended the process before it climbed higher.
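For the log-file case specifically, the streaming approach can be sketched with nothing but the stdlib (a toy example, not a proposal for Helix's architecture): the file is read through a fixed-size buffer line by line, so peak memory stays roughly constant no matter how big the file is.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader, Write};

// Toy illustration of streaming + chunking: scan an arbitrarily large
// log file for a pattern while holding only one buffered chunk at a time.
fn count_matching_lines(path: &str, needle: &str) -> std::io::Result<usize> {
    let reader = BufReader::new(File::open(path)?); // fixed-size internal buffer
    let mut count = 0;
    for line in reader.lines() {
        if line?.contains(needle) {
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> std::io::Result<()> {
    // Hypothetical demo file; a real log could be gigabytes.
    let path = std::env::temp_dir().join("demo.log");
    let mut f = File::create(&path)?;
    writeln!(f, "ok\nERROR one\nok\nERROR two")?;
    assert_eq!(count_matching_lines(path.to_str().unwrap(), "ERROR")?, 2);
    println!("ok");
    Ok(())
}
```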
Overall, I believe that this is a feature that Helix should have in the future for it to be a more robust editor, but I do agree with you that the technical challenges should be evaluated thoroughly to determine how high of a priority this feature is.
If this sounded aggressive, I apologize; it wasn't my intention.
(Additionally, I think Helix's Transactions are its equivalent of a piece table; maybe that could be leveraged.)
With Helix, it takes ~100mb of memory to open a 5mb C file, and >3 gigabytes to open a 180mb JSON file, though I ended the process before it climbed higher.
I assume that's due to secondary structures like the tree-sitter tree, etc. When I open a 1GB plain text file in Helix it only takes 1.1GB (as I would expect of Ropey). So I guess the question is, can those secondary data structures be built from a partial chunk of the file and still function properly? Or maybe there are ways to configure them to make a different space/performance trade-off? Or maybe they can be memory-mapped too?
Either way, it sounds like the secondary structures are more in need of memory handling than the text itself. So we'd either need to solve that as well, or disable the features associated with those data structures on larger-than-ram files (which, to be honest, is kind of what I expect when editing such files anyway).
Overall, I believe that this is a feature that Helix should have in the future for it to be a more robust editor
I don't have a ton of extra time to commit to this, but if you'd be interested I would certainly enjoy trying to prototype something out with you along those lines, just to get a better sense of what hurdles we would really be facing. My gut feeling is that it will be very non-trivial when moving beyond just basic text editing functionality and actually trying to handle all the corner cases in a good way. But my gut has been wrong before.
It's fine if you don't want to, I was actually planning on trying to prototype it myself after resolving #18.
Edit: I also checked out the memory being used by Tree-Sitter after what you said, and it's definitely responsible for the high memory usage. Knowing that now, I am definitely more in agreement with your position as a whole, though I still want to prototype it. I don't think there's a way around it (https://github.com/tree-sitter/tree-sitter/issues/222) other than disabling Tree-Sitter on huge files.
Knowing that now, I am definitely more in agreement with your position as a whole, though I still want to prototype it. I don't think there's a way around it tree-sitter/tree-sitter#222 other than disabling Tree-Sitter on huge files.
So, to counter my own argument (ha ha), I don't think disabling those features on large files is actually a problem per se. Most files that large are likely to be things like log files, that probably wouldn't benefit as much from those features anyway. And being able to open and edit a file at all is much better than not being able to.
I still hold my original position, but I just don't think that in particular is much of an argument in my favor. ;-)
Also, I want to be clear that I'm 100% in favor of you taking a crack at this. A lot of my reservations are based on what I suspect is true, but I don't really know. So getting some real experience with the space would be great. And even if it doesn't pan out, it may lead to other useful things for Helix or other projects, so it's of benefit either way.
So, to counter my own argument (ha ha), I don't think disabling those features on large files is actually a problem per se. Most files that large are likely to be things like log files, that probably wouldn't benefit as much from those features anyway. And being able to open and edit a file at all is much better than not being able to.
Yeah, definitely. I think if someone wants it down the line, they could try implementing a plugin for syntect. I don't think it's a bad argument though; it is definitely uncommon to be opening huge code files.
Overall, I believe that this is a feature that Helix should have in the future for it to be a more robust editor, but I do agree with you that the technical challenges should be evaluated thoroughly to determine how high of a priority this feature is.
I think supporting extremely large files can be marked up as a non-goal for now. If you're opening a large file (200MB+) it's likely some sort of log or json, and it's unlikely you're actually editing the file but searching through it to find something. In these cases a pipeline with ripgrep/jq/tail/less etc. is probably how you should be approaching the file. I don't think it's worth the bump in complexity to solve those cases (only loading chunks into memory and so on).
I've started a gist to keep track of my personal wish-list for Helix as I'm using it. And I think all the relevant people have had a chance to read this issue. So I'm closing it.
I'll re-open some of these items as specific feature issues as needed.
@cessen https://xi-editor.io/docs/rope_science_00.html Might be a nice doc to read tbh
@Nyabinary Yes, ropey pre-dates xi-rope ;)
There's also Lapce as an editor.
Yes we also know lapce, they based their rendering code on our implementation actually 😉
Anyway this is a pretty old thread that has been closed for a couple years now. Most ideas from the gist were implemented or are planned.
Helix takes a lot of inspiration from vim and kakoune, which are both great editors in their own right. I'd like to suggest a couple of other editors that I think bring some interesting ideas to the table.
However, please don't take this as a pushy "Helix should do things this way!" kind of thing--that's not how I intend it at all. I'm not specifically trying to push the project in any of these directions. Rather, these are a couple of lesser-known editors that people may not be aware of that I think might spark some useful ideas.
Zed
Zed is a now-defunct code editor, started in 2013. I used this as my primary code-editor for a long time, and the thing that really set Zed apart (in my opinion) wasn't any particular feature, but rather the thoughtfulness with which the features were selected and combined. Very much a "the whole is greater than the sum of its parts" kind of situation.
The best example of that, and the thing I loved most about Zed, really came down to these features in combination:
Individually none of these are particularly unique or special. But combined, they really add up in terms of reducing the user's cognitive overhead:
All of these things together free you from having to keep your editor's state in your head. Things like "what files do I have open?" or "what buffers are saved?" or "which files do I want to open in advance/keep open for what I'm working on right now?" etc. all just disappear, and you can focus on just coding.
This was a brand new experience for me in Zed, and I have yet to experience it in any editor since. You can sort of get a similar experience via addons in editors like Sublime Text, VSCode, etc. But it never quite works right, and never fully relieves the cognitive burden in the same way.
Again, none of these individual features are particularly special. It's just the way they were selected and combined that made Zed's UX really special.
Having said all of that, Zed punted on some things in order to achieve this experience. For example, how do you handle temporary buffers that aren't associated with a file on disk? Or what if a file is really large and takes more than an instant to load/save? Etc. Zed just didn't address any of that. Nevertheless, I think there are some really cool ideas in there about how the code editing experience can be faster and less of a cognitive burden on the user.
Led
Full disclosure: I wrote Led, and it's where Ropey originally came from. It's a super bare-bones editor, and is mostly useless (it doesn't even have search functionality!).
However, one thing it's really good at is handling huge and weird files. For example, it can open a multi-gigabyte file that's all on one line, with word-wrapping on, and navigate and edit it buttery-smooth. As far as I know, Led is unique in this ability (and I've tested a lot of other editors--certainly the well known ones).
Part of this is due to Ropey, of course. If Ropey couldn't handle it, obviously Led couldn't. But it's also because of the following two things:
The end result is an editor that, despite being almost useless in every other respect, I still find myself regularly using. Especially for text files generated by software rather than people.
If Helix is interested in being similarly robust, I'm happy to walk through how Led accomplishes this (and possibly help implement in Helix as well). The approach is absolutely compatible with a full-featured editor like Helix. But I also realize that editing weird files like that isn't Helix's primary use-case, so no worries either way.