helix-editor / helix

A post-modern modal text editor.
https://helix-editor.com
Mozilla Public License 2.0

Honor whitespace escape sequences in `yank-join` #10993

Open bitcrshr opened 1 month ago

bitcrshr commented 1 month ago

Say you have the following file: image

And you select just the initial asdf bits: image

Then, you enter :yank-join ,\n. I think the expected behavior would be this: image

But instead, you get this: image

This isn't a very big deal, as it's pretty easy to just go ahead with the default newline separator and add things as you wish, but it could be pretty convenient to just do it all at once.
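Since the screenshots are not available here, the difference can be sketched in a few lines. The two `asdf` fragments and the helper name `join_with` are assumptions for illustration; the fold mirrors the joining logic quoted later in this thread.

```rust
/// Hypothetical stand-in for the joining step of `yank-join`:
/// concatenates the fragments with `separator` between them.
fn join_with(fragments: &[&str], separator: &str) -> String {
    fragments.iter().fold(String::new(), |mut acc, fragment| {
        if !acc.is_empty() {
            acc.push_str(separator);
        }
        acc.push_str(fragment);
        acc
    })
}

fn main() {
    // Assumed selections; the real file contents are in the missing images.
    let fragments = ["asdf", "asdf"];

    // Expected: `\n` in the separator becomes a real line break.
    println!("{:?}", join_with(&fragments, ",\n"));

    // Reported: the backslash is lost, leaving a literal `n`.
    println!("{:?}", join_with(&fragments, ",n"));
}
```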

I dug into the code a bit, and I think that the ShellWords::From<&str> implementation is to blame here. I suspect there's quite good reason for it, but I'm wondering if there might be a reasonable workaround.

I would be happy to give a shot at a PR for this, but I'm not quite sure what the implications might be for not escaping the newlines (or whatever the workaround might be) so some guidance may be needed.

Thanks a bunch :)

RoloEdits commented 1 month ago

What operating system are you on?

When trying to reproduce the behavior on Windows, I get asdf,\nasdf, not asdf,nasdf.

I also grepped for ShellWords and nothing came up. What module would this be in? Ah, found it. Shellwords, not ShellWords.
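One possible explanation for the platform difference, offered as an assumption rather than something confirmed in this thread: the argument parser may treat a backslash as an escape character on Unix-like systems while leaving it alone on Windows, where backslashes appear in paths. The toy function below is hypothetical and is not Helix's `Shellwords` code; it only shows how such a rule would turn `,\n` into `,n`.

```rust
/// Hypothetical helper, not Helix's actual parser: copies `input`, optionally
/// treating `\` as "take the next character literally".
fn consume_backslashes(input: &str, backslash_is_escape: bool) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(ch) = chars.next() {
        if ch == '\\' && backslash_is_escape {
            // The backslash itself is dropped; whatever follows is kept as-is.
            if let Some(next) = chars.next() {
                out.push(next);
            }
        } else {
            out.push(ch);
        }
    }
    out
}

fn main() {
    // Backslash treated as an escape character: `,\n` collapses to `,n`.
    println!("{:?}", consume_backslashes(r",\n", true));
    // Backslash left alone: `,\n` stays as the characters `,`, `\`, `n`.
    println!("{:?}", consume_backslashes(r",\n", false));
}
```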

RoloEdits commented 1 month ago

Made a pretty naive implementation to handle the given escape sequences. Seems to work fine? I added a few other common patterns that might crop up, for light testing. Not sure if something like this already exists as a helper function somewhere, so I just hacked this one together.

fn yank_joined_impl(editor: &mut Editor, separator: &str, register: char) {
    let (view, doc) = current!(editor);
    let text = doc.text().slice(..);

    let selection = doc.selection(view.id);
    let selections = selection.len();
    let joined = selection
        .fragments(text)
        .fold(String::new(), |mut acc, fragment| {
            if !acc.is_empty() {
-                acc.push_str(separator);
+                acc.push_str(&escape(separator));
            }
            acc.push_str(&fragment);
            acc
        });

    match editor.registers.write(register, vec![joined]) {
        Ok(_) => editor.set_status(format!(
            "joined and yanked {selections} selection{} to register {register}",
            if selections == 1 { "" } else { "s" }
        )),
        Err(err) => editor.set_error(err.to_string()),
    }
}

// Needs `use std::borrow::Cow;` at the top of the module.
fn escape(separator: &str) -> Cow<'_, str> {
    enum State {
        Normal,
        Escape,
    }

    let mut escaped = String::new();
    let mut state = State::Normal;
    let mut is_escaped = false;

    for (idx, ch) in separator.char_indices() {
        match state {
            State::Normal => match ch {
                '\\' => {
                    if !is_escaped {
                        // PERF: As not every separator will be escaped, we use `String::new` as that has no initial
                        // allocation. If an escape is found, then we reserve capacity that's the len of the separator,
                        // as the new escaped string will be about that long.
                        escaped.reserve(separator.len());
                        if idx > 0 {
                            // First time finding an escape, so all prior chars can be added to the new escaped version
                            // if it's not the very first char found.
                            escaped.push_str(&separator[0..idx]);
                        }
                    }
                    state = State::Escape;
                    is_escaped = true;
                }
                _ => {
                    if is_escaped {
                        escaped.push(ch);
                    }
                }
            },
            State::Escape => match ch {
                'n' => {
                    escaped.push('\n');
                    state = State::Normal;
                }
                't' => {
                    escaped.push('\t');
                    state = State::Normal;
                }
                'r' => {
                    escaped.push('\r');
                    state = State::Normal;
                }
                '\\' => {
                    escaped.push('\\');
                    state = State::Normal;
                }
                _ => {
                    escaped.push('\\');
                    escaped.push(ch);
                    state = State::Normal;
                }
            },
        }
    }

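    // A trailing lone backslash has nothing to escape, so keep it literally
    // instead of silently dropping it.
    if matches!(state, State::Escape) {
        escaped.push('\\');
    }

    if is_escaped {
        escaped.into()
    } else {
        separator.into()
    }
}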
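
For comparison, the same translation can be sketched more compactly by pulling characters from an iterator instead of tracking an explicit state enum. This is an illustrative variant, not the code proposed for Helix; the name `unescape_separator` and the choice to keep a trailing lone backslash are assumptions.

```rust
use std::borrow::Cow;

/// Illustrative variant (hypothetical name): translates `\n`, `\t`, `\r` and
/// `\\` in `separator`, borrowing the input when there is nothing to change.
fn unescape_separator(separator: &str) -> Cow<'_, str> {
    // Fast path: no backslash means no allocation.
    if !separator.contains('\\') {
        return Cow::Borrowed(separator);
    }

    let mut out = String::with_capacity(separator.len());
    let mut chars = separator.chars();
    while let Some(ch) = chars.next() {
        if ch != '\\' {
            out.push(ch);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('r') => out.push('\r'),
            Some('\\') => out.push('\\'),
            // Unknown escape: keep both characters untouched.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // Trailing backslash with nothing after it: keep it literally.
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}

fn main() {
    // Raw strings keep the backslash as typed in the command line.
    println!("{:?}", unescape_separator(r",\n"));
    println!("{:?}", unescape_separator(", "));
}
```

Returning `Cow` keeps the common case, a separator with no backslash, free of allocation, which matches the intent of the PERF comment in the original helper.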

bitcrshr commented 1 month ago

Ah, interesting. I work with a remote setup: Helix running in tmux on Ubuntu, which I SSH into from macOS with Alacritty. Thanks for the code, I'll give it a shot and see what I can come up with!

RoloEdits commented 1 month ago

This kind of escaping can extend beyond newlines. Currently you cannot paste Unicode, like 🤩, into the yank-join command buffer. With a way to unescape the literal \u{1f929}, you could join with this emoji even if you can't paste it in. Tabs are another case, since you can't type a tab in the command buffer. You could even add spaces with \u{20}, which is not possible currently because a bare space gets ignored.

If you don't mind, I'll try opening a PR that provides a way to unescape these things in general, and then use it to unescape the separator.
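
The `\u{...}` idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the planned PR: the function name `expand_unicode_escapes` is hypothetical, the syntax follows Rust's own `\u{HEX}` form, and malformed sequences are simply copied through unchanged.

```rust
/// Hypothetical helper: expands `\u{HEX}` sequences into the characters they
/// name. Malformed or invalid sequences are left as typed.
fn expand_unicode_escapes(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("\\u{") {
        // Copy everything before the escape verbatim.
        out.push_str(&rest[..start]);
        let after = &rest[start + 3..];
        // Try to read hex digits up to the closing brace and map them to a char.
        let parsed = after.find('}').and_then(|end| {
            let code = u32::from_str_radix(&after[..end], 16).ok()?;
            Some((char::from_u32(code)?, end))
        });
        match parsed {
            Some((ch, end)) => {
                out.push(ch);
                rest = &after[end + 1..];
            }
            None => {
                // Not a valid escape: keep the `\u{` prefix and carry on after it.
                out.push_str("\\u{");
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    println!("{}", expand_unicode_escapes(r"asdf\u{1f929}asdf"));
    println!("{:?}", expand_unicode_escapes(r"a\u{20}b"));
}
```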