Pipe handles newlines improperly

stevengubler commented 4 years ago

Steps

Write some string into the buffer. Select the entire string, but do not include the newline character at the end of it. Use | in normal mode to pipe the selection into an external program; base64 works well here.

Issue 1: Copy that entire string into some separate base64 decoder [like CyberChef](https://gchq.github.io/CyberChef/#recipe=From_Base64('A-Za-z0-9%2B/%3D',true)Add_line_numbers()) and decode it.

Issue 2: Back in Kakoune, select your string again, (make sure to not include a new line). Pipe your selection through base64 --decode.

Outcome

With Issue 1, it is clear to see that there is an extra new line included. This is not expected.

With Issue 2, while it is clear that the encoded string contains a newline (as per Issue 1), no new line is included in the decoded string.

Expected

With Issue 1, I would expect that no newline character would be appended to my selection string, and it should string should be exactly passed to the shell command's input stream.

With Issue 2, I would expect that any newline characters included at the end of the output stream to be accounted for and recorded in the buffer.

To add some color to this, I was using Kakoune to edit some k8s resources, some of which were secrets. In k8s, the string data of a secret is encoded with base64. For example:

apiVersion: v1
kind: Secret
metadata:
  name: MySecret
data:
  THE_SECRET: bXkgc25lYWt5IHNlY3JldA==

Withing Kakoune, I decoded a string to edit it, and then reencoded it; this resulted in a sneaky and unexpected newline character being included with my secret's value. I spent a good deal of time running around trying to figure out why the secret was not what it was showing up to be in my editor, and eventually found this issue.

Delapouite commented 4 years ago

Screwtapello commented 4 years ago

Are there many apps that misbehave if stdin reaches EOF without a trailing newline?

stevengubler commented 4 years ago

@Screwtapello Commands like base64, wc, gpg, diff, comm, and tail are definitely affected, and commands like grep, head, sed, and awk are potentially affected, to name a small bunch.

I definitely understand why some whitespace manipulation happens for general usability's sake, but this breaks my primary usecase for the piping feature. An option or method for disabling that manipulation behavior would be greatly appreciated.

mawww commented 4 years ago

I agree this needs to be fixed somehow, the general issue is that many commands, like base64, append an EOL to their output, but we dont want that newline to get inserted when we pipe some text that does not end-up with a newline. The current logic does the following: if input text does not end with EOL, append an EOL then remove it from the pipe output text.

Maybe the solution is to simplify the existing logic not-to add a newline when its missing, but still remove a newline from the output if the original text did not end with one ?

I wonder what we would break if we stopped doing that, here is a diff that does that:

diff --git a/src/normal.cc b/src/normal.cc
index beb880f9..33256c0b 100644
--- a/src/normal.cc
+++ b/src/normal.cc
@@ -579,9 +579,7 @@ void pipe(Context& context, NormalParams)
                     const auto end = changes_tracker.get_new_coord_tolerant(sel.max());

                     String in = buffer.string(beg, buffer.char_next(end));
-                    const bool insert_eol = in.back() != '\n';
-                    if (insert_eol)
-                        in += '\n';
+                    const bool ends_with_eol = in.back() == '\n';

                     // Needed in case we read selections inside the cmdline
                     context.selections_write_only().set({keep_direction(Selection{beg, end}, sel)}, 0);
@@ -590,12 +588,9 @@ void pipe(Context& context, NormalParams)
                         cmdline, context, in,
                         ShellManager::Flags::WaitForStdout).first;

-                    if (insert_eol)
-                    {
-                        in.resize(in.length()-1, 0);
-                        if (not out.empty() and out.back() == '\n')
-                            out.resize(out.length()-1, 0);
-                    }
+                    if (not ends_with_eol and not out.empty() and out.back() == '\n')
+                        out.resize(out.length()-1, 0);
+
                     auto new_end = apply_diff(buffer, beg, in, out);
                     if (new_end != beg)
                         new_sels.push_back(keep_direction({beg, buffer.char_prev(new_end), std::move(sel.captures())}, sel));

I'll run with that for a while see if I can spot any issue.

mawww commented 4 years ago

So, one issue I got so far with that patch is with bc:

 $ echo -n '1+1' | bc
(standard_in) 1: syntax error

For some reason it really wants an end-of-line, however seems busybox bc is happy with the missing newline, so maybe this can be considered a bug in GNU bc.

lenormf commented 4 years ago

According to my GNU/bc manpage:

In bc statements are executed "as soon as possible." Execution happens when a newline in encountered and there is one or more complete statements. Due to this immediate execution, newlines are very important in bc. In fact, both a semicolon and a newline are used as statement separators. An improperly placed newline will cause a syntax error.

I guess users who know about that could pipe 1+1;.

But to play the advocate of the conservative side here, why does the editor have to pick one way or the other?

I understand we're trying to make the default behaviour as convenient as possible to everybody, but since there are cases where a trailing newline is important to the shell program, why make the editor take the decision whether to include/strip it arbitrarily?

Could we not let the | primitive pipe the text to a tool, as-is, and document a snippet/script in the wiki that handles fancy side-effects that modify the input? After all, users who don't want the input to be modified will have to do that to correct the editor's behaviour.

mawww commented 4 years ago

Thats what I'd like to get to here, stop inserting that extra newline, reading the bc grammar at https://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html however makes it quite clear that the newline is required, so at least for this use case, it would mean that writing a mathematical expression, selecting it and piping to bc would now fail due to the missing newline.

I think we still want to strip the output trailing newline if the input did not have one, because almost every filter will output a final newline.

stevengubler commented 4 years ago

Hmmm...

On one hand, I like @lenormf's idea that | should be a simple tool. Using that as a primitive, more complex behavior could be implemented on top. But while that's a logically satisfying solution, I don't think it's great for usability. Having the more useful (generally) command only available through some other composition is not great for an efficient workflow.

So what about just making it more clear to the user what is happening with trailing newlines and offer them some kind of alternative? Maybe something like | pipes to a command like normal, and <a-|> does an exact-piping of sorts?

If we can make it sufficiently clear to the user which command has which behavior, I think that would work.

mawww commented 4 years ago

The issue is that we already have <a-|> for pipe-to (pipe without replacing buffer text with the output), so we would end-up with combinatorial explosion if we were to add alternate behaviour.

I am leaning towards not adding the trailing newline and break GNU bc, this can be addressed by piping into something else, say { cat; echo; } | bc. Its still not very satisfying, but it seems the most "correct" solution, its unfortunate POSIX bc does not mandate it to handle that case.

stevengubler commented 4 years ago

we would end-up with combinatorial explosion if we were to add alternate behaviour.

Fair point. It would soon get to the point where a user would rather use the most versatile implementation and manually manage the alternate behavior externally, (similar to your example with { cat; echo; } | bc), rather than remember all of the specific usecases for the different implementations within the editor.

On another note, I'm not as worried about stripping newlines when output is inserted into the editor, since that behavior is more easily visible to the user. However we do run the risk of losing new lines mysteriously if the newline is an important part of the output, but I don't think that's particularly common. Again, base64 is an example of where that could be an issue, especially since recognizing that a newline has been lost is difficult without external verification.

But as I said though, this issue doesn't worry me as much. Thoughts, @mawww?

AvdN commented 2 years ago

Would it be possible to the no-add-newline behaviour as default, and have a configuration setting that allows certain command (i.e. first word after | ) that allows adding and/or stripping of the newline, with some explicit setting for doubling or not if the original buffer ends in a newline? That way you could configure using bc to add the extra newline.

I was testing out |testfmt dumping the input handed to testfmt to a log file and was surprised to get the same input for selecting text with and without newline, and not being able to distinguish between them

Screwtapello commented 2 years ago

It occurs to me that if Kakoune always piped data exactly as-is, it would be easy to write wrapper scripts to automatically add/strip a trailing newline where needed. However, since Kakoune does the fix-ups itself, it's not possible for wrapper scripts to restore the original content.

stevengubler commented 2 years ago

I wonder if it is possible to leverage the hook system to enable both use cases nicely... Here's how I figure:

Usecase 1: `bc`

Kakoune recieves pipe command on the selected text
Kakoune runs a pre-pipe hook on the selected text, setting the state that tracks whether or not the text had an EOL at the end.
Kakoune sends the text to the pipe and receives it back.
Kakoune runs a post-pipe hook on the returned text, removing new line if one was returned and the previous text had no EOL at the end.

Usecase 2: `base64`

User disables Kakoune from running hooks by first inputing \
Kakoune recieves pipe command on the selected text
Kakoune sends the text to the pipe and receives it back.

This seems to me like the best of both worlds: The default behavior modifies output for the better in most situations, while also using already-built functionality to enable finer control over the output. This also allows the user to configure it as they see fit for whatever use case they have.

The question I have is whether the hooks system is able to tap into the piping system. I'm not familiar enough with the code to answer that yet, but someone here might be able to speak to that.

alexherbo2 commented 2 years ago

Kakoune still eats the newline character after piping, e.g. |printf 'foo'<ret> and |printf 'foo\n'<ret> give the same result.

krobelus commented 2 years ago

On Tue, Feb 08, 2022 at 02:46:05AM -0800, Taupiqueur wrote:

Kakoune still eats the newline character after piping, e.g. |printf 'foo'<ret> and |printf 'foo\n'<ret> give the same result.

That's expected. This issue is only about not corrupting pipe input.

Kakoune trims a single newline from pipe output if the input selection does not end in a newline. This works pretty well in practice for things like |grep, that print a newline even if the input doesn't have one. If you really need this newline, you can add an extra one, or use a sentinel character.

mawww / kakoune