kamadorueda / alejandra

The Uncompromising Nix Code Formatter
https://kamadorueda.github.io/alejandra/
The Unlicense
827 stars 40 forks source link

Function arguments not indented correctly #221

Open DavHau opened 2 years ago

DavHau commented 2 years ago

I have re-run alejandra against my code-base and have to say: This formatter is simply amazing. Thanks for putting in the effort and not getting tired by all the bikeshedding.

Another thing I noticed is that whenever function arguments are listed horizontally, the way alejandra changes the indentation doesn't feel right.

this

{
  foo =
    someFunction
      arg1
      arg2
      arg3;
}

results in this:

{
  foo =
    someFunction
    arg1
    arg2
    arg3;
}

This doesn't look right to me. For this simple example it might be irrelevant, but I think for reading complex code it is important that arguments passed to a function are indented relative to the function name.

A more complex example where it gets clear why I find the current behavior undesirable:

input:

''
  otherModules=${
    pkgs.writeText "other-modules.json"
    (l.toJSON
      (l.mapAttrs
        (pname: subOutputs:
          let
            pkg = subOutputs.packages."${pname}".overrideAttrs (old: {
              buildScript = "true";
              installMethod = "copy";
            });
          in
            "${pkg}/lib/node_modules/${pname}/node_modules")
        outputs.subPackages))
  }
''

From the alignment it is obvious, that there is one argument passed to toJSON and two arguments passed to mapAttrs.

alejandra output:

''
  otherModules=${
    pkgs.writeText "other-modules.json"
    (l.toJSON
    (l.mapAttrs
    (pname: subOutputs: let
      pkg = subOutputs.packages."${pname}".overrideAttrs (old: {
        buildScript = "true";
        installMethod = "copy";
      });
    in "${pkg}/lib/node_modules/${pname}/node_modules")
    outputs.subPackages))
  }
''

Indentation wise, it is not obvious anymore which lines/blocks are an arguments for toJSON and which are arguments for mapAttrs.

kamadorueda commented 2 years ago

Thank you, there are many things going on here, let's improve step by step

I believe I finally found the quasi-perfect heuristic for tight vs loose parentheses:

kamadorueda commented 2 years ago

Using the new parenthesis heuristics the code is looking better:

''
  otherModules=${
    pkgs.writeText "other-modules.json"
    (
      l.toJSON
      (
        l.mapAttrs
        (pname: subOutputs: let
          pkg = subOutputs.packages."${pname}".overrideAttrs (old: {
            buildScript = "true";
            installMethod = "copy";
          });
        in "${pkg}/lib/node_modules/${pname}/node_modules")
        outputs.subPackages
      )
    )
  }
''

There are some things to improve yet, but now we can at least tell who is argument of who

DavHau commented 2 years ago

Oh then I think i misunderstood your description of the heuristic in the earlier post. I'm not sure if this is an optimal result. It wastes 4 lines with opening and closing parentheses. This change goes against the last improvement done by you in #63. Maybe I'm wrong, but I think it's unnecessary to waste new lines for opening/closing brackets, as I tried explaining in https://github.com/kamadorueda/alejandra/issues/63#issuecomment-1039844679

kamadorueda commented 2 years ago

One step at a time my friend

B4dM4n commented 2 years ago

I agree, the code with the new parentheses rules is much more readable, but argument indentation should also be present without parentheses.

All function arguments that do not start on the same line as the function should be indented one level more than the "parent" element.

So this

function
argument

should look more like

function
  argument

But it gets tricky once multiline arguments are involved.

Once a simple argument follows a multiline argument, all arguments including the multiline argument should either:

Variant 1 - Indent

--- current.nix
+++ expected.nix
@@ -1,17 +1,17 @@
 {
   name1 = function arg {
     asdf = 1;
   };

   name2 = function arg {
-    asdf = 1;
-  }
-  argument;
+      asdf = 1;
+    }
+    argument;

   name3 = function arg {
-    asdf = 1;
-  } {
-    qwer = 12345;
-  }
-  argument;
+      asdf = 1;
+    } {
+      qwer = 12345;
+    }
+    argument;
 }

Variant 2 - Move to new line

--- current.nix
+++ expected.nix
@@ -1,17 +1,19 @@
 {
   name1 = function arg {
     asdf = 1;
   };

-  name2 = function arg {
-    asdf = 1;
-  }
-  argument;
+  name2 = function arg
+    {
+      asdf = 1;
+    }
+    argument;

-  name3 = function arg {
-    asdf = 1;
-  } {
-    qwer = 12345;
-  }
-  argument;
+  name3 = function arg
+    {
+      asdf = 1;
+    } {
+      qwer = 12345;
+    }
+    argument;
 }

I currently would prefer variant 1, because it gives the user the option on which line the multiline argument starts.

DavHau commented 2 years ago

But it gets tricky once multiline arguments are involved.

The reason it is tricky is because the formatting is not optimal. Allow me to demonstrate why.

compare this (your variant 1):

 {
   name1 = function arg {
     asdf = 1;
   };

   name2 = function arg {
      asdf = 1;
    }
    argument;

   name3 = function arg {
      asdf = 1;
    } {
      qwer = 12345;
    }
    argument;
 }

... to this:

{
  name1 =
    function
      arg
      {asdf = 1;};

  name2 =
    function
      arg
      {asdf = 1;}
      argument;

  name3 =
    function
      arg
      {asdf = 1;}
      {qwer = 12345;}
      argument;
}

With the second formatting, it is clear on first glance, that there are 3 layers of nesting: variable assignment <-- function call <-- function arguments.

You can also figure (within 1 second or less):

For the first code snippet, all of these points are harder to tell, and require reading and understanding the code more. The main problem with snippet 1 is that elements which have a relation to each other are spread both horizontally and vertically and are not aligned consistently. The human eye is not good at processing several dimension at once quickly.

The rules that leads to the second example are:

I think, applying these 3 rules makes nix code significantly more readable and the whole discussion about where to put brackets becomes much simpler. Each element has a very clear place where it belongs. Oh, and the snippets are missing an example with a multi line attrset. So here it is:

{
  name4 =
    function
      arg
      {asdf = 1;}
      {
        qwer = 12345;
        qwer2 = 54321;
      }
      argument;
}
B4dM4n commented 2 years ago

With the second formatting, it is clear on first glance, that there are 3 layers of nesting: variable assignment <-- function call <-- function arguments.

Agreed, compared to my suggestion, which was an incremental improvement to the current layout, your suggestion is very clear and easy to comprehend.

But I also see a risk of blowing up code so much, that many lines will only contain a single word and many spaces. This probably makes it easier to read the code on mobile devices, but regular PC monitors will look very empty with a long scrollbar, which can make it harder to reason about bigger files.

  • always start multi-line statements at a new line. (never in the same line as the =)
  • elements which are similar to each other (like func args) , are listed either vertically or horizontally, not both

Maybe there should be a few exceptions to these rules, to avoid the issue above. I'm thinking about NixOS modules which make heavy usage of functions with multiline arguments, e.g. mkOption or mkIf. Or the nixpkgs trivial builders, like writeTextFile. Forcing these functions onto multiple lines with one or two additional indentation levels compared to the current style would introduce a considerable amount of whitespaces.

Reading a mkIf like this is very comprehensible to me, but maybe I just got used to function calls like this.

{
  config = mkIf config.service.name.enable {
    ...
  };
}

So an exception rule could be, keep single line function calls with an optional multiline argument at the end as is.

When the user deems such a function invocation as too complex, any newline inserted before or after the function name would cause alejandra to fallback to your rules and indent the whole invocation.

DavHau commented 2 years ago

So an exception rule could be, keep single line function calls with an optional multiline argument at the end as is.

When the user deems such a function invocation as too complex, any newline inserted before or after the function name would cause alejandra to fallback to your rules and indent the whole invocation.

That sounds like a good idea. But what about the case when there are more arguments coming after the multi-line argument?

I think, only option3 here is nice:

{
  # this is not good (`lastArg` not indented relative to `name4`)
  option1 = function arg {asdf = 1;} {
    qwer = 12345;
    qwer2 = 54321;
  }
  lastArg;

  # This is also weird
  option2 = function arg {asdf = 1;} {
      qwer = 12345;
      qwer2 = 54321;
    }
    lastArg;

  # But this is OK:
  option3 = function arg {asdf = 1;}
    {
      qwer = 12345;
      qwer2 = 54321;
    }
    lastArg;
}
B4dM4n commented 2 years ago

But what about the case when there are more arguments coming after the multi-line argument?

I don't think additional arguments after the multi-line argument should be covered by the exception rule and therefore fallback to the three rules (like the name4 variant).

Only when there is one multi-line argument and it is the last one should the config = mkIf style from my last post be used.

--- current.nix
+++ expected.nix
@@ -1,42 +1,57 @@
 {
   example1 = function arg {asdf = 1;} {
     qwer = 12345;
     qwer2 = 54321;
   };

   # not all arguments start on the same line
-  example2 = function arg {asdf = 1;}
-    {
-      qwer = 12345;
-      qwer2 = 54321;
-    };
+  example2 =
+    function
+      arg
+      {asdf = 1;}
+      {
+        qwer = 12345;
+        qwer2 = 54321;
+      };

   # same as example2
-  example3 = function
-    arg {asdf = 1;}
-    {
-      qwer = 12345;
-      qwer2 = 54321;
-    };
+  example3 =
+    function
+      arg
+      {asdf = 1;}
+      {
+        qwer = 12345;
+        qwer2 = 54321;
+      };

   # multiple multi-line arguments
-  example4 = function arg {
-      qwer = 12345;
-      qwer2 = 54321;
-    } ''
-      text
-    '';
+  example4 =
+    function
+      arg
+      {
+        qwer = 12345;
+        qwer2 = 54321;
+      }
+      ''
+        text
+      '';

   # additional new-line before function name
   example5 =
-    function arg {asdf = 1;} {
-      qwer = 12345;
-      qwer2 = 54321;
-    };
+    function
+      arg
+      {asdf = 1;}
+      {
+        qwer = 12345;
+        qwer2 = 54321;
+      };

   # argument after the multi-line argument
-  example6 = function {
-    qwer = 12345;
-    qwer2 = 54321;
-  } arg;
+  example6 =
+    function
+      {
+        qwer = 12345;
+        qwer2 = 54321;
+      }
+      arg;
 }
kamadorueda commented 2 years ago

@B4dM4n @DavHau Thank you, I believe we finally converged into a good design

Below you'll find how Alejandra @ https://github.com/kamadorueda/alejandra/commit/2842bdc2392ea12196684314bc0b7eccd2c39277 formats the code snippets you posted here

I believe we all can be happy with this

[
  (a
    b)

  ''
    otherModules=${
      pkgs.writeText "other-modules.json"
      (l.toJSON
        (l.mapAttrs
          (pname: subOutputs: let
            pkg = subOutputs.packages."${pname}".overrideAttrs (old: {
              buildScript = "true";
              installMethod = "copy";
            });
          in "${pkg}/lib/node_modules/${pname}/node_modules")
          outputs.subPackages))
    }
  ''

  {
    name1 =
      function
      arg
      {asdf = 1;};

    name2 =
      function
      arg
      {asdf = 1;}
      argument;

    name3 =
      function
      arg
      {asdf = 1;}
      {qwer = 12345;}
      argument;

    name1 = function arg {
      asdf = 1;
    };

    name2 =
      function arg {
        asdf = 1;
      }
      argument;

    name3 =
      function arg {
        asdf = 1;
      } {
        qwer = 12345;
      }
      argument;

    name4 =
      function
      arg
      {asdf = 1;}
      {
        qwer = 12345;
        qwer2 = 54321;
      }
      argument;

    option1 =
      function arg {asdf = 1;} {
        qwer = 12345;
        qwer2 = 54321;
      }
      lastArg;

    option2 =
      function arg {asdf = 1;} {
        qwer = 12345;
        qwer2 = 54321;
      }
      lastArg;

    option3 =
      function arg {asdf = 1;}
      {
        qwer = 12345;
        qwer2 = 54321;
      }
      lastArg;
  }
]

The only piece of feedback I don't think is right to implement as the moment is this:

{
  foo =
    someFunction # Indent arguments of this function
      arg1
      arg2
      arg3;
}

I'll have it loaded in RAM, low priority for now

tomberek commented 2 years ago

Would

{
  foo = someFunction
    arg1
    arg2
    arg3;
}

be consistent? It would separate the function from the args, reduce newline and indents that don't add meaning. If the "someFunction" is too long of an expression then one would switch to the Tall representation.

Also perhaps related is the common idiom of merging extra attributes into an attrset like this

{
  foo = someFunction {
    a = 2;
    b = 3;
  } // {};   #<--- add/remove the "// {} ;" or place some content into it.
}

When modifying, adding or removing that update operator the "someFunction" jumps up and down in surprising ways.

adrian-gierakowski commented 2 years ago

amen to this: https://github.com/kamadorueda/alejandra/issues/221#issuecomment-1056801192 @DavHau

it's amazing to see someone else articulate something you've arrived yourself at over the years :D

Do you have any opinion regarding @tomberek's suggestion:

Would

{
  foo = someFunction
    arg1
    arg2
    arg3;
}

be consistent?

adrian-gierakowski commented 2 years ago

@tomberek

you can do the following the prevent the "jumping" when adding/removing things:

{
  foo =
    someFunction
    {
      a = 2;
      b = 3;
    }
    // {}; #<--- add/remove the "// {} ;" or place some content into it.
}

It also makes removing easier since you can simply place your cursor on the line and press a "delete line" keyboard shortcut (well, you'd need this to be implemented first, since currently that would remove the semicolon and break the code)

adrian-gierakowski commented 2 years ago

The only piece of feedback I don't think is right to implement as the moment is this:

{
  foo =
    someFunction # Indent arguments of this function
      arg1
      arg2
      arg3;
}

I'll have it loaded in RAM, low priority for now

@kamadorueda do you not think this to b valid suggestion or you simply have other things to do which you consider higher priority?