macisamuele / language-formatters-pre-commit-hooks

Collection of custom pre-commit hooks.
Apache License 2.0
115 stars 58 forks source link

YAML inconsistent indent and offset values #154

Closed fmigneault closed 1 year ago

fmigneault commented 1 year ago

Considering the following valid YAML (this is the result I would like to achieve after auto-fix by pretty-format-yaml):

services:
  svc:
    image: "..."
    volumes:
      - "/tmp:/tmp"

Using

--indent=4
--offset=2

causes the auto-fix of the file to be over-indented as follows:

services:
    svc:  # all nested mappings over-indented
        image: "..."
        volumes:
          - "/tmp:/tmp"  # indent as expected here

while

--indent=2
--offset=0

results in:

services:
  svc:  # nesting indents preserved
    image: "..."
    volumes:
    - "/tmp:/tmp"   # de-indented

It seems that the following should be valid:

--indent=2
--offset=2

but it is blocked by the following check:

https://github.com/macisamuele/language-formatters-pre-commit-hooks/blob/785e2cfa144d495a5d11679125627fc9ad6ff988/language_formatters_pre_commit_hooks/pretty_format_yaml.py#L76-L82

fmigneault commented 1 year ago

@macisamuele Is it possible to get an update regarding this? This breaks linters for all my repos using the desired format.

macisamuele commented 1 year ago

I will try to prioritise the checks of this issue , but I'm not sure I'll be able to fully check it before the weekend.

Something very helpful would be to have the version of the library you are using to reproduce the issue

datalogics-kam commented 1 year ago

I think what would help is if the sequence parameter could be set independently here.

Then --indent=2 --sequence=4 might do the trick.

(Based on trying another hook that is based on ruamel.yaml, but unfortunately doesn't run on Windows)

fmigneault commented 1 year ago

@macisamuele Here is a sample. The version before changes (-) is what I would expect to be considered valid with --indent=2 --offset=2, but that isn't allowed. The version after change (+) is what I must adjust with --indent=2 --offset=0 to have the validation succeed temporarily.

version: "3.9"
services:
   worker:
     env_file:
-      - .env
+    - .env
     volumes:
       # Adds current directory as volume.
-      - .:/app/src/
+    - .:/app/src/
     command:
-      - celery
-      - --app
-      - "catalog.tasks.run:broker"
-      - worker
-      - -E
-      - -B
-      - -c 5
+    - celery
+    - --app
+    - "catalog.tasks.run:broker"
+    - worker
+    - -E
+    - -B
+    - -c 5

I'm able to reproduce the desired behaviour using the following:

import sys
import yaml
import ruamel.yaml 

data = yaml.safe_load("<the yaml">)
y = ruamel.yaml.YAML()
y.indent(mapping=2, sequence=4, offset=2)
y.dump(data, sys.stdout)
version: '3.9'
services:
  worker:
    env_file:
      - .env
    volumes:
      - .:/app/src/
    command:
      - celery
      - --app
      - catalog.tasks.run:broker
      - worker
      - -E
      - -B
      - -c 5
macisamuele commented 1 year ago

Offset value has to be at least 2 bigger than general indentation to prevent rendering invalid YAML content. (Once on a laptop I'll provide a clear example of it)

The additional part of the report, if I understand it properly, is that there is the desire to have sequences indented more than the mapping key (by splitting indent argument is 2 separate arguments for mapping and sequences).

As far as technically possible, I would see a more involved set of edge cases to cover (between the 3 separate fields) and it would go beyond the original idea of the library.

The main library goal is to provide a stable formatting of the output, such that bikeshedding over style would be removed.

Considering the library intent and my understanding of the issue I'm not inclined to accommodate it in order to keep a simple and predictable formatting of the files.

If you believe that I'm misrepresenting the asks please help me fill the gaps.

If no further communication will be provided I'll be closing the issue.

datalogics-kam commented 1 year ago

Stable formatting is important, but so is readability, and IDE compatibility of the formatted result. And as commented above, compatibility with certain other linters.

Indenting list items makes it easy to distinguish keys from their list contents by eye. This is important for any nontrivial YAML file.

To make it more convenient to edit files in an IDE or editor, it's important to be able to set the IDE settings to match the format, but unfortunately, JetBrains IDEs' YAML formatter can't be set to keep YAML files this flat.

Opinionated formatting (which is what this is as it is) can be a good thing, but it helps if the opinions are widely accepted. Until one of my coworkers used this to reformat some of our code, I hadn't ever seen any YAML files that didn't indent list items under a dictionary key.

In the end, it's your repo, and you do what you want with it, but it's unfortunate that the formatting couldn't be more flexible.

macisamuele commented 1 year ago

I would agree with the statement that it is important to reduce friction, but without a clear understanding of what tools/workflows could be impacted it is hard to make an effective judgemental call.

Furthermore, I've checked a bit more the ruamel.yaml arguments:

Using

map:
  list_field:
  - sub_field1: 1
    sub_field2: 2

as reference I see the followings

If I were to factor all of them together and the intentions of #143 this is effectively a bug of the initial implementation :)

The setup of the original implementation was

    yaml.indent(mapping=args.indent, sequence=args.indent, offset=args.offset)

while it should have accounted for the offset within the sequence definition as in

    yaml.indent(mapping=args.indent, sequence=args.indent + args.offset, offset=args.offset)

With the fix we would have the following snippet considered pretty with the default arguments

map:
  list_field:
  - sub_field1: 1
    sub_field2: 2

and the following snippet considered pretty with --offset=2 value

map:
  list_field:
    - sub_field1: 1
      sub_field2: 2

Fixing the effective bug should provide a better experience and cooperation with other linters, auto-formatters, etc.

@fmigneault , @datalogics-kam Would it address the needs?

fmigneault commented 1 year ago

I agree with @datalogics-kam regarding the mention: "Indenting list items makes it easy to distinguish keys from their list contents by eye." This is so common that multiple linters do this by default (eg; https://www.yamllint.com/).

The better readability is exactly why this convention is applied within many of the repositories I work on. This pre-commit YAML linter ends up fighting back-and-forth against Jetbrain IDE's formatter. This introduces a lot of frustration and unnecessary diffs for the developers working on our projects.

@macisamuele I think the proposed solution is good. At the very least, it would address my needs.

macisamuele commented 1 year ago

Thanks a lot @datalogics-kam and @fmigneault for helping me seeing what I was missing.

A new release of the library will be published today or tomorrow

datalogics-kam commented 1 year ago

Thanks! It works for the project that uses two space indents, for which I'm grateful. I'll put this into use there when it's released.

Unfortunately, it's not possible to use a common format that ends up like this with four space indents.

key:
    key2:
        key3:
            - item1
            - item2
        key4:
            - item 3
            - item 4

ruyaml.yaml will give you this if you specify

yaml.indent(mapping=4, sequence=6, offset=4)

...but that can't be expressed with the statement

yaml.indent(mapping=indent, sequence=indent+offset, offset=offset)

...which is why I suggested exposing all the parameters.

I'm considering my options here. I really like 4 space indents on big files for visibility, but maybe I can put better indent guides into my editor and use 2 spaces.

Anyway, thanks again for the fix!

fmigneault commented 1 year ago

Tested and validated on a big project. Works like a charm! Thanks for the quick release.