nushell / nufmt

Make the formatter keep comments #17

Closed AucaCoyan closed 1 year ago

AucaCoyan commented 1 year ago

In its current state, nufmt strips comments from the code, wherever they are, because it parses with .parse(). It would be cool if, once the Vec<u8> is loaded, we split out the comments and format the "sections of code" in between them, so we can then collect the comments and code back together afterwards. Also, there is a skipped test for this very feature: ignore_comments

fdncred commented 1 year ago

I wonder if you can infer where comments are by using offsets?

For instance, if you have this test.nu script

# function comment
def fun1 [text] {
    echo "fun1: $text"
}

# this is a 
# multi-line comment
# before a function
def fun2 [text] {
    echo "fun2: $text"
}

Then this command reveals where the first piece of code is: it starts at offset 19.

❯ nu --ide-ast test.nu | from json | table -e
╭────┬───────────────┬────────────────────┬─────────────────┬──────╮
│  # │    content    │       shape        │      span       │ type │
├────┼───────────────┼────────────────────┼─────────────────┼──────┤
│  0 │ def           │ shape_internalcall │ ╭───────┬────╮  │ ast  │
│    │               │                    │ │ end   │ 22 │  │      │
│    │               │                    │ │ start │ 19 │  │      │
│    │               │                    │ ╰───────┴────╯  │      │

So, that means 0 - 19 may be comments.

❯ open test.nu | str substring 0..19
# function comment

Then we can see this section

│  6 │               │ shape_closure      │ ╭───────┬────╮  │ ast  │
│    │ }             │                    │ │ end   │ 61 │  │      │
│    │               │                    │ │ start │ 59 │  │      │
│    │               │                    │ ╰───────┴────╯  │      │
│  7 │ def           │ shape_internalcall │ ╭───────┬─────╮ │ ast  │
│    │               │                    │ │ end   │ 120 │ │      │
│    │               │                    │ │ start │ 117 │ │      │
│    │               │                    │ ╰───────┴─────╯ │      │

Then we could do this

❯ open test.nu | str substring 61..117

# this is a
# multi-line comment
# before a function

Kind of a long heuristic way to go, but it may be work-able?
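The offset idea above can be sketched in Rust roughly as follows. This is only an illustration under assumptions: the `Span` struct and the functions `gaps_between_spans` and `comment_candidates` are hypothetical names, not part of nufmt or nu-parser, and the spans are assumed to come from something like the `--ide-ast` output. It collects every range that no span covers and keeps the ones that contain a `#`.

```rust
/// Hypothetical span type mirroring the `start`/`end` offsets in the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Return the ranges of `0..source_len` that no span covers.
fn gaps_between_spans(source_len: usize, spans: &[Span]) -> Vec<Span> {
    let mut sorted = spans.to_vec();
    sorted.sort_by_key(|span| span.start);

    let mut gaps = Vec::new();
    let mut cursor = 0; // end of the region covered so far
    for span in sorted {
        // Clamp so a span past the end of the source cannot produce a bad range.
        let start = span.start.min(source_len);
        if start > cursor {
            gaps.push(Span { start: cursor, end: start });
        }
        cursor = cursor.max(span.end);
    }
    if cursor < source_len {
        gaps.push(Span { start: cursor, end: source_len });
    }
    gaps
}

/// Keep only the gaps whose text contains a `#`; whitespace-only gaps are dropped.
fn comment_candidates<'a>(source: &'a [u8], spans: &[Span]) -> Vec<&'a [u8]> {
    gaps_between_spans(source.len(), spans)
        .into_iter()
        .map(|gap| &source[gap.start..gap.end])
        .filter(|text| text.contains(&b'#'))
        .collect()
}
```

For test.nu, passing one span per function (19..61 and the span of the second def starting at 117) should yield the two comment blocks, matching the substring calls above. It is still a heuristic: it assumes that anything outside the spans that contains a `#` is a comment.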

AucaCoyan commented 1 year ago

That is a different (and I believe better) algorithm. I'll try it! thanks!

The algorithm I first thought of is: the CLI sends the Vec<u8> to the file comments.rs, which uses lex() to read every TokenContents and checks whether these 2 tokens appear back to back: TokenContents::Comment and TokenContents::Eol (end of line, the last character, which is usually \n)

After it finds them, split in between. So you will have:

Split the sections into yes-code and no-code, format them, and collect them for the output

(a much larger process than your logic)
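A rough sketch of that split-format-collect flow, with a big simplification: instead of calling lex() and matching TokenContents::Comment followed by TokenContents::Eol, it treats any line whose first non-whitespace character is `#` as a comment line. The `Section` enum and the functions `split_sections` and `format_keeping_comments` are hypothetical names for illustration, and the line-based check would miss trailing comments after code on the same line.

```rust
/// Hypothetical section type: a run of comment lines or a run of code lines.
#[derive(Debug, PartialEq)]
enum Section {
    Comment(String),
    Code(String),
}

/// Split the source into alternating comment and code sections, line by line.
/// Assumption: a line is a comment if its first non-whitespace character is `#`.
fn split_sections(source: &str) -> Vec<Section> {
    let mut sections: Vec<Section> = Vec::new();
    for line in source.split_inclusive('\n') {
        let is_comment = line.trim_start().starts_with('#');
        // Extend the last section when it has the same kind as this line.
        let extended = match sections.last_mut() {
            Some(Section::Comment(text)) if is_comment => {
                text.push_str(line);
                true
            }
            Some(Section::Code(text)) if !is_comment => {
                text.push_str(line);
                true
            }
            _ => false,
        };
        if !extended {
            sections.push(if is_comment {
                Section::Comment(line.to_string())
            } else {
                Section::Code(line.to_string())
            });
        }
    }
    sections
}

/// Format only the code sections and stitch everything back together in order.
fn format_keeping_comments(source: &str, format_code: impl Fn(&str) -> String) -> String {
    split_sections(source)
        .into_iter()
        .map(|section| match section {
            Section::Comment(text) => text, // comments pass through untouched
            Section::Code(text) => format_code(&text),
        })
        .collect()
}
```

Here `format_code` stands in for whatever the formatter does to a comment-free chunk. Blank lines are counted as code in this sketch, so the formatter decides what happens to the spacing between sections.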

fdncred commented 1 year ago

I saw your code; if it works, I'm fine with yours too. I was just suggesting an alternative method. I think it would be much better if we could keep track of all comments during parsing so we're not guessing.