haskell-nix / hnix

A Haskell re-implementation of the Nix expression language
https://hackage.haskell.org/package/hnix
BSD 3-Clause "New" or "Revised" License
754 stars 115 forks source link

Comment constructor #57

Open Profpatsch opened 7 years ago

Profpatsch commented 7 years ago

As far as I can see, there is no way to represent comments in the AST at the moment.

Since I’m generating files that should be human-readable as well, it would be nice to have them (only line comments as far as I’m concerned, but inline/block comments are probably a good idea as well).

jwiegley commented 7 years ago

You'll probably need to add that support by using an annotated AST (there's a file for supplying annotation in the repository).

Profpatsch commented 7 years ago

So the parser just skips comments right now?

jwiegley commented 7 years ago

Yes, I believe so. I'm not sure if the annotated parser also does, though.

domenkozar commented 7 years ago

I'm out of time these days, but can Ichip-in or help otherwise to make this happen?

This would be ground work for declarative documentation in Nix, to generate docs from the source code.

Profpatsch commented 7 years ago

Right now, comments are skipped in the lexer: https://hackage.haskell.org/package/hnix-0.3.4/docs/src/Nix-Parser-Library.html#commentStyle, with CommentStyle.

Comments are tricky:

# function comment
# Something something
{ args /*here’s another comment*/
, moreArgs # but where did it come from
# and more important:
, evenMoreArgs
# where do these comments belong?
}:

/* should I be
   explicitely kept as multiline comment?
*/

# if so, is it necessary to keep the info that this was not a ML-comment?
# Maybe pretty-printing should handle that by looking at the line size?

/* also
 * what about
 * comments formatted like this
 *
 * Should lines be preserved?
 * What about the strange stars in front of each line?

And if the comment suddenly stops using stars?
What then?
*/

# not all that easy on first thought?

{
  # docstring?
  attrWithComment = "foo"; # and line comment

}

Comments seem to be completely removed in the lexer phase, stuff like

{ foo /*uh oh*/ = /*oh uh*/ "23" /*test*/; }
/*comment*/

parses into { foo = "23"; } like a charm.

That might make it a tad tricky to integrate them into the parsed AST in a sensible way?

Profpatsch commented 7 years ago

I thought about it a bit more. It doesn’t make very much sense to add comments to NExprF because they are on the lexer level and should probably be added to the annotations:

data Comment = Comment
  { commentType :: CommentType
  , commentContent :: Text
  , commentSpan :: SrcSpan }

data CommentType = SingleLine | MultiLine

data CommentSpan = CommentSpan
  { srcSpan :: SrcSpan
  , comments :: Comment }

type NExprCommentF = AnnF CommentSpan NExprF

Comments are parsed out into a list containing their contents & SrcSpans. Then a second pass can be used to convert this annotation to docstrings (by some more elaborate rules, like looking at the placement of comments and for instance combining multiple single line comments).

Right now the parser always creates a NExprLocF and then strips it away if one only uses nixExpr :: Parser NExpr. If we extend that do the same with comments, I’m not sure if the performance/memory footprint of nixExpr would suffer? Probably depends if the laziness of stripAnnotation <$> nixExprLoc reaches the many (annotateLocation (void $ symbol name))-part in nixExprLoc? I’m not sure.

jwiegley commented 6 years ago

@Profpatsch I like the idea of adding the comments to the annotations. I wouldn't worry about the speed impact just yet.

deepfire commented 6 years ago

@Profpatsch, I'm slightly worried when you mention docstrings -- since they presumably can't be attached at arbitrary points, and also potentially due to the transformations you mention..

Do you envision being able to have an identity parser/pretty-printer with that scheme?

Profpatsch commented 6 years ago

Do you envision being able to have an identity parser/pretty-printer with that scheme?

I don’t really have a vision, just incoherent thoughts. :)

hnix already keeps source spans by default:

parseNixFileLoc :: MonadIO m => FilePath -> m (Result NExprLoc)
type NExprLoc = Fix NExprLocF
type NExprLocF = AnnF SrcSpan NExprF

parseNixFile is just a parseNixFileLoc which throws away its SrcSpans (probably doesn’t even instantiate them into memory given enough laziness).

So I don’t see how something like parseNixFileLocComments :: MonadIO m => FilePath -> m (Result (Fix (AnnF CommentSpan (AnnF SourceSpan NExprF)))) changes the status quo.

Ekleog commented 6 years ago

Hmm, so you're thinking of tying each comment to an AST item, if I understand correctly the type signature? I'm not sure how you'd be able to do it for all the examples you raised in https://github.com/jwiegley/hnix/issues/57#issuecomment-311553816: maybe full-line comments attach to the token after and part-of-line comments attach to the token before?

(for the context, I'm currently trying to implement a tool for indenting nix code that basically parses then pretty-prints it, and comments are likely the last missing piece to integrate, so I'm looking this quite closely :))

Profpatsch commented 6 years ago

@Ekleog I suspect you’d have to just decide on what lexer token you assign them to.

The definition of CommentSpan in https://github.com/jwiegley/hnix/issues/57#issuecomment-311725647 is missing a list I think:

data CommentSpan = CommentSpan
  { srcSpan :: SrcSpan
  , comments :: [Comment] }

because of stuff like { foo /*uh oh*/ = /*oh uh*/ "23" /*test*/; }

Ekleog commented 6 years ago

Hmm, but then the issue is to decide to which lexer token to assign the comments?

For a few comments it's easy, like:

nothere
here // my comment
nothere
nothere
here /* my comment */
nothere
nothere

// my comment
here
here
// my comment

nothere

But for others even as a human without additional context I couldn't guess to which token the following are to be assigned?

maybehere /* my comment */ orhere
maybehere

// my comment

orhere

Then I guess it's just good enough to autocratically pick one and say “this is it” :)

taktoa commented 6 years ago

It's been a long time since I looked at the hnix source code, so I'm not sure how much rearchitecting this would take, but what do you guys think about this way of integrating comments:

  1. Lex the source file into a stream of tokens, each of which has a corresponding source span
  2. Parse this into an AST, but instead of having source spans in the AST, you have offsets into the stream of tokens (or perhaps you have both).
  3. The stream of tokens contains comments, but they don't result in any AST nodes.

This allows the comment information to be fully preserved and accessible without making the AST cumbersome to use.

costrouc commented 5 years ago

In regards to the announced sprint tomorrow at nixcon it was said that adding parsing for comments is a wanted feature. I just wanted to ask that if this feature be added to https://github.com/NixOS/nix so that other language projects can benefit as well! I myself have been working on a python parser and the one thing that is dropped is comments due to the nix ast parser. https://github.com/Mic92/pythonix/pull/2

roberth commented 5 years ago

I've had a look at the AST and it seems that for the purpose of formatting nix code reliably, we will need to have access to token spans. This is because the AST simply can not (and should not) represent the comments and/or whitespace in every possible place. Example:

inherit /*a*/ ( /*b*/ foo) bar;

vs Binding constructor:

Inherit !(Maybe r) ![NKeyName r] !SourcePos

Of course preserving all comments is not a requirement for retaining comments, but both a formatter and some kind of comment processing seem to benefit from having a token stream, as @taktoa points out.

I'll see how far I can still get with tokenizing today at the NixCon 2018 hackday.

Ekleog commented 5 years ago

As I see people pointing to formatting nix code reliably, I'll point to https://github.com/Ekleog/nix-bikeshed/ so that people don't need to re-do the same thing twice :)

It's more or less working, mostly missing comments at the time being. (actually, running it is how I found out https://github.com/haskell-nix/hnix/issues/152: parsing, generating the file and re-parsing didn't give the same answer, and for my generation)

So if some support for comments could be added, it'd be great! :)

domenkozar commented 5 years ago

@Ekleog how does it compare to hnix formatter?

Ekleog commented 5 years ago

If you're speaking of Nix.Pretty, then nix-bikeshed handles wrapping overlong lines correctly, and doesn't hesitate to switch between a ''foo'' string and a "foo" string (depending on line length), for the two first changes I can see off-the-top-of-my-head.

If there's another hnix formatter, then I didn't find about it when I started nix-bikeshed a few months ago (before putting it on hold waiting for this issue to be fixed)

roberth commented 5 years ago

Sorry guys, I have given up on lexing. My findings:

NorfairKing commented 5 years ago

+1

See https://github.com/NorfairKing/nixfmt for an attempt at a normalising formatter. The only thing missing there are comments, as mentioned in the README.

domenkozar commented 5 years ago

@NorfairKing @Ekleog now we have three formatters :) It would be good to see what's the difference and just get hnix formatter to do the correct thing (or provide modes of operation).

NorfairKing commented 5 years ago

@domenkozar https://github.com/NorfairKing/nixfmt just provides a CLI wrapper around the hnix parser and pretty-printer, nothing more. If the hnix parser and pretty printer are fixed to deal with comments, nixfmt will work as intended.

domenkozar commented 5 years ago

You can format by invoking hnix <filepath> :)

Ekleog commented 5 years ago

@domenkozar I think Nix.Pretty doesn't try to format nix code but to pretty-print nix expressions, which is IMO a vastly different objective. For instance, pretty-printing YAML may be just setting it to always-indent like:

foo:
 bar:
  - a
  - b
  - c
 baz:
  - some long stuff
  - other long stuff
  - let us say this overflows max line length

While formatting it would be like:

foo:
 bar: [a, b, c]
 baz:
  - some long stuff
  - other long stuff
  - let us say this overflows max line length

As such, I think the objective of the two projects (Nix.Pretty and nix-bikeshed) don't match, and thus they aren't overlapping :)

As for having hnix do the correct thing, it's another problem, and it could very well call into a formatting library.

domenkozar commented 5 years ago

The difference would only be setting how long can a line be, so setting it to 1 char would "pretty-print" it. So it could easily be under the same codebase, I think.

Ekleog commented 5 years ago

Well, we totally agree that a formatter can almost do pretty-printing. However I think is that a pretty-printer can't do formatting at all :)

(also, the difference also includes eg. whether ''foo'' can be turned to "foo" which changes the AST and thus can't be included in a pretty-printer, and similar stuff)

NorfairKing commented 5 years ago

Just my 2 cents: I don't much care about formatting vs pretty-printing. I just want a single normalized form of a given AST that is vaguely good for humans to edit. Something like hindent or gofmt. If that's done with a formatter instead of a pretty-printer: great, but I'm not (personally) interested in bikeshedding what the format would be.

domenkozar commented 5 years ago

@NorfairKing same here. I worked on formatter to do "something sane", but I don't prefer the discussion on the style as much as I value consistency.

The current formatter is used in https://github.com/domenkozar/hnix-lsp and I'd love if hnix would preserve even most basic comments (after lines and in-between).

Ekleog commented 5 years ago

If you're interested in something vaguely good for humans to edit, you'll likely be OK with a pretty-printer.

If you want a tool to add to your CI to ensure your code is properly formatted, you'll likely want a formatter.

I think this sums up the difference I see, and the reason why I think Nix.Pretty, by its name, doesn't try to handle the problem nix-bikeshed is trying to handle. But maybe I'm wrong here, I'll let @jwiegley contradict me :)

Also, a reason why I don't think nix-bikeshed should live under hnix is due to its name: it'll likely see a lot of churn because no one will agree on the way things should be formatted, if we want to enforce formatting across nixpkgs, and so keeping the issue tracker separate sounded like a net win to me.

BTW, the reason why I didn't announce nix-bikeshed until now is because it doesn't support comments yet, and I believe if it doesn't support comments then it's not a formatter -- while it's already a pretty-printer -- and I thus didn't want to ping people on a half-baked thing that wouldn't be actually usable :)

jwiegley commented 5 years ago

One advantage of pretty printing over formatting: smaller diffs. With automated formatting, you never know when it's going to decide to completely rewrite a large block of code because some limit has been reached.

expipiplus1 commented 3 years ago

Did anything ever come of this?

I think that even a simple solution which recognizes comments only in specific places would be quite useful!

expipiplus1 commented 3 years ago

I've written a pretty basic comment annotator here: https://github.com/expipiplus1/update-nix-fetchgit/blob/2e9df344c40e289920e94c82dbce2d87ae54ecc5/src/Nix/Comments.hs

It simply attaches comments to the expression which immediately precedes them (so multiple expressions may own the same comment). It also only handles single line comments.

Requires a fix for https://github.com/haskell-nix/hnix/issues/743 (https://github.com/haskell-nix/hnix/pull/744)

Anton-Latukha commented 3 years ago

Would mention the core point of current comment processing:

https://github.com/haskell-nix/hnix/blob/57a55fd4874c3c8d07836367dce678d113f2f823/src/Nix/Parser.hs#L498-L506

So, indeed (as points the topic), currently, HNix parser processes comments as whitespaces.