Open Profpatsch opened 7 years ago
You'll probably need to add that support by using an annotated AST (there's a file for supplying annotation in the repository).
So the parser just skips comments right now?
Yes, I believe so. I'm not sure if the annotated parser also does, though.
I'm out of time these days, but can Ichip-in or help otherwise to make this happen?
This would be ground work for declarative documentation in Nix, to generate docs from the source code.
Right now, comments are skipped in the lexer: https://hackage.haskell.org/package/hnix-0.3.4/docs/src/Nix-Parser-Library.html#commentStyle, with CommentStyle
.
Comments are tricky:
# function comment
# Something something
{ args /*here’s another comment*/
, moreArgs # but where did it come from
# and more important:
, evenMoreArgs
# where do these comments belong?
}:
/* should I be
explicitely kept as multiline comment?
*/
# if so, is it necessary to keep the info that this was not a ML-comment?
# Maybe pretty-printing should handle that by looking at the line size?
/* also
* what about
* comments formatted like this
*
* Should lines be preserved?
* What about the strange stars in front of each line?
And if the comment suddenly stops using stars?
What then?
*/
# not all that easy on first thought?
{
# docstring?
attrWithComment = "foo"; # and line comment
}
Comments seem to be completely removed in the lexer phase, stuff like
{ foo /*uh oh*/ = /*oh uh*/ "23" /*test*/; }
/*comment*/
parses into { foo = "23"; }
like a charm.
That might make it a tad tricky to integrate them into the parsed AST in a sensible way?
I thought about it a bit more.
It doesn’t make very much sense to add comments to NExprF
because they are on the lexer level and should probably be added to the annotations:
data Comment = Comment
{ commentType :: CommentType
, commentContent :: Text
, commentSpan :: SrcSpan }
data CommentType = SingleLine | MultiLine
data CommentSpan = CommentSpan
{ srcSpan :: SrcSpan
, comments :: Comment }
type NExprCommentF = AnnF CommentSpan NExprF
Comments are parsed out into a list containing their contents & SrcSpan
s. Then a second pass can be used to convert this annotation to docstrings (by some more elaborate rules, like looking at the placement of comments and for instance combining multiple single line comments).
Right now the parser always creates a NExprLocF
and then strips it away if one only uses nixExpr :: Parser NExpr
. If we extend that do the same with comments, I’m not sure if the performance/memory footprint of nixExpr
would suffer? Probably depends if the laziness of stripAnnotation <$> nixExprLoc
reaches the many (annotateLocation (void $ symbol name))
-part in nixExprLoc
? I’m not sure.
@Profpatsch I like the idea of adding the comments to the annotations. I wouldn't worry about the speed impact just yet.
@Profpatsch, I'm slightly worried when you mention docstrings -- since they presumably can't be attached at arbitrary points, and also potentially due to the transformations you mention..
Do you envision being able to have an identity parser/pretty-printer with that scheme?
Do you envision being able to have an identity parser/pretty-printer with that scheme?
I don’t really have a vision, just incoherent thoughts. :)
hnix already keeps source spans by default:
parseNixFileLoc :: MonadIO m => FilePath -> m (Result NExprLoc)
type NExprLoc = Fix NExprLocF
type NExprLocF = AnnF SrcSpan NExprF
parseNixFile
is just a parseNixFileLoc
which throws away its SrcSpans
(probably doesn’t even instantiate them into memory given enough laziness).
So I don’t see how something like parseNixFileLocComments :: MonadIO m => FilePath -> m (Result (Fix (AnnF CommentSpan (AnnF SourceSpan NExprF))))
changes the status quo.
Hmm, so you're thinking of tying each comment to an AST item, if I understand correctly the type signature? I'm not sure how you'd be able to do it for all the examples you raised in https://github.com/jwiegley/hnix/issues/57#issuecomment-311553816: maybe full-line comments attach to the token after and part-of-line comments attach to the token before?
(for the context, I'm currently trying to implement a tool for indenting nix code that basically parses then pretty-prints it, and comments are likely the last missing piece to integrate, so I'm looking this quite closely :))
@Ekleog I suspect you’d have to just decide on what lexer token you assign them to.
The definition of CommentSpan
in https://github.com/jwiegley/hnix/issues/57#issuecomment-311725647 is missing a list I think:
data CommentSpan = CommentSpan
{ srcSpan :: SrcSpan
, comments :: [Comment] }
because of stuff like { foo /*uh oh*/ = /*oh uh*/ "23" /*test*/; }
Hmm, but then the issue is to decide to which lexer token to assign the comments?
For a few comments it's easy, like:
nothere
here // my comment
nothere
nothere
here /* my comment */
nothere
nothere
// my comment
here
here
// my comment
nothere
But for others even as a human without additional context I couldn't guess to which token the following are to be assigned?
maybehere /* my comment */ orhere
maybehere
// my comment
orhere
Then I guess it's just good enough to autocratically pick one and say “this is it” :)
It's been a long time since I looked at the hnix
source code, so I'm not sure how much rearchitecting this would take, but what do you guys think about this way of integrating comments:
This allows the comment information to be fully preserved and accessible without making the AST cumbersome to use.
In regards to the announced sprint tomorrow at nixcon it was said that adding parsing for comments is a wanted feature. I just wanted to ask that if this feature be added to https://github.com/NixOS/nix so that other language projects can benefit as well! I myself have been working on a python parser and the one thing that is dropped is comments due to the nix ast parser. https://github.com/Mic92/pythonix/pull/2
I've had a look at the AST and it seems that for the purpose of formatting nix code reliably, we will need to have access to token spans. This is because the AST simply can not (and should not) represent the comments and/or whitespace in every possible place. Example:
inherit /*a*/ ( /*b*/ foo) bar;
vs Binding
constructor:
Inherit !(Maybe r) ![NKeyName r] !SourcePos
Of course preserving all comments is not a requirement for retaining comments, but both a formatter and some kind of comment processing seem to benefit from having a token stream, as @taktoa points out.
I'll see how far I can still get with tokenizing today at the NixCon 2018 hackday.
As I see people pointing to formatting nix code reliably, I'll point to https://github.com/Ekleog/nix-bikeshed/ so that people don't need to re-do the same thing twice :)
It's more or less working, mostly missing comments at the time being. (actually, running it is how I found out https://github.com/haskell-nix/hnix/issues/152: parsing, generating the file and re-parsing didn't give the same answer, and for my generation)
So if some support for comments could be added, it'd be great! :)
@Ekleog how does it compare to hnix formatter?
If you're speaking of Nix.Pretty
, then nix-bikeshed
handles wrapping
overlong lines correctly, and doesn't hesitate to switch between a
''foo''
string and a "foo"
string (depending on line length), for
the two first changes I can see off-the-top-of-my-head.
If there's another hnix formatter, then I didn't find about it when I
started nix-bikeshed
a few months ago (before putting it on hold
waiting for this issue to be fixed)
Sorry guys, I have given up on lexing. My findings:
nix
lexer has to support string interpolation. Chances that this can be ported to alex are low. Alex might have improved performance compared to parsing with combinators twice (lex-parse + parse)[/* empty because important */]
we'll need to use the dual of Ann
+1
See https://github.com/NorfairKing/nixfmt for an attempt at a normalising formatter. The only thing missing there are comments, as mentioned in the README.
@NorfairKing @Ekleog now we have three formatters :) It would be good to see what's the difference and just get hnix
formatter to do the correct thing (or provide modes of operation).
@domenkozar https://github.com/NorfairKing/nixfmt
just provides a CLI wrapper around the hnix parser and pretty-printer, nothing more. If the hnix
parser and pretty printer are fixed to deal with comments, nixfmt
will work as intended.
You can format by invoking hnix <filepath>
:)
@domenkozar I think Nix.Pretty
doesn't try to format nix code but to pretty-print nix expressions, which is IMO a vastly different objective. For instance, pretty-printing YAML may be just setting it to always-indent like:
foo:
bar:
- a
- b
- c
baz:
- some long stuff
- other long stuff
- let us say this overflows max line length
While formatting it would be like:
foo:
bar: [a, b, c]
baz:
- some long stuff
- other long stuff
- let us say this overflows max line length
As such, I think the objective of the two projects (Nix.Pretty
and nix-bikeshed
) don't match, and thus they aren't overlapping :)
As for having hnix
do the correct thing, it's another problem, and it could very well call into a formatting library.
The difference would only be setting how long can a line be, so setting it to 1 char would "pretty-print" it. So it could easily be under the same codebase, I think.
Well, we totally agree that a formatter can almost do pretty-printing. However I think is that a pretty-printer can't do formatting at all :)
(also, the difference also includes eg. whether ''foo''
can be turned to "foo"
which changes the AST and thus can't be included in a pretty-printer, and similar stuff)
Just my 2 cents: I don't much care about formatting vs pretty-printing. I just want a single normalized form of a given AST that is vaguely good for humans to edit. Something like hindent
or gofmt
. If that's done with a formatter instead of a pretty-printer: great, but I'm not (personally) interested in bikeshedding what the format would be.
@NorfairKing same here. I worked on formatter to do "something sane", but I don't prefer the discussion on the style as much as I value consistency.
The current formatter is used in https://github.com/domenkozar/hnix-lsp and I'd love if hnix would preserve even most basic comments (after lines and in-between).
If you're interested in something vaguely good for humans to edit, you'll likely be OK with a pretty-printer.
If you want a tool to add to your CI to ensure your code is properly formatted, you'll likely want a formatter.
I think this sums up the difference I see, and the reason why I think
Nix.Pretty
, by its name, doesn't try to handle the problem
nix-bikeshed
is trying to handle. But maybe I'm wrong here, I'll let
@jwiegley contradict me :)
Also, a reason why I don't think nix-bikeshed
should live under hnix
is due to its name: it'll likely see a lot of churn because no one will
agree on the way things should be formatted, if we want to enforce
formatting across nixpkgs, and so keeping the issue tracker separate
sounded like a net win to me.
BTW, the reason why I didn't announce nix-bikeshed
until now is
because it doesn't support comments yet, and I believe if it doesn't
support comments then it's not a formatter -- while it's already a
pretty-printer -- and I thus didn't want to ping people on a half-baked
thing that wouldn't be actually usable :)
One advantage of pretty printing over formatting: smaller diffs. With automated formatting, you never know when it's going to decide to completely rewrite a large block of code because some limit has been reached.
Did anything ever come of this?
I think that even a simple solution which recognizes comments only in specific places would be quite useful!
I've written a pretty basic comment annotator here: https://github.com/expipiplus1/update-nix-fetchgit/blob/2e9df344c40e289920e94c82dbce2d87ae54ecc5/src/Nix/Comments.hs
It simply attaches comments to the expression which immediately precedes them (so multiple expressions may own the same comment). It also only handles single line comments.
Requires a fix for https://github.com/haskell-nix/hnix/issues/743 (https://github.com/haskell-nix/hnix/pull/744)
Would mention the core point of current comment processing:
So, indeed (as points the topic), currently, HNix parser processes comments as whitespaces.
As far as I can see, there is no way to represent comments in the AST at the moment.
Since I’m generating files that should be human-readable as well, it would be nice to have them (only line comments as far as I’m concerned, but inline/block comments are probably a good idea as well).