Data declaration for newtype and records

eborden commented 5 years ago

My intent with this PR is to take stock of the status of the datadecl branch and to push for an in progress merge. This work has been pending with sporadic changes for a long time. I believe that is because of the monolithic nature of the work. An all or none strategy has not worked and is counter to brittany's basic design on top of exact print (to be able to incrementally support new features). As well most incremental progress has been put towards refreshing the branch with each sporadic effort. This bit-rotting has been a time suck.

Instead I would like to see the current work (pending a few updates) merged with the goal of providing layout for newtype and record style data types.

Current Status

Supported

single records
multi-field records types
tabular alignment
basic deriving
deriving strategies
existential quantification

Blockers

retaining comments
CPP for older ghc versions

Deferred

normal types
sum types

tfausak commented 5 years ago

:eyes: I'll review this soon. I glanced over the test cases and liked what I saw!

lspitzner commented 4 years ago

I fully approve of the idea of getting this merged. I started working on supporting the older ghcs, and with this commit it compiles with both ghc-8.4 and ghc-8.6. Unfortunately, the tests on 8.4 do not pass.

One failure case is some whitespace problem that I have not investigated but that is probably relatively easy to fix.

The more annoying issue is that 8.4 does not have DerivingVia, which means that we need test-cases that only apply starting at some compiler version. Expanding the test file schema like this:

#group mytestgroup

#test mytestname
#pending myreason
func = ...

#test othertestname
func = ...

#test conditionaltestname
#starting-at-ghc 8.6
func = ...

should do the trick.

(this might point towards the question if we want to reduce the number of supported compiler versions, but I think I want this feature regardless; it seems it would be required again in the future even if we only supported two versions at a time.)

I plan to keep working on this when I find the time, in small pieces. If anyone else works on this over more than an evening, you might want to leave a note so that we don't conflict / do redundant work. This advice applies to myself, too.

eborden commented 4 years ago

@lspitzner The big blocker in my mind right now is comments. They are something that could precipitate a lot of change in this code (though maybe they won't :man_shrugging:) If you have the time I might recommend doing some research on what that might look like instead of attacking multi version support.

100% agree on committing early and often and keeping the lines of communication going so we don't have redundant work.

lspitzner commented 4 years ago

I have started on supporting comments, will push something this night.

eborden commented 4 years ago

I did a bit of fiddling with it, but didn't get anywhere. I'd love to have a brain dump from you on how brittany is currently handling comments.

eborden commented 4 years ago

@lspitzner I've updated the branch to build with all supported GHC versions. There are a number of test failures that will need to be picked off. One specific issue is literate tests that use syntax that is not available in earlier versions, like DerivingVia in 8.2.2.

  src-literatetests/Main.hs:151: 
  1) data type declarations record deriving via
       expected: Right 
                 data Foo = Bar
                   { foo  :: Baz
                   , bars :: Bizzz
                   }
                   deriving ToJSON via (SomeType)
                   deriving (ToJSON, FromJSON) via (SomeType)

        but got: Left "parsing error: parse error on input \8216via\8217"

We'll likely need a mechanism to disable certain tests depending on the GHC version.

lspitzner commented 4 years ago

great work, I'll have a look later. I think one set of errors comes down to one trivial mistake that causes duplication of offsets/comments in front of a data-def. That, together with the deriving-via fix you mentioned, I think 8.4 is clean already.

lspitzner commented 4 years ago

All green! nice. I still need to review the last few commits, and I want to test this on one or two larger codebases.

eborden commented 4 years ago

Nice work on the min-ghc constraint in tests! That will come in handy. I'll give this a run on Freckle's codebase (180 KLOC) and see if anything funky pops up.

eborden commented 4 years ago

I'm only seeing one issue. Interstitial comments are getting de-indented one level too much.

data CurrentLevel = CurrentLevel
   { clLevel :: FreckleTextLevel
-  -- ^ Either the last Snapshot or Starting/Overridden level
+-- ^ Either the last Snapshot or Starting/Overridden level
   , clStartingLevel :: FreckleTextLevel
-  -- ^ Specifically the Starting/Overridden level
+-- ^ Specifically the Starting/Overridden level
   , clOverridden :: Bool
-  -- ^ True if level has been Overridden
+-- ^ True if level has been Overridden
   }
   deriving Show

tfausak commented 4 years ago

I ran this branch over ITProTV's medium sized (72 KLOC) simple codebase and only ran into one bona fide problem:

data EnterpriseGrantsForCompanyResponse = EnterpriseGrantsForCompanyResponse Types.Company [ EnterpriseGrantResponse
]

(We have conf_layout.lconfig_cols set to 100.) I'm not exactly sure how that should be formatted, but I'd be alright with one of these:

data EnterpriseGrantsForCompanyResponse
  = EnterpriseGrantsForCompanyResponse Types.Company [EnterpriseGrantResponse]

data EnterpriseGrantsForCompanyResponse = EnterpriseGrantsForCompanyResponse
  Types.Company
  [EnterpriseGrantResponse]

data EnterpriseGrantsForCompanyResponse
  = EnterpriseGrantsForCompanyResponse
    Types.Company
    [EnterpriseGrantResponse]

eborden commented 4 years ago

This would be consistent with how brittany formats types in signatures.

data EnterpriseGrantsForCompanyResponse = EnterpriseGrantsForCompanyResponse
  Types.Company
  [EnterpriseGrantResponse]

eborden commented 4 years ago

@tfausak I added some tests and layouting for that situation.

eborden commented 4 years ago

@lspitzner I'll need your help with the comment issue.

tfausak commented 4 years ago

Awesome, thanks for the quick fix! I re-ran Brittany at 393bdb0a172bec3704d2c3ba4ff25f754ba137dc against my codebase and everything worked. I'm also seeing the comment problem, but everything builds properly after formatting.

lspitzner commented 4 years ago

My notes from testing on some non-public code:

ran into the same comment indentation problem. It is way more than "reduce by one 'layer'/tab". Need to reverse-engineer ghc-exactprint behaviour :D
for small records (think Point { x :: Int, y :: Int }) a one-line format might be nice;
for dictionary-style records (that contain functions), long type signatures are line-broken and indented not very well. This is something where I was not even sure what I had expected as ideal output;
for data-types with non-trivial existentials + constraints you get really nasty linebreaks;
this code-base puts deriving directly after the closing brace (of a record). This saves a line, but would not work well for more than one constructor. But then it is really rare that you have records + multiple constructors. It also leads to more indentation/inconsistency if you have multiple deriving (strategy) lines. Maybe I'll put this behind a config flag. (I know some of you are not a fan of more config flags. I'll ask the other people on that codebase first anyway.)

Will add test-cases/examples for the above later.

I plan to write some documentation to explain the processing of comments. I'll also look into this particular comment problem; the ghc-exactprint semantics of its DP (DeltaPosition) value are not intuitive, and this looks like another unexpected special case.

tfausak commented 4 years ago

I re-ran 09da6d4ad14b9be5ec49adc05ad77bd7cd6cffac against ITProTV's code base and everything looks great!

Most of our record data declarations were written with deriving immediately after the closing brace, but I think it makes sense to put that on its own line. I'd be alright with Brittany simply choosing to put them on their own line and not having a config for it.

eborden commented 4 years ago

I'd be alright with Brittany simply choosing to put them on their own line and not having a config for it.

The choice to put them on a new line sacrifices one vertical line for consistency in the current GHC world where DerivingStrategies is becoming more popular. I'm in favor of keeping the simplicity of always having a new line.

tfausak commented 4 years ago

Ah, I found a problem. Trying to format a data declaration like this:

data T = C {}

Returns a syntactically invalid result:

data T =

I think it should probably keep the empty braces around, but as far as I know that's the same as not taking any arguments. I'd be fine with either of these:

data T = C
data T = C {}

eborden commented 4 years ago

@tfausak I've added a test and corrected that bug.

tfausak commented 4 years ago

Awesome, thanks!

eborden commented 4 years ago

@lspitzner do you have an examples of these two?

for dictionary-style records (that contain functions), long type signatures are line-broken and indented not very well. This is something where I was not even sure what I had expected as ideal output;

for data-types with non-trivial existentials + constraints you get really nasty linebreaks;

lspitzner commented 4 years ago

I have written a doc chapter on comment (and DeltaPosition) handling: https://github.com/lspitzner/brittany/blob/master/doc/implementation/exactprinting.md

lspitzner commented 4 years ago

@eborden

data Single = Single { foo :: loooooooooooooooooooooooooooooooong -> loooooooooooooooooooooooooooooooong }

data Double = Double { foo :: loooooooooooooooooooooooooooooooong -> loooooooooooooooooooooooooooooooong, bar :: Int }

data Constrained = forall a b . (Loooooooooooooooooooooooooooooooong a, Loooooooooooooooooooooooooooooooong b) => Constrained { a :: a, b :: b}

lspitzner commented 4 years ago

"Good" layout might be something like

data Single = Single
  { foo
      :: loooooooooooooooooooooooooooooooong
      -> loooooooooooooooooooooooooooooooong
  }

data Double = Double
  { foo
      :: loooooooooooooooooooooooooooooooong
      -> loooooooooooooooooooooooooooooooong
  , bar :: Int
  }

data Constrained =
  forall a b .
  ( Loooooooooooooooooooooooooooooooong a
  , Loooooooooooooooooooooooooooooooong b
  ) => Constrained
    { a :: a, b :: b }

although I am really not sure about the last one.

tfausak commented 4 years ago

Those looks reasonable to me. I feel like forall always throws a wrench into layouting.

What about the case where two fields have the same long function type?

data Weird = Weird { foo, bar :: loooooooooooooooooooooooooooooooong -> loooooooooooooooooooooooooooooooong }

lspitzner commented 4 years ago

data Weird = Weird
  { foo, bar
      :: loooooooooooooooooooooooooooooooong
      -> loooooooooooooooooooooooooooooooong
  }

I guess? I would frankly ignore the case where the comma-separated names overflow.

lspitzner commented 4 years ago

I am working on the cases mentioned in https://github.com/lspitzner/brittany/pull/259#issuecomment-554959278.

lspitzner commented 4 years ago

The tests I came up with all pass again. Found and fixed some more bugs around cases where things don't fit in one line.

Some notes about the last commit that go beyond what we already mentioned:

I disabled the single-line layout by default, but invented a new config flag to allow single-line layout for any number of fields (instead of exactly one previously). Now that I think about this though, I am inclined to comment that flag out again until someone asks for it. I don't think that will happen - single field data (newtype is different!) is rare, and fields on multiple lines is good for readability even for small structs.
I opted to go with the "prefix" style over the end-of-line style, i.e.
```
data Foo
  = forall e
  . SomeLongConstraint e => Bar
```
over
```
data Foo =
  forall e .
  SomeLongConstraint e => Bar
```
to increase consistency with multi-constructor data decls that we will support in the future; for those I think we'll use
```
data Either a b
  = Left a
  | Right b
```

For the RHS constraint I did however not put "=>" at the start of the line. Makes the indentation amount odd. A somewhat full example is

data MyRecord
  = forall a b
  . ( Loooooooooooooooooooooooooooooooong a
    , Loooooooooooooooooooooooooooooooong b
    ) =>
    MyConstructor
      { foo, foo2
          :: loooooooooooooooooooooooooooooooong
          -> loooooooooooooooooooooooooooooooong
      , bar  :: a
      , bazz :: b
      }
  deriving Show

oh, and now I notice that if your derivings can overflow. That's annoying, although it probably is rare for data. For newtype there definitely are cases with more than one line of generalized-newtype-deriving instances. E.g.

newtype UserId = UserId { getUserId :: Int }
  deriving (Show, Read, Eq, Ord, Enum, Integral, Generic, Typeable, Num, ToJSON, FromJSON, ToField, ToRow)

For newtypes I really would like some fancy unique+paragraph-fill solution, so you get

newtype UserId = UserId { getUserId :: Int }
  deriving (Show, Read, Eq, Ord, Enum, Integral, Generic, Typeable)
  deriving (Num, ToJSON, FromJSON, ToField, ToRow)

Let's leave this for the newtype layouting implementation though.

eborden commented 4 years ago

@lspitzner I have a separate branch based off this to support sum types. When this is ready I can merge the differences in. Are we ready to give this a :+1: and merge?

lspitzner commented 4 years ago

I have tested it on what I have available and this looks good to merge. @tfausak could you perhaps check this one more time on your larger codebase(s) please?

eborden commented 4 years ago

@lspitzner Any conclusions on merging this? This PR is already bit-rotting and conflicts are sneaking in. I'd love to clean up these merge conflicts and merge this current batch of work. Improvements can be deferred to further PRs.

eborden commented 4 years ago

:point_up: pushed up conflict fixes. Please git pull -r if you have additional commits to add.

tfausak commented 4 years ago

@tfausak could you perhaps check this one more time on your larger codebase(s) please?

Sorry, I didn't see this until now. I re-ran the latest commit (a1282c3ac670b61e6bbcd50cf7a12612b385b2e5) against the ITProTV code base and everything looked great! :+1:

lspitzner commented 4 years ago

@tfausak great, thanks.

@eborden thanks for the rebase. I have found one glitch in a special case that I have not had time to look at in the last few days. Will have a look now. If it has a simple fix I'd like to include it, otherwise I'll merge this and release. Will hopefully get back to you in the next hour or so.

eborden commented 4 years ago

@lspitzner I'm not necessarily pushing for a release. I'm just pushing for merging to master. If there are further improvements to be made before a release then we should defer releasing until they are also merged to master.

lspitzner commented 4 years ago

But once it is on master, we cannot make releases without this feature. And I'd like to make a new release with the regression fixes that are on master asap.

I'll just make a patch release 0.12.1.1 and merge this immediately after. I am not happy with my latest commit (had to add another Bool to the BriDoc datatype..) but it passes all tests now. I am pretty certain that this only affects rather special cases.

lspitzner commented 4 years ago

Thanks again @eborden . I have just released 0.12.1.1 without this feature, but we will definitely include it in the next release. I really think this is ready for publication even when it does not cover mulitple constructors yet; I just wanted to get the bugfixes out without having to organize the announcement that we should have with the new feature release.

eborden commented 4 years ago

Thanks @lspitzner. My current plan is working on sums next, which I'll follow with GADTs. Let me know if this conflicts with any of your initiatives.

lspitzner / brittany