haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal
Other
1.62k stars 691 forks source link

RFC: Automated formatting of the codebase #8936

Closed Kleidukos closed 1 year ago

Kleidukos commented 1 year ago

During this week's cabal maintainers call, the subject of contributors churn was raised, and an example that I noticed in some PRs was indentation. This is of course not a clear cut subject, because nobody wants to read weirdly-indented code but also we hate to manually indent stuff (especially on large patches), and personally I groan each time I get an entirely justified comment related to formatting.

I'd like to submit to the audience that we adopt a single formatting style of the codebase.

The tool is Fourmolu, a fork of Ormolu which integrates configuration flags, as opposed to the One True Style promoted by Ormolu. Formolu comes with a GitHub Action, an interactive configuration visualiser.

One main reason that I'm suggesting Fourmolu is that we own the style. One very real criticism of late introduction of a formatter in a codebase is that we end up with a big commit that is just "format lol". Ormolu itself promotes a single style but this style evolves, and Fourmolu gives us the ability to disable the new additions to Ormolu. For me personally I would say it would remove quite a lot of frustration because I only want 1 reformatting commit and that's it.

In practice we can end up with conflicts between Ormolu and HLint: #1003 (Parentheses between some single class constraints conflict with hlint rules), which is why I'm in favour of actively owning our style, yet be able to automatically enforce it rather than spend time and energy doing it manually.

Moreover, Fourmolu options reduce diffs, which is a net benefit when reviewing PRs.

Here what it could look like:

diff --git a/Lol.hs b/Lol.hs
index 83772f1..dec18b5 100644
--- a/Lol.hs
+++ b/Lol.hs
@@ -1,16 +1,25 @@
-build    :: PackageDescription  -- ^ Mostly information from the .cabal file
-         -> LocalBuildInfo      -- ^ Configuration information
-         -> BuildFlags          -- ^ Flags that the user passed to build
-         -> [ PPSuffixHandler ] -- ^ preprocessors to run before compiling
-         -> IO ()
+build
+  :: PackageDescription
+  -- ^ Mostly information from the .cabal file
+  -> LocalBuildInfo
+  -- ^ Configuration information
+  -> BuildFlags
+  -- ^ Flags that the user passed to build
+  -> [PPSuffixHandler]
+  -- ^ preprocessors to run before compiling
+  -> IO ()
 build pkg_descr lbi flags suffixes = do
   checkBuildProblems verbosity (compiler lbi) flags
   targets <- readTargetInfos verbosity pkg_descr lbi (buildArgs flags)
   let componentsToBuild = neededTargetsInBuildOrder' pkg_descr lbi (map nodeKey targets)
-  info verbosity $ "Component build order: "
-                ++ intercalate ", "
-                    (map (showComponentName . componentLocalName . targetCLBI)
-                        componentsToBuild)
+  info verbosity $
+    "Component build order: "
+      ++ intercalate
+        ", "
+        ( map
+            (showComponentName . componentLocalName . targetCLBI)
+            componentsToBuild
+        )
   when (null targets) $
     -- Only bother with this message if we're building the whole package
     setupMessage verbosity "Building" (packageId pkg_descr)
@@ -24,26 +33,28 @@ build pkg_descr lbi flags suffixes = do
     let comp = targetComponent target
         clbi = targetCLBI target
     componentInitialBuildSteps distPref pkg_descr lbi clbi verbosity
-    let bi     = componentBuildInfo comp
+    let bi = componentBuildInfo comp
         progs' = addInternalBuildTools pkg_descr lbi bi (withPrograms lbi)
-        lbi'   = lbi {
-                   withPrograms  = progs',
-                   withPackageDB = withPackageDB lbi ++ [internalPackageDB],
-                   installedPkgs = index
-                 }
+        lbi' =
+          lbi
+            { withPrograms = progs'
+            , withPackageDB = withPackageDB lbi ++ [internalPackageDB]
+            , installedPkgs = index
+            }
     mb_ipi <- buildComponent verbosity (buildNumJobs flags) pkg_descr
-    par_strat <- toFlag <$> case buildUseSemaphore flags of
-                  Flag sem_name -> case buildNumJobs flags of
-                                    Flag {} -> do
-                                      warn verbosity $ "Ignoring -j due to -semaphore"
-                                      return $ UseSem sem_name
-                                    NoFlag -> return $ UseSem sem_name
-                  NoFlag -> return $ case buildNumJobs flags of
-                              Flag n -> NumJobs n
-                              NoFlag -> Serial 
+    par_strat <-
+      toFlag <$> case buildUseSemaphore flags of
+        Flag sem_name -> case buildNumJobs flags of
+          Flag{} -> do
+            warn verbosity $ "Ignoring -j due to -semaphore"
+            return $ UseSem sem_name
+          NoFlag -> return $ UseSem sem_name
+        NoFlag -> return $ case buildNumJobs flags of
+          Flag n -> NumJobs n
+          NoFlag -> Serial
     return (maybe index (Index.insert `flip` index) mb_ipi)
-  
+
   return ()
- where
-  distPref  = fromFlag (buildDistPref flags)
-  verbosity = fromFlag (buildVerbosity flags)
+  where
+    distPref = fromFlag (buildDistPref flags)
+    verbosity = fromFlag (buildVerbosity flags)

Personally I believe that the combo of normalisation, automation and ownership of the coding style is something that could remove a certain burden from our contributors and reviewers.

PS: Of course I would provide the initial CI setup and reformatting PR.

ffaf1 commented 1 year ago
-  info verbosity $ "Component build order: "
-                ++ intercalate ", "
-                    (map (showComponentName . componentLocalName . targetCLBI)
-                        componentsToBuild)
+  info verbosity $
+    "Component build order: "
+      ++ intercalate
+        ", "
+        ( map
+            (showComponentName . componentLocalName . targetCLBI)
+            componentsToBuild
+        )

Mhh on this one I am not sure the second version is easier to read than the first. I guess this is one (in theory, not proposing) of the things we could turn off via a flag?

Kleidukos commented 1 year ago

@ffaf1 feel free to toy with the live editor, but I believe that the newline after the dollar might be one of the things that is inherited from Ormolu without a possibility to change it. Of course the authors of Fourmolu are always welcoming of patches to configure things. That being said there is an explicit column limit that we can set, or we can set it to none.

Kleidukos commented 1 year ago

@ffaf1 but to get back on the topic, what do you think of the proposal? At least in principle :P

emilypi commented 1 year ago

Tbqh I don't care about what formatting is done, as long as a choice is made, enforced in CI, and stuck with for a long time period. Cabal is a mix of styles spanning 20 years. It is crufty, old, and hard to understand alot of the time because many contributions have not been vetted for stylistic concerns along with the content of their code. A formatter solves this problem to a degree: you know what you're getting, and what you have to do, and if you didn't do it, I would hope CI tells you this and forces you to do it.

ffaf1 commented 1 year ago

I like it! I don't particularly like automated formatting but I like easing pain of contributors (especially new ones) much much more, and CI picking up that will be much friendlier.

btw my gripe was not on $ but on the newline after intercalate and the French spacing (( map…) around parentheses (edit: lol, I confused that with «» quotation marks! French typographic rules do not have a space inside parentheses!).

But as I said, automated formatting would be a very good addition.

Kleidukos commented 1 year ago

@ffaf1 ah yes I see. :) (also, stray bullet for the French lol, we actually don't put spaces inside parentheses unless it is to avoid corrupting a URL

Thank you for your approval :bow:

ulysses4ever commented 1 year ago

Support. Most importantly, because it avoids unnecessary interactions during the code review (as noted in the OP).

as for the style, I don’t have much preferences, and I’m not a big user of formatters myself. From what I heard, Formolu might be just the right choice for most code bases, so I’m happy that it’s proposed in particular.

I’d salut to someone who sets up CI for it and does the initial formatting. Formatting Options are not critical: we can always reconsider them.

Kleidukos commented 1 year ago

@ulysses4ever aye I would do it. :)

philderbeast commented 1 year ago

I too support this and strongly prefer fourmolu (configured away from the ormolu look).

Mikolaj commented 1 year ago

Let's do it!

ulysses4ever commented 1 year ago

@andreasabel introduced fix-whitespace in this project, so I'd like to hear his perspective on automated formatters (which seem related) if possible. E.g. does the "easing contributors' live" sound convincing to you, Andreas? Agda source doesn't seem to have a formatter?

Ironically, the issue about whitespace checking had another sad story about rough contributor experience.

andreasabel commented 1 year ago

fix-whitespace originates from the Agda dev tools and was then generalized to be usable in other projects. I think the whitespace problem is a bit different because whitespace is invisible, so people don't naturally care about it. Some working environments will remove trailing whitespace and some don't, and then if you work together in a repository, you might get diffs and merge conflicts just because of trailing whitespace or tabs vs. spaces. (darcs, used originally as VC, was extra aware about trailing whitespace and would print it in red in hunks.)

Personally, I miss easy ways to exempt from the rules in code formatters (and linters). Sometimes a tabular alignment of code greatly improves readability by visualizing the underlying pattern. Code formatter do not recognize and preserve 2D layout. I adopted Tibbe's written style guide instead (https://github.com/andreasabel/haskell-style-guide). Papier ist geduldig.

However, the presence of a code formatter in a project would not concern me very much. It is then a "God given". E.g., I did a bit of maintaining at haskell/actions which has a code formatter in its pre-commit hook, and then it just formats your code, fullstop. This is certainly acceptable.

To make auto-formatting not a burden for new developers, it should be as automatic as in haskell/actions: No extra installation needed, running on the staged changes in a pre-commit hook, nothing one has to remember to do...

The Formolu snippet looks good at first sight. I'd set off haddock comments a bit more, so that they are indented relative to the type of the argument.

+build
+  :: PackageDescription
+       -- ^ Mostly information from the .cabal file
+  -> LocalBuildInfo
+       -- ^ Configuration information
+  -> BuildFlags
+       -- ^ Flags that the user passed to build
+  -> [PPSuffixHandler]
+       -- ^ preprocessors to run before compiling
+  -> IO ()
Kleidukos commented 1 year ago

@andreasabel Thank you for this feedback. Indeed the idea is to make things as automated and transparent as possible. Fourmolu would also replace fix-whitespace's trailing spaces removal, and sorts the imports. :)

andreabedini commented 1 year ago

I woud welcome this and 100% subscribe to what @emilypi says. I don't have a strong preference towards the particular style (as long as I can configure hls to use it).

andreasabel commented 1 year ago

In practice, though, a reformatting of the whole codebase is never possible because there are always open PRs that would get destroyed by such an invasion on base branch...

Kleidukos commented 1 year ago

@andreasabel This led me to take a look at the currently opened PRs, and it would appear that a certain proportion of them has been dormant for such a long time that conflicts are inevitable. Moreover having automated formatting means that active PRs can quickly catch-up by being applied fourmolu. :)

fendor commented 1 year ago

I am fully in favour of owning the style and having a one-click solution to stop unnecessary debate about styling.

I am a bit hesitant because I use git blame quite a lot to figure out reasoning and context. A huge formatting commit will make git blame useless for some time. Now, it is still possible to trace requirements/decisions, but much harder. Can we come up with some help for developers how they can still leverage the history of the project? E.g. I found this stack overflow question https://stackoverflow.com/questions/50469927/view-git-history-of-specific-line, maybe we can add some docs to Contributing.md?

Or am I the only one who doesn't know off the top of my head how to get the history of a line :sweat_smile:

Kleidukos commented 1 year ago

@fendor thank god we're past that kind of problem these days, check out https://www.stefanjudis.com/today-i-learned/how-to-exclude-commits-from-git-blame/

ulysses4ever commented 1 year ago

I also use git-blame a lot, and I think it's a good idea to educate people about git config blame.ignoreRevsFile .git-blame-ignore-revs in the contributing file.

Kleidukos commented 1 year ago

I've added an init rule to the Makefile for this usecase.

andreabedini commented 1 year ago

This was implemented in #8950. Anything else left to do?

Kleidukos commented 1 year ago

I don't think so! Well, glad that we've done it.

See you at the next merge conflicts! ;-)

ulysses4ever commented 1 year ago

8950 removed fix-whitespace job, which served not only .hs files but also .md and .rst files. I think this should be restored. Maybe I even created a ticket for that though.

ulysses4ever commented 1 year ago

That would be https://github.com/haskell/cabal/issues/9020

grayjay commented 1 year ago

One of the formatting PRs hasn't been merged yet: #8999

emilypi commented 1 year ago

That's quite the conflict set @grayjay! @Kleidukos - is it possible to follow that one through? I can merge asap.