Comment gets duplicated for multi-language pragma

buggymcbugfix commented 3 years ago

$ cat repro.hs 
-- comment
{-# LANGUAGE FlexibleContexts, FlexibleInstances #-}

module Foo where

$ ormolu -c repro.hs
-- comment
{-# LANGUAGE FlexibleContexts #-}
-- comment
{-# LANGUAGE FlexibleInstances #-}

module Foo where

$ ormolu --version
ormolu 0.3.0.0 HEAD 06b767c8b70e60321a4debc4599127446f958e5b
using ghc-lib-parser 9.0.1.20210324

The comment gets duplicated for every language extension in the pragma.

mrkkrp commented 3 years ago

If the comment originally pertained to a set of extensions, then it seems like a mistake to only mention it for one of the extensions. I think the current behavior is the right way to treat this situation given that we still want to normalize how language extensions are activated (one pragma per line).

buggymcbugfix commented 3 years ago

I understand the thinking here, but on second thought I think the current behaviour is not what we want.

Firstly, in the codebase that I ormolu-fied, many files have very long comments at the top of the file above language pragmas and I had to manually remove hundreds of lines of duplicated comments. It took very long. Not a single comment actually related to the language pragmas.

Even if the comment did relate to the language pragmas, it would still make more sense to keep it non-duplicated above the language pragmas—otherwise you'd suddenly have to edit n comments every time you update the comment.

buggymcbugfix commented 3 years ago

How do we deal with this in the general case?

Consider

$ cat tlc.hs

-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# language FlexibleContexts, FlexibleInstances, ScopedTypeVariables #-}
-- the above extensions are harmless
{-# language DatatypeContexts, IncoherentInstances #-}
-- ^^^^ we need to refactor to get rid of these language extensions!!!!!!!!!!!!!

Currently we get:

$ ormolu tlc.hs

-- the above extensions are harmless
{-# LANGUAGE DatatypeContexts #-}
-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# LANGUAGE FlexibleContexts #-}
-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# LANGUAGE FlexibleInstances #-}
-- the above extensions are harmless
{-# LANGUAGE IncoherentInstances #-}
-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# LANGUAGE ScopedTypeVariables #-}

-- ^ ^^^ we need to refactor to get rid of these language extensions!!!!!!!!!!!!!

This example shows that it makes little sense to make assumptions about which language pragmas the comments relate to.

The only sane output I can think of in this case is something like:

-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
-- the above extensions are harmless
{-# LANGUAGE DatatypeContexts #-}
{-# LANGUAGE IncoherentInstances #-}
-- ^^^^ we need to refactor to get rid of these language extensions!!!!!!!!!!!!!

But clearly this would break idempotency with the current printer. Running Ormolu on this currently we get:

-- the above extensions are harmless
{-# LANGUAGE DatatypeContexts #-}
-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE IncoherentInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}

-- ^ ^^^ we need to refactor to get rid of these language extensions!!!!!!!!!!!!!

I know this is a bit of an edge case, but it has added a massive burden to our adoption of ormolu.

IMO the most sensible behaviour would be to change printing to keep comments above/below pragmas (not ones on the same line) as fences and only expanding and sorting pragmas within fenced blocks.

mrkkrp commented 3 years ago

The problem here is that there could be also a long comment about the pragmas and we have no way to tell whether it is the case or not.

Typically all Haskell modules (perhaps except for Main) have a header like this:

module TLS where

The description of the module can be attached as a Haddock:

{-# language FlexibleContexts, FlexibleInstances, ScopedTypeVariables #-}
-- the above extensions are harmless
{-# language DatatypeContexts, IncoherentInstances #-}
-- we need to refactor to get rid of these language extensions!!!!!!!!!!!!!

-- | This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.

module TLS where

Already we can see that when the module is written with Haskell conventions in mind Ormolu has an easier time understanding what is going on:

-- the above extensions are harmless
{-# LANGUAGE DatatypeContexts #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
-- the above extensions are harmless
{-# LANGUAGE IncoherentInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}

-- we need to refactor to get rid of these language extensions!!!!!!!!!!!!!

-- | This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
module TLS where

Another problem is that there is no way for Ormolu to know whether a comment is about the extensions that follow it or extensions above it. Right now we assume the former case, but it is kind of arbitrary.

I think to Ormolize your codebase we both would need to do some work. From your side it could be adding module headers and turning comments at the top into module Haddocks. We'd also like to hear if you have concrete suggestions about how treating of language pragma comments should be done. Then, we from our side can try to implement those suggestions if they make sense.

tweag / ormolu

Comment gets duplicated for multi-language pragma #787