Open buggymcbugfix opened 3 years ago
If the comment originally pertained to a set of extensions, then it seems like a mistake to only mention it for one of the extensions. I think the current behavior is the right way to treat this situation given that we still want to normalize how language extensions are activated (one pragma per line).
I understand the thinking here, but on second thought I think the current behaviour is not what we want.
Firstly, in the codebase that I ormolu-fied, many files have very long comments at the top of the file above language pragmas and I had to manually remove hundreds of lines of duplicated comments. It took very long. Not a single comment actually related to the language pragmas.
Even if the comment did relate to the language pragmas, it would still make more sense to keep it non-duplicated above the language pragmas—otherwise you'd suddenly have to edit n comments every time you update the comment.
How do we deal with this in the general case?
Consider
$ cat tlc.hs
-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# language FlexibleContexts, FlexibleInstances, ScopedTypeVariables #-}
-- the above extensions are harmless
{-# language DatatypeContexts, IncoherentInstances #-}
-- ^^^^ we need to refactor to get rid of these language extensions!!!!!!!!!!!!!
Currently we get:
$ ormolu tlc.hs
-- the above extensions are harmless
{-# LANGUAGE DatatypeContexts #-}
-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# LANGUAGE FlexibleContexts #-}
-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# LANGUAGE FlexibleInstances #-}
-- the above extensions are harmless
{-# LANGUAGE IncoherentInstances #-}
-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# LANGUAGE ScopedTypeVariables #-}
-- ^ ^^^ we need to refactor to get rid of these language extensions!!!!!!!!!!!!!
This example shows that it makes little sense to make assumptions about which language
pragmas the comments relate to.
The only sane output I can think of in this case is something like:
-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
-- the above extensions are harmless
{-# LANGUAGE DatatypeContexts #-}
{-# LANGUAGE IncoherentInstances #-}
-- ^^^^ we need to refactor to get rid of these language extensions!!!!!!!!!!!!!
But clearly this would break idempotency with the current printer. Running Ormolu on this currently we get:
-- the above extensions are harmless
{-# LANGUAGE DatatypeContexts #-}
-- This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE IncoherentInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
-- ^ ^^^ we need to refactor to get rid of these language extensions!!!!!!!!!!!!!
I know this is a bit of an edge case, but it has added a massive burden to our adoption of ormolu.
IMO the most sensible behaviour would be to change printing to keep comments above/below pragmas (not ones on the same line) as fences and only expanding and sorting pragmas within fenced blocks.
The problem here is that there could be also a long comment about the pragmas and we have no way to tell whether it is the case or not.
Typically all Haskell modules (perhaps except for Main
) have a header like this:
module TLS where
The description of the module can be attached as a Haddock:
{-# language FlexibleContexts, FlexibleInstances, ScopedTypeVariables #-}
-- the above extensions are harmless
{-# language DatatypeContexts, IncoherentInstances #-}
-- we need to refactor to get rid of these language extensions!!!!!!!!!!!!!
-- | This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
module TLS where
Already we can see that when the module is written with Haskell conventions in mind Ormolu has an easier time understanding what is going on:
-- the above extensions are harmless
{-# LANGUAGE DatatypeContexts #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
-- the above extensions are harmless
{-# LANGUAGE IncoherentInstances #-}
{-# LANGUAGE ScopedTypeVariables #-}
-- we need to refactor to get rid of these language extensions!!!!!!!!!!!!!
-- | This module needs some TLC
-- TLC stands for:
-- _T_ender
-- _L_ove
-- and _C_are.
-- Unfortunately it was overlooked as part of POSIX standardisation.
module TLS where
Another problem is that there is no way for Ormolu to know whether a comment is about the extensions that follow it or extensions above it. Right now we assume the former case, but it is kind of arbitrary.
I think to Ormolize your codebase we both would need to do some work. From your side it could be adding module headers and turning comments at the top into module Haddocks. We'd also like to hear if you have concrete suggestions about how treating of language pragma comments should be done. Then, we from our side can try to implement those suggestions if they make sense.
The comment gets duplicated for every language extension in the pragma.