tweag / ormolu

A formatter for Haskell source code
https://ormolu-live.tweag.io
Other
958 stars 83 forks source link

Challenging the CPP support #633

Open spl opened 4 years ago

spl commented 4 years ago

I like the goals of this project! I was hoping to use it, but it seems that I may be challenging the experimental C preprocessor support.

Describe the bug

I use CPP extensively in dlist to support multiple GHC versions. Ormolu fails with parse errors when I try to run it on Data/DList.hs, and I believe that's due to the CPP.

To reproduce

Run these commands:

git clone https://github.com/spl/dlist.git
cd dlist
git checkout -b e405a105ad7b2e6a83034bfa82efd22c4974ce02
ormolu Data/DList.hs

The output from ormolu is:

The GHC parser (in Haddock mode) failed:
  Data/DList.hs:36:3-19
  parse error on input `-- * Construction'

Expected behavior

Of course, I hoped it would work without effort. 😉 That said, I certainly understand that dealing with CPP when formatting code is a difficult task. So, I look at this issue as having one or both of two outcomes:

  1. You give me tips on how to resolve the issues Ormolu has with my code. (Or you tell me it's not possible!)
  2. You use dlist as a test case for your CPP support.

Either way, thanks for putting the effort into this project! 👏

Environment

Additional context

N/A

utdemir commented 4 years ago

You give me tips on how to resolve the issues Ormolu has with my code. (Or you tell me it's not possible!)

CPP support is almost impossible to get right as you mentioned. I am not the maintainer of this project, so that would be @mrkkrp's call to decide on which issues should be fixed. But I can provide some tips and workarounds (until GHC somehow gets nicer conditional compilation support).

Ormolu currently formats CPP by ignoring the whole block, and re-inserting them after the formatting, so we have to make sure that the code has valid syntax without the CPP block. In your case:

...
#if defined(__GLASGOW_HASKELL__) && __GLASGOW_HASKELL__ >= 708
{-# LANGUAGE PatternSynonyms #-}
...
#endif
...
module Data.DList

#if defined(__GLASGOW_HASKELL__) && __GLASGOW_HASKELL__ >= 800
  ( DList(Nil, Cons)
#else
  ( DList
#endif

  -- * Construction
  , fromList
  , toList
  ...

doesn't work, because once you remove the CPP blocks there is three problems:

So, those three things above should work, but the latter two expose a bug on Ormolu's CPP handling; and to workaround that we have to make it so that the first item in the export list should not be a CPP block.

In the end, below diff works in my case with the current master(bbcb812):

❯ git diff
diff --git a/Data/DList.hs b/Data/DList.hs
index 3cdd0aa..49ddadc 100644
--- a/Data/DList.hs
+++ b/Data/DList.hs
@@ -26,15 +26,16 @@
 -----------------------------------------------------------------------------

 module Data.DList
-
+(
+  id,
 #if defined(__GLASGOW_HASKELL__) && __GLASGOW_HASKELL__ >= 800
-  ( DList(Nil, Cons)
+   DList(Nil, Cons),
 #else
-  ( DList
+   DList ,
 #endif

   -- * Construction
-  , fromList
+   fromList
   , toList
   , apply

I hope this is useful. I guess a better solution would be to provide compatibility modules which are conditionally exposed via cabal, but that would be a bigger refactor.

spl commented 4 years ago

@utdemir Thanks for the thorough walkthrough! Much appreciated.

Ormolu currently formats CPP by ignoring the whole block, and re-inserting them after the formatting, so we have to make sure that the code has valid syntax without the CPP block.

That's helpful to know.

So, those three things above should work, but the latter two expose a bug on Ormolu's CPP handling; and to workaround that we have to make it so that the first item in the export list should not be a CPP block.

This is not a major thing, but I wonder if you can avoid changing the export list order by replacing the added id with DList and put the #if/#else/#endif around the (Nil, Cons). If you don't happen to try this, I might look into it later.

I hope this is useful.

Definitely!

I guess a better solution would be to provide compatibility modules which are conditionally exposed via cabal, but that would be a bigger refactor.

I don't recall considering that approach. Some potential concerns come to mind. Would that help with things like the export list? Would that avoid orphan instances?

spl commented 4 years ago

I wonder if you can avoid changing the export list order by replacing the added id with DList and put the #if/#else/#endif around the (Nil, Cons).

To answer my own question, it seems this doesn't work:

Input.hs ormolu Input.hs
```haskell module Input ( Type #if cond (Con1, Con2) #endif , value ) where ``` ```haskell module Input ( Type, #if cond (Con1, Con2) #endif value, ) where ```

Ormolu introduces a comma after Type, which would break the code when cond is true.

spl commented 4 years ago

Here's an example of another problem I encountered:

Input.hs ormolu Input.hs
```haskell instance Class Type where #if cond def = value1 #else def = value2 #endif ``` ```haskell instance Class Type #if cond def = value1 #else def = value2 #endif ```
spl commented 4 years ago

Oh, I guess my last comment (https://github.com/tweag/ormolu/issues/633#issuecomment-652820277) is an example of me forgetting the maxim:

Ormolu currently formats CPP by ignoring the whole block, and re-inserting them after the formatting

If I were to write instance Class Type where under each branch of the #if cond, it would unfortunately introduce some redundancy.

Out of curiosity, did you consider other strategies for dealing with CPP?

Lastly, I can still use Ormolu and – since there's not a lot of code in dlist – fix the errors created by formatting. So, this issue is not a major one, as far as it concerns me. Depending on your time and priorities, you should, of course, decide how much focus to give CPP.

mrkkrp commented 4 years ago

The present support for CPP is an experiment and a miracle in a way, because originally there was no intention to support CPP at all. I the minimal support for CPP just to make very simple things work, but the tricky cases like the ones you show in this thread are totally out of scope. It is unlikely that we will spent time on improving CPP support.

P.S. It is "Ormolu", not "Ormulo" ;-)

spl commented 4 years ago

It is unlikely that we will spent time on improving CPP support.

Understood, thanks!

P.S. It is "Ormolu", not "Ormulo" ;-)

Yes, I eventually noticed that my fingers kept typing the latter. But I didn't realize how often I did it on this page until now. 🙄 I've fixed them all to avoid confusing future readers.

spl commented 4 years ago

After re-reading the readme again (🔁), one thing that occurred to me was to try to disable Ormolu before and after the CPP statements so that Ormolu would ignore them. That is, I started with something like this, expecting the formatting to be fixed:

Test.hs:

module Test where

{- ORMOLU_DISABLE -}
#if MIN_VERSION_base(4,9,0)
{- ORMOLU_ENABLE -}

import Data.Semigroup(Semigroup(..)  )

{- ORMOLU_DISABLE -}
#endif
{- ORMOLU_ENABLE -}

The debug output follows:

$ ormolu --debug Test.hs 
warnings:

parse result:
  comment stream:
Test.hs:3:1-20 Comment False ("{- ORMOLU_DISABLE -}" :| [])
Test.hs:(3,21)-(11,21) Comment False ("{- ORMOLU_DISABLE_START" :| ["-- ORMOLU_CPP_MASK#if MIN_VERSION_base(4,9,0)","-- ORMOLU_CPP_MASK{- ORMOLU_ENABLE -}","-- ORMOLU_CPP_MASK","-- ORMOLU_CPP_MASKimport Data.Semigroup(Semigroup(..)  )","-- ORMOLU_CPP_MASK","-- ORMOLU_CPP_MASK{- ORMOLU_DISABLE -}","-- ORMOLU_CPP_MASK#endif","ORMOLU_DISABLE_END -}"])
Test.hs:11:22-40 Comment True ("{- ORMOLU_ENABLE -}" :| [])

module Test where

{- ORMOLU_DISABLE -}
#if MIN_VERSION_base(4,9,0)
{- ORMOLU_ENABLE -}

import Data.Semigroup(Semigroup(..)  )

{- ORMOLU_DISABLE -}
#endif
{- ORMOLU_ENABLE -}

But, as you can see, the code was not formatted.

Is there any way to make an approach like this work?

hseg commented 4 years ago

Another instance where this has bitten me:

module Test.QuickCheck.Classes.IntegralDomain
  (
#if HAVE_SEMIRINGS
    integralDomainLaws
#endif
  ) where

becomes

module Test.QuickCheck.Classes.IntegralDomain
  (
  )
where

#if HAVE_SEMIRINGS
    integralDomainLaws
#endif

under

ormolu 0.1.2.0 master b95feb345e14f333988b06be0fc6534694d6d0e9
using ghc-lib-parser 8.10.1.20200412