tweag / ormolu

A formatter for Haskell source code
https://ormolu-live.tweag.io
Other
956 stars 83 forks source link

Ambiguous fixities cause unpleasant rendering #892

Closed parsonsmatt closed 1 year ago

parsonsmatt commented 2 years ago

Is your feature request related to a problem? Please describe.

We use esqueleto at work, which has a bunch of eDSL specific operators - ^. ==. in_ ?., on, etc. An example of this is:

foo = select $ do
  t <- from $ table @Bar
    `innerJoin` table @Baz
      `on` do 
        \(br :& bz) -> whatever
  where_ $
    t ^. BarInt ==. val 3
   &&. t ^. BarName `in_` valList ["hello", "world"]

These operators are shared in a few different packages, and ormolu does not select the "correct" ones, so it puts each operator on it's own line. For on, it uses the Prelude, which gives a particularly bad format.

foo = select $ do
  t <- from $ 
      table @Bar
      `innerJoin` table @Baz
    `on` do 
        \(br :& bz) -> whatever
  where_ $
    t 
    ^. BarInt 
    ==. val 3
    &&. t 
    ^. BarName 
    `in_` valList ["hello", "world"]

This can be worked-around by putting those values in the fixity configuration. This works for us since we don't use the other packages on Hackage. However, it's a bit obnoxious to provide overrides, and if we ever do start using other packages, then we'll have conflicts that won't be easy to resolve.

Describe the solution you'd like

If ormolu were capable of looking up the provenance of an operator, then that could be sufficient. For example, suppose we have the following import list:

import Database.Esqueleto.Experimental

foo = select $ do
  t <- from $ 
      table @Bar
      `innerJoin` table @Baz
    `on` do 
        \(br :& bz) -> whatever
  where_ $
    t 
    ^. BarInt 
    ==. val 3
   &&. t 
    ^. BarName 
    `in_` valList ["hello", "world"]

Then when ormolu is formatting this file, it can look at the list of imports. If any of them are known to bring in a specific fixity for an operator, then we can just use that unambiguously, without having to do actual GHC value provenance checking.

This might suggest further configuration to provide re-export modules. As an example, we have an "internal" Esqueleto import that bundles up most of what we need.

module A.MercuryPrelude.Esqueleto (module X) where

import A.MercuryPrelude as X hiding (on, (==.), etc)
import Database.Esqueleto.Experimental as X

We'd want to be able to say that module X re-exports operators from module Y somehow, so that we get proper formatting with re-exports.

Describe alternatives you've considered

The "most correct" approach is to use GHC to determine what name is actually being used here. But this would require digging more into the deeper parts of the compiler, which likely has trade-offs that we want to avoid. Notably, we'd need more of the code to actually build, which would make using fourmolu without eg a package database much more difficult.

jonathanlking commented 1 year ago

I think another example of this is the (.=) operator, which is

We have code the depends on both lens and aeson (i.e. ormolu --no-cabal -p lens -p aeson). Without an infixr 8 .= override the following:

instance ToJSON Example where   
  toEncoding (Example a b c d) =    
    pairs $                     
      "a" .= a                  
        <> "b" .= b   
        <> "c" .= c

becomes:

instance ToJSON Example where
  toEncoding (Example a b c d) =
    pairs $
      "a"
        .= a
        <> "b"
        .= b
        <> "c"
        .= c

This wouldn't always work, but...

aeson has a higher popularity (2310) than lens (1576), so sorting the higherPriorityPackages (in buildFixityMap') might be a helpful heuristic to introduce? (At the moment the order of -p flags doesn't effect things).

I also used the following command to help explore the hackage-info.json:

OPERATOR=".="; curl https://raw.githubusercontent.com/tweag/ormolu/master/extract-hackage-info/hackage-info.json | jq --arg operator "$OPERATOR" '.popularity as $popularity | .operators | keys[] as $package | {package: $package, details: .[$package][$operator], popularity: $popularity[$package]} | select(.details)' | jq -s 'sort_by(.popularity) | reverse'
jonathanlking commented 1 year ago

I've spent a while thinking of how to phrase this (and also don’t want to derail this issue); I'm a huge fan of ormolu but also want to be candid about my experience.

I appreciate the motivation for introducing fixities, but for me at the moment, the cost of setting up fixity overrides outweighs the benefits. As a user upgrading from an earlier version, I needed to spend time understanding and setting this up, otherwise my code became less readable.

I’m also concerned that this isn’t just an up-front cost, but will be an ongoing one:

I hope this is read in the in the spirit in which it was intended and I really do appreciate all the work that goes into ormolu!

tbagrel1 commented 1 year ago

@parsonsmatt As far as I understand, if esqueleto is present in your cabal dependencies, then ormolu should pick it up, and select the correct fixities for ^., ==., in_ ...

You would just need a manual override for on because only base operators have higher priority than the ones from cabal dependencies (see https://github.com/tweag/ormolu/blob/master/src/Ormolu/Fixity.hs#L140).

I just checked that esqueleto operators are indeed present in the hackage-info.json file. Please tell me if it doesn't work as intended for your use-case.

It's unfortunately quite hard to find which operators from which packages are actually imported (especially for "import all" imports with no explicit import list), as we don't really have a way at the moment to see exactly which symbol is brought into scope by each module. E.g. it would be very complex to follow re-exports of operators in library code to find the source of fixity declarations.

In https://github.com/tweag/ormolu/pull/832 I originally suggested a more fine-grained fixity configuration, allowing to select all / none / all except X / none except Y operators from package P, and set some priority over imported packages. But that wouldn't be compatible -- in this form at least -- with the cabal dependencies autodetection that we have today. Is it something you would like to see improved?

@jonathanlking At the moment, packages from the "cabal dependencies" list are merged together with a threshold > 1, which means that when we encounter conflicting fixities for a same operator, we will merge the available information (all the logic is here).

With a threshold t, on a fixity conflict,

We could set this threshold to 0 for dependencies, so that only the most popular fixity would be selected. But that would be only a very minor improvement on a probably real issue: we might want a way to order dependencies, and hide/explicity import only a set of operators from a given package.

We can't really have a fully satisfying answer on this topic unfortunately without running GHC itself (or mimicking GHC) to actually read library code, and do symbol and fixity resolution.