unisonweb / unison

A friendly programming language from the future
https://unison-lang.org
Other
5.78k stars 270 forks source link

Auto-format codebase #1653

Closed seagreen closed 2 years ago

seagreen commented 4 years ago

When Unison started none of the Haskell auto-formatters were that amazing. So I understand why it's hand-formatted.

However, the latest generation of formatters are really quite good. I find ormolu especially nice, though it doesn't really matter which one you pick. Sure, sometimes they make ugly decisions, but in return you get:

(a) A faster coding experience (no hand-formatting imports!) and

(b) consistency (whether you like the decisions or not, at least it's the same everywhere)

Is this something the core Unison devs are interested in?

PS: If you do go with ormolu you can still hand-format specific parts of code by wrapping it with {- ORMOLU_DISABLE -} and {- ORMOLU_ENABLE -}. This is occasionally helpful for things like enormous pattern matches where hand-formatting the columns can help readability.

aryairani commented 4 years ago

I'd be interested. How would we go about introducing such a thing, though?

Do we... pause development and reformat the whole codebase all at once? (For PR diff readability purposes, it would be nice to have the big diffs behind us.) Or reformat one file at a time, slowly, for a while? One function at a time? Git pre-commit hooks?

Re. {- ORMOLU_DISABLE -} do we need to audit each change for readability, then go back and say "no wait, this is a mess"? And how frequently would we need to do that?

seagreen commented 4 years ago

I'd be interested.

:balloon:

How would we go about introducing such a thing, though?

It's definitely situational. In general though I like the "hit everything at once in one big PR" approach.

The extra commit touching every file in the codebase is annoying, but I think the gradual way is worse. You don't want every PR for the next 6 months to randomly contain formatting churn if it hits a file that hasn't been ormolu'd yet.

Using this strategy you'd wait until there are no big outstanding PRs, then make a branch and ormolu everything on it.

Unfortunately yes, you'd want to look over every file beforehand and {- ORMOLU_DISABLE -} where you want to keep manual formatting. However this isn't too bad, I've done it for a codebase bigger than Unison's and it doesn't actually take that long. IMHO the need for {- ORMOLU_DISABLE -} will be pretty rare, maybe a couple in the whole project.

do we need to audit each change for readability, then go back and say "no wait, this is a mess"? And how frequently would we need to do that?

Once everything's ormolu'd the work is done, I'd stop worrying about formatting entirely. The cases requiring {- ORMOLU_DISABLE -} are honestly so rare.

I'll give another example-- I use it in custom preludes to keep two types of imports separate: (1) imports as X for the purpose of re-exporting, and (2) imports to be used in the custom prelude module but not re-exported. It just doesn't come up much.

Git pre-commit hooks?

A HIGHLY interesting idea, though I haven't used them for this yet. That might be especially good though in such a big project with lots of contributors.

mitchellwrosen commented 4 years ago

I'd be very pleased if Unison was hit by ormolu. As a contributor, it is very hard for me to edit the code in such a way that the git diffs are as clear as possible. This is because I often have to move code around while digging into a problem to better understand it, even if by the end of it I'm only making a small tweak. So, it'd be a big, big time saver and headache-reliever :)

As far as how to go about it - I disagree that the best way is to hit the codebase all at once. I think that could result in some painful merge conflicts for contributors with large outstanding patches.

My recommendation: format the codebase slowly over time so as to ease the merge conflict pain for those who have unmerged work. if you are editing a module, first hit it with ormolu. If there's a diff, put that in its own commit called "ormolu". Your patch may touch a few modules - each time any formatting occurs, just amend the one "ormolu" commit.

By the end, you'll have one commit containing all formatting changes, and only formatting changes, and this commit can be skipped over in code review.

At least for a first pass, I don't think there's a ton of value in staying on top of the formatting with hooks and/or CI. It seems simple enough to remind others to please format the code before merging; if a mistake happens, sometimes one will end up with unrelated formatting changes in one's patch, and this could just be called out as such.

Anyway, guess I had a lot to say about this :) For context, we use ormolu at $WORK, and the feeling I have over there when I open a module is "yessss, I can tear this place up, then snap it all back into place", whereas when I'm working on Unison code I have a feeling of "I better be really careful here because I don't want to put together a confusing or misleading diff"

seagreen commented 4 years ago

As far as how to go about it - I disagree that the best way is to hit the codebase all at once. I think that could result in some painful merge conflicts for contributors with large outstanding patches.

My recommendation: format the codebase slowly over time

Great point, I may have been wrong to suggest hitting everything at once. Since it's open source how could we be sure there are no big outstanding patches?

aryairani commented 4 years ago

Ormolu (vs something else)?

Ormolu sounds good to me. There are no settings; we'll just learn how to read it. If I understood @runarorama, @pchiusano, they also don't care about the choice of formatter, only that we don't have to think about formatting anymore.

Hit everything at once?

I think we should hit everything at once. Formatting the codebase slowly will eventually reach the same state, with the same implications for folks with sufficiently outstanding PRs. And can't the large outstanding patch also be formatted with ormolu before merging, resulting in a small diff again? @mitchellwrosen

Formatting hooks / CI

I think it's gotta be enforced, or it will just drift and cause diff problems for the next guy. Am I wrong? I don't think I can just eyeball some code in a PR and notice whether it would be changed or unchanged by ormolu.

Side note, ormolu doesn't know about default extensions (#1644), so it needs an ugly command-line to run. (Currently, it can't parse some files that use TypeApplications without declaring it.) But maybe https://fgaz.me/posts/2019-11-14-cabal-multiple-libraries/ to enforce that all packages are using the same default-extensions, + a script to launch ormolu with the corresponding GHC options? Update: with -o -XTypeApplications alone, it does seem to succeed.

seagreen commented 4 years ago

If I understood @runarorama, @pchiusano, they also don't care about the choice of formatter, only that we don't have to think about formatting anymore.

The blessed opinion.

I think we should hit everything at once.

Love the aggression. I'm fine with either.

I think it's gotta be enforced, or it will just drift and cause diff problems for the next guy. Am I wrong?

This makes sense to me. Agreed you wouldn't always be able to tell by eyeballing.

Side note, ormolu doesn't know about default extensions

Yeah, this is annoying for sure. Unfortunately passing -o -XTypeApplications is only option unless we remove TypeApplications from all the default-extensions, which would also be annoying.

mitchellwrosen commented 4 years ago

I think you would want to put TypeApplications, PatternSynonyms, and the other handful of syntax-stealing extensions at the top of every file, if the codebase is meant to be formatted with ormolu. This makes it so contributors (presumably) only need to configure a hotkey in their editor to run ormolu over the current buffer, without being project-aware (e.g. oh this is a unison repo - need to pass -o TypeApplications)

mitchellwrosen commented 4 years ago

I think it's gotta be enforced, or it will just drift and cause diff problems for the next guy. Am I wrong? I don't think I can just eyeball some code in a PR and notice whether it would be changed or unchanged by ormolu.

It does make sense to enforce this, I just meant if a slip-up is made, it's easily addressed at any time by formatting the whole codebase, seeing a diff, and pushing that to trunk. Of course, if it's an easy check to add to CI, then all the better.