tweag / ormolu

A formatter for Haskell source code
https://ormolu-live.tweag.io

Check multiple files at once? #362

Closed asterite closed 5 years ago

asterite commented 5 years ago

Hi!

Would it be possible to enhance ormolu by letting it check multiple files at once?

Why?

We have a project with 275 files. To check that they are formatted we must run:

ormolu --mode check {file}

for each of those files.

Each run takes between 0.05s and 0.08s depending on the file's size. Let's say 0.06s on average. Multiply that by 275 and we get about 16 seconds.
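The estimate above can be worked through as a short sketch. The function name is hypothetical; the only inputs are the figures quoted in this comment (275 files, 0.05s to 0.08s per run, 0.06s assumed average).

```python
def estimated_total_seconds(file_count, seconds_per_run):
    """Total wall time if every file needs its own invocation."""
    return file_count * seconds_per_run


# Figures quoted in the comment: 275 files, 0.05s-0.08s per run.
low = estimated_total_seconds(275, 0.05)
avg = estimated_total_seconds(275, 0.06)
high = estimated_total_seconds(275, 0.08)

print(f"{low:.2f}s to {high:.2f}s, about {avg:.1f}s at the average")
```

With the assumed 0.06s average this gives 16.5 seconds, which matches the rough figure of 16 seconds; the quoted range puts the total somewhere between roughly 14 and 22 seconds.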

Now, 16 seconds is not really that bad, but I believe ormolu can do much better.

Why do I believe that?

I'm the author of the Crystal programming language. Crystal comes with a built-in formatter. Here's the help:

$ crystal tool format --help
Usage: crystal tool format [options] [file or directory]

Options:
    --check                          Checks that formatting code produces no changes
    -i <path>, --include <path>      Include path
    -e <path>, --exclude <path>      Exclude path (default: lib)
    -h, --help                       Show this message
    --no-color                       Disable colored output
    --show-backtrace                 Show backtrace on a bug (used only for debugging)

Some example usages:

# Format all crystal (.cr) files inside src, inline (overrides contents)
$ crystal tool format src

# Check all crystal files inside src
$ crystal tool format --check src

# Format a couple of files
$ crystal tool format foo.cr bar.cr

# Check a couple of directories
$ crystal tool format --check src spec

If I compare running this:

# Check that every crystal file in the current directory, recursively, is correctly formatted
$ time crystal tool format --check .

real    0m0.637s
user    0m0.602s
sys     0m0.138s

against checking each file with a separate command:

# There are 1428 crystal files in the repo so I won't show those 1428 invocations
$ time ./script.sh

real    0m18.089s
user    0m9.021s
sys     0m8.220s

I'm not quite sure why it takes so much more time, but it seems that just starting a program has a cost. And maybe the program performs some initialization that doesn't need to be repeated for each file.

Does it matter?

If you develop locally and want to check that all files are formatted before committing, waiting half a second versus 18 seconds makes a big difference. That also applies to CI times.

In our case (in my workplace where we develop using Haskell) we have to come up with smart ways to avoid checking files that didn't change. It would be so much simpler (and faster!) if we could just check all files, given that it would take a second or so.

Can ormolu be as fast as Crystal's formatter?

I think so. Crystal is statically typed and compiled. So is Haskell. And Haskell is probably more optimized because it's much older (Crystal is just 7 years old, I think).

Formatting to stdout

Crystal's formatter doesn't allow formatting a file and showing the output on stdout. Ormolu does. But I think ormolu could still support formatting inline and checking multiple files and directories, while restricting stdout output to a single file.

What do you think?

I can take a stab at a PR if we agree on the command line interface for this.

mrkkrp commented 5 years ago

But this is already possible, see: #314.

asterite commented 5 years ago

Ooooh... that's great! I missed it because it's so recent (21 days ago). I'll close this then. Thank you!

mboes commented 5 years ago

@asterite Your hunch was that passing multiple files at once would reduce total formatting time in your case. Did that turn out to be true?

asterite commented 5 years ago

Hi @mboes !

I just tried it. For those 275 files, with a single command, it takes about 8 seconds to format them all and about 6.6 seconds to check them all.
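For reference, the difference between the two invocation styles can be sketched as follows. This is only an illustration: the helper names are hypothetical, and it assumes ormolu accepts several file paths after `--mode check`, as the single-command run above suggests.

```python
from pathlib import Path


def find_haskell_files(root):
    """Return all .hs files under root, sorted for a stable order."""
    return sorted(str(p) for p in Path(root).rglob("*.hs"))


def per_file_commands(files):
    """One ormolu process per file: pays the startup cost every time."""
    return [["ormolu", "--mode", "check", f] for f in files]


def batched_command(files):
    """A single ormolu process for all files (assumes multi-file support)."""
    return ["ormolu", "--mode", "check", *files]


if __name__ == "__main__":
    files = find_haskell_files(".")
    print(len(per_file_commands(files)), "processes when run per file")
    print("1 process when batched, with", len(files), "file arguments")
```

Either form could be handed to something like `subprocess.run`; the batched form starts one process instead of 275.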

So compared to 16 seconds it is better. But Crystal can check 1428 files in 0.6 seconds and format them in 1.1 seconds, so I guess ormolu could still be optimized. But I don't know how to optimize functional programs. Imperative programs are easier to optimize (at least for me) :-)
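To put the check timings on a common scale, here is a rough throughput comparison using only the numbers quoted in this thread. The function name is hypothetical, and the comparison is approximate at best, since the two tools process different languages and files of different sizes.

```python
def files_per_second(file_count, seconds):
    """Average number of files processed per second of wall time."""
    return file_count / seconds


# Check timings quoted above.
crystal_check = files_per_second(1428, 0.6)  # Crystal, whole repo
ormolu_check = files_per_second(275, 6.6)    # ormolu, single command
ratio = crystal_check / ormolu_check

print(f"Crystal: {crystal_check:.0f} files/s")
print(f"ormolu:  {ormolu_check:.1f} files/s")
print(f"ratio:   about {ratio:.0f}x")
```

By this measure Crystal checks on the order of 2400 files per second against roughly 42 for ormolu, a gap of more than 50x.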

mrkkrp commented 5 years ago

So far there has been zero effort to make Ormolu fast. I'm sure it can be optimized, but we first need to make it work 100% correctly.

mboes commented 5 years ago

And it's possible that parsing is the bottleneck. The GHC parser (which we reuse wholesale) is not known to be particularly fast.

yumiova commented 5 years ago

According to a GHC profiling report (with optimization level 2) of the Ormolu test suite, 95.2% of the time spent and 98.1% of allocations stem from Ormolu.Parser.parseModule. Since the only really heavyweight external function called there is GHC's parser, that is presumably the source of the poor performance.