Support calling formatters on each file separately

michaelpj commented 2 months ago

Is your feature request related to a problem? Please describe.

At work we use treefmt to call ormolu on ~2.5k Haskell files. This takes about 32 seconds.

A natural improvement would be to format those files in parallel. At the moment, that would require changing the formatter.

The alternative would be for treefmt to handle the parallelism by running the formatter on each file individually. Then the formatter doesn't need to do anything.

Describe the solution you'd like

Some way of specifying how treefmt should call the formatter. Here's one option:

Add a batchSize option to a formatter, with no batch size meaning "infinite".
Split the files to format into batches of at most batchSize, and call the formatter once per batch, in parallel.

Then:

No batch size behaves as today
Batch size 1 runs the formatter on just one file each time
Intermediate batch sizes can be tuned on a case-by-case basis
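
To make the proposed semantics concrete, here is a minimal sketch of the splitting step in Go. It is not treefmt code: the function name `splitBatches` is hypothetical, and treating a batch size of 0 as "no batch size" is an assumption made for illustration. It only shows the splitting; running the batches in parallel is left out.

```go
package main

import "fmt"

// splitBatches is a hypothetical helper, not part of treefmt.
// It splits paths into consecutive batches of at most batchSize entries.
// A batchSize of 0 (or less) is assumed to mean "no batch size", i.e. one
// batch containing every path, which matches the current behaviour.
func splitBatches(paths []string, batchSize int) [][]string {
	if len(paths) == 0 {
		return nil
	}
	if batchSize <= 0 || batchSize >= len(paths) {
		return [][]string{paths}
	}
	var batches [][]string
	for start := 0; start < len(paths); start += batchSize {
		end := start + batchSize
		if end > len(paths) {
			end = len(paths)
		}
		batches = append(batches, paths[start:end])
	}
	return batches
}

func main() {
	files := []string{"A.hs", "B.hs", "C.hs", "D.hs", "E.hs"}
	// 0 = no batch size, 1 = one file per call, 2 = an intermediate size.
	for _, size := range []int{0, 1, 2} {
		fmt.Println(size, splitBatches(files, size))
	}
}
```

Each resulting batch would then be passed to one invocation of the formatter.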

Describe alternatives you've considered

Do nothing, expect formatters to handle this.

michaelpj commented 2 months ago

Note: I tried running the formatter once per file in parallel using fd, and it wasn't much faster. So maybe there is another mystery here.

brianmcgee commented 2 months ago

@michaelpj which version of treefmt are you using?

In v2 we implemented a new approach:

For each path, determine the sequence of formatters to apply, which gives us a unique batch key.
Batch formatting tasks by the batch key until we reach the batch size, currently hardcoded to 1024, at which point we fire off a goroutine which applies each formatter in sequence to the batch of paths.

The errgroup for applying the formatters is bounded by runtime.NumCPU(). With all this in mind, you should already be seeing some concurrency.

brianmcgee commented 2 months ago

If we made the batch size configurable, it could be lowered to further improve concurrency by sending fewer paths to each ormolu invocation.

michaelpj commented 2 months ago

Ah, interesting. I am indeed on 0.6.1! I'll see if I can get the newer one.

brianmcgee commented 2 months ago

I've created https://github.com/numtide/treefmt/issues/334 to follow up on the batch size

Support calling formatters on each file separately #333