pinterest / ktlint

An anti-bikeshedding Kotlin linter with built-in formatter
https://pinterest.github.io/ktlint/
MIT License
6.23k stars 512 forks source link

ktlint --stdin takes too long and only supports a single file #2754

Closed mshima closed 3 months ago

mshima commented 3 months ago

Expected Behavior

ktlint --stdin should be fast or provide a batch mode.

We are using ktlint --stdin to format in memory files when generating/regenerating an entire project. With ktlint disabled a sample project takes ~45 seconds to generate, it increases to ~8m20s with ktlint formatting.

Current Behavior

ktlint --stdin takes more than 2 seconds to format each file.

Additional information

paul-dingemans commented 3 months ago

When I run Ktlint on a collection of project containing more than 5000 files/directories, it takes about 30 seconds.

If your project starts at a deeply nested folder structure than make sure that an .editorconfig file, containingroot = true element, exists in the root folder of the project.

Also using option --stdin is not the most efficient way to run Ktlint on multiple files.

I cannot provide further assistance without a reproducer.

mshima commented 3 months ago

When I run Ktlint on a collection of project containing more than 5000 files/directories, it takes about 30 seconds.

Yes, I understand that using in disk is quick, but we need to run in memory.

The project has 428 kt files:

kotlin % find . -type f -name "*\.kt" -print | wc -l
     428

Running ktlint in a single file takes ~2 seconds:

% cat src/main/kotlin/tech/jhipster/sample/config/SecurityConfiguration.kt | time .ktlint/1.3.1/ktlint --stdin -F > /dev/null
.ktlint/1.3.1/ktlint --stdin -F > /dev/null  2,26s user 0,13s system 121% cpu 1,958 total

Running ktlint in every file would take 856 seconds in series.

--stdin is needed because:

You can see the java implementation in https://github.com/jhipster/generator-jhipster and kotlin implementation in https://github.com/jhipster/jhipster-kotlin.

CI run using ktlint is in https://github.com/mshima/jhipster-kotlin/actions/runs/10078199400/job/27862760291?pr=7

Run cli.cjs generate-sample --skip-jhipster-dependencies --skip-install     8m 53s

Current main is in https://github.com/jhipster/jhipster-kotlin/actions/runs/10073101444/job/27846452712

Run cli.cjs generate-sample --skip-jhipster-dependencies --skip-install     39s
mshima commented 3 months ago

Running ktlint by itself takes ~1,5 seconds:

kotlin % echo 'class Foo(){}' | time .ktlint/1.3.1/ktlint --stdin -F > /dev/null
.ktlint/1.3.1/ktlint --stdin -F > /dev/null  1,20s user 0,10s system 90% cpu 1,429 total

So bootstrapping takes that long, it's probably necessary to add a batch mode to --stdin option passing a formatted group of files or using delimiter.

paul-dingemans commented 3 months ago

I don't know whether you can blame ktlint for the time it takes to process. When I run a command like below, it takes less then 300ms to complete (including debug logging):

ktlint-1.3.1 --stdin -F --log-level=debug < src/main/kotlin/Foo.kt

or

echo "class Foo(){}" | cat | ktlint-1.3.1 --stdin -F --log-level=debug

I know that when including the logging that it is included in the stdout and not useful in your case. But the interesting part is that it reports how much time ktlint used for processing. Maybe you can try a run including the logging and than analyse whether it is indeed ktlint which is slow.

17:54:50.753 [main] DEBUG com.pinterest.ktlint.cli.internal.KtlintCommandLine -- Finished processing in 176ms / 1 file(s) scanned / 3 error(s) found
mshima commented 3 months ago

Cold run results is quite different from hot run. Cold run:

% echo 'class Foo' | time .ktlint/1.3.1/ktlint --stdin -F --log-level=debug | grep processing
15:22:06.978 [main] DEBUG com.pinterest.ktlint.rule.engine.internal.CodeFormatter -- Starting with processing file '<stdin>'
15:22:07.240 [main] DEBUG com.pinterest.ktlint.rule.engine.internal.CodeFormatter -- Finished with processing file '<stdin>'
15:22:07.240 [main] DEBUG com.pinterest.ktlint.cli.internal.KtlintCommandLine -- Finished processing in 313ms / 1 file(s) scanned / 0 error(s) found
.ktlint/1.3.1/ktlint --stdin -F --log-level=debug  0,81s user 0,09s system 64% cpu 1,388 total
grep processing  0,00s user 0,00s system 0% cpu 1,388 total

Second run:

% echo 'class Foo' | time .ktlint/1.3.1/ktlint --stdin -F --log-level=debug | grep processing
15:22:15.480 [main] DEBUG com.pinterest.ktlint.rule.engine.internal.CodeFormatter -- Starting with processing file '<stdin>'
15:22:15.708 [main] DEBUG com.pinterest.ktlint.rule.engine.internal.CodeFormatter -- Finished with processing file '<stdin>'
15:22:15.708 [main] DEBUG com.pinterest.ktlint.cli.internal.KtlintCommandLine -- Finished processing in 272ms / 1 file(s) scanned / 0 error(s) found
.ktlint/1.3.1/ktlint --stdin -F --log-level=debug  0,80s user 0,07s system 95% cpu 0,911 total
mshima commented 3 months ago

By the way, my tests uses MacOS air/mini with m1 processor.

paul-dingemans commented 3 months ago

Cold run results is quite different from hot run.

Do you mean the difference between 313ms and 272ms? I think that difference is way too small to conclude that ktlint is the bottleneck. The difference can also be caused by having files in an internal cache of the OS.

mshima commented 3 months ago

Cold run results is quite different from hot run.

Do you mean the difference between 313ms and 272ms? I think that difference is way too small to conclude that ktlint is the bottleneck. The difference can also be caused by having files in an internal cache of the OS.

1,388s and 0,911s, yes it's related to OS cache.

So 1,388s is the total time, 313ms is the processing time, so we have 1s for java bootstrapping and ktlint bootstrapping. It's quite difficult to improve that.

Would be nice to have something like:

echo 'Foo.kt$class Foo() {}$Bar.kt$class Bar() {}$' | ktlint --stdin -F --batch --delimiter '$'
Foo.kt$class Foo$Bar.kt$class Bar$
paul-dingemans commented 3 months ago

I understand the use case for your project. But for now, I have to say no to this. I see no value for this option for other use cases.