scalacenter / scalafix

Refactoring and linting tool for Scala
https://scalacenter.github.io/scalafix/
BSD 3-Clause "New" or "Revised" License
821 stars 185 forks source link

--diff appears to be slow / not actually using git diff #1714

Open james-s-w-clark opened 1 year ago

james-s-w-clark commented 1 year ago

I am trying to speed up our CI, which runs stuff like scalafix and scalafmt. scalafmt (with our Mill task, at least) is able to use caching to prevent unnecessary work (inital checks may take a few minutes; further checks with no/small diffs take ~seconds).

Most PRs in CI only change a small percentage of files, so there shouldn't be much to format or fix.

I noted in the documentation there are the --diff and --diff-base branch/tag/commit options for scalafix. I expected that:

  1. files not in git diff would not be checked
  2. possibly even better, only changed content in added/modified/renames would need to be checked
  3. the above would presumably make scalafix much faster with the --diff options, when the percentage of files changed is small. This assumes git is fast, and scalafix has some intense work to do.

I have read your CliGitDiffSuite tests, and it looks like functionality for 1 & 2 is there.

However, at least through the mill-scalafix plugin, --diff actually seems to slow things down.

Here is a link to an issue I opened in mill-scalafix. From a few runs, it looks like --diff is about 50% slower.

In this scalafix repo I tried sbt "scalafixAll --check" and sbt "scalafixAll --check --diff-base main, and didn't see much difference in time taken (even with no diff, on a branch off of main).

bjaglin commented 1 year ago

scalafmt (with our Mill task, at least) is able to use caching to prevent unnecessary work (inital checks may take a few minutes; further checks with no/small diffs take ~seconds)

I assume that's a feature of the mill-scalafmt plugin? If you were using sbt-scalafix, you would also benefit from incremental runs out of the box. I don't think mill-scalafix has anything like that at the moment.

However, at least through the mill-scalafix plugin, --diff actually seems to slow things down.

Could you at least confirm that there were less files targeted (via --verbose)? I am not familiar with mill, so I don't know what kind of side effects we could have (a significant time might be spent outside scalafix, recompiling sources with semanticdb for example). Could you run the same experiment with the CLI?

In this scalafix repo I tried sbt "scalafixAll --check" and sbt "scalafixAll --check --diff-base main, and didn't see much difference in time taken (even with no diff, on a branch off of main).

As sbt-scalafix is incremental, this test is probably biased. Same suggestion as above: you should try to use the CLI directly if you want to assess the impact of --diff.

james-s-w-clark commented 1 year ago

Good call on trying with cs's scalafix to remove bias.

I already gave it a shot yesterday:

I'll try a bit more to get cs's scalafix it working on our source code. A full scalafix there takes a lot longer than for this repo.


The mill-scalafix repo runs with coursier's scalafix off of main (or, a branch off that with no changes).

The codebase is fairly small here so I can't compare time/performance yet, but it looks like functionally the diff is having no effect.