revelc / formatter-maven-plugin

Formatter Maven Plugin
https://code.revelc.net/formatter-maven-plugin
Apache License 2.0
284 stars 90 forks source link

New formatter config must invalidate cache #453

Closed walles closed 1 year ago

walles commented 3 years ago

Describe the bug A clear and concise description of what the bug is.

Versions (OS, Maven, Java, and others, as appropriate):

To Reproduce

  1. git clone git@github.com:walles/jedis.git
  2. cd jedis
  3. git checkout walles/format-on-mvn-validate-in-ci-2019
  4. mvn compile (this will run the formatter and initialize the cache)
  5. Remove all the rules from hbase-formatter.xml
  6. mvn compile
  7. Note that nothing was reformatted, even though you just changed the rules.
  8. rm target/formatter-maven-cache.properties
  9. mvn compile
  10. Note that lots of files got reformatted now, after you removed the cache.

Expected behavior Files should be reformatted when the formatting rules change.

Actual behavior Nothing gets reformatted, even if the formatting rules change.

ctubbsii commented 3 years ago

Why not just do mvn clean compile under these circumstances? I'm not sure this is a bug, but a new use case. I suppose the plugin could try to detect the formatter rules changes, but it'd be a naive guess. It seems that it'd be easier to just do mvn clean to wipe the cache, since formatter rules changes are probably rare.

walles commented 3 years ago

If this behaviour could be clearly documented, I think that would be an OK solution.

As it was, I spent half an hour in https://github.com/redis/jedis/pull/2495 trying to figure out why the formatter did one thing locally and another thing in CI.

Any change in this plugin that could have saved me (and future users who run into the same thing) that half hour would be much appreciated.

Maybe just checksum the formatter config as well and drop the other cache if it changes? Or just skip the caching entirely; how much time does from-scratch reformatting really take? Caching is hard :)

hazendaz commented 3 years ago

How much time does it save, a lot. We didn't have a cache a year ago. Much larger projects were looking for support there and it saved quite a bit. Actually as a result, caching is being asked on other related plugins now just for that reason. The default cache is in the target directory, so if you run 'clean', its deleted. Generally anytime configuration is changed, 'clean' should be run otherwise it uses stale cache. Great example, changing a pom file dependency. Without clean you will end up with duplicate libs.

hazendaz commented 3 years ago

That is to say saved cache....it was there but never worked correctly prior to last year.

ctubbsii commented 3 years ago

If this behaviour could be clearly documented, I think that would be an OK solution.

I'm open to suggestions, but I'm not sure what other documentation is needed to help users understand how to delete the cache that has been invalidated by a change to their Maven project. This might just be a thing that one needs to be pointed to if they didn't already make the connection themselves.

walles commented 3 years ago

I'm not sure what other documentation is needed to help users understand how to delete the cache that has been invalidated by a change to their Maven project

How to delete the cache is obvious and doesn't need any (more) documentation.

What's not obvious is that the cache has to be deleted manually in the first place.

And if that could also be clearly documented (or fixed in the code), that would be super!

ctubbsii commented 3 years ago

I suppose a naive attempt to solve this could be by having a special cache entry for the loaded config file... but so many other things matter... like the java compliance version, and the version of this plugin, and the version of any manually overridden Eclipse dependencies in the <plugin> section, and the operating system's newline scheme (which could change if the build is on a filesystem used by both Windows and Linux like a shared or network drive, for example).

While it seems like a nice idea to have such a feature to automatically detect when the cache is invalid, I just don't think it's feasible for for the plugin to try to attempt it... there are too many variables. I think, just because of practical reasons, it has to be left up to the users to be able to figure out when, if at all, the cache is going to be valid, based on the changes to the project that they know they made, rather than the plugin trying to detect and deduce what changes they made. I'm not even sure what we could document to hint to the user that a cache might no longer be valid.

If I were to make an attempt to document something, it would be something very generic, like "The cache can become invalid for any number of reasons that this plugin can't reasonably detect automatically. If you rely on the cache and make any changes to the project that could conceivably make the cache invalid, or if you notice that files aren't being reformatted when they should, just delete the cache and it will be rebuilt." I'm not sure how useful that would be, though. I'm open to suggestions.

walles commented 3 years ago

I suppose a naive attempt to solve this could be by having a special cache entry for the loaded config file

That would have saved me. I would like this!

ctubbsii commented 3 years ago

I suppose a naive attempt to solve this could be by having a special cache entry for the loaded config file

That would have saved me. I would like this!

This seems like a waste of time to me. I don't think we can, or should try to, automate away the the user's responsibility to understand the impact of their environmental changes on their build. It seems likely to increase complexity of this plugin, but not actually help users much. The user, not the plugin, is in the best position to understand the consequences of the changes they make to the build environment.

As I said, any number of environmental changes could cause the same issue, and it would be unreasonable for the plugin to try to detect them all. Just trying to detect all these changes could easily become a substantial percentage of this plugin's code, and I don't know why this specific one is any more special than any of the others... it's just the one you saw. It's not necessarily the one that the next user will see, or the next.

So, I won't spend time trying to chase these kinds of user experiences down one-by-one, but if somebody else wants to put together a pull request to address it, even though I'm not enthusiastic about it, I won't object, as long as it doesn't bloat the code too much.

walles commented 3 years ago

In that case I think the exact wording you suggested should go into the plugin documentation, that would be the next best thing:

The cache can become invalid for any number of reasons that this plugin can't reasonably detect automatically. If you rely on the cache and make any changes to the project that could conceivably make the cache invalid, or if you notice that files aren't being reformatted when they should, just delete the cache and it will be rebuilt.

famod commented 3 years ago

Coming from https://github.com/revelc/impsort-maven-plugin/issues/48.

As I said, any number of environmental changes could cause the same issue, and it would be unreasonable for the plugin to try to detect them all.

IMHO, this is too black and white. What plugins, frameworks etc. should do is to help users as good as they can. If there are corner cases which are too complex to cover: So be it! A small documentation hint is usually enough (e.g. "This plugin takes A, B, C into consideration when calculating the hashes. You might need to wipe the cache manually in case you changed X,Y, Z." or so).

The user, not the plugin, is in the best position to understand the consequences of the changes they make to the build environment.

This might be true for the user altering the config or updating the plugin. But once you pushed your change and it lands in the local workspaces of dozens of contributors/team members, it's mostly out of your hand. Sure, leaving the cache in target would solve this, but after more than a decade of Maven experience, I've seen most people run mvn clean ... all the time! I don't say it's good practise but it's just the way the cookie crumbles. Take Quarkus for example: 900+ modules. After people have pulled, they ususally do a mvn -Dquickly -Tx which runs also clean. (If that profile didn't include clean, most people would type it on their own.) So you end up re-running the plugin on thousands of java source files for basically nothing.

And at this point I have to ask: What happens more often? A cache-relevant config change or a clean? Definitely clean, AFAICS. So for many projects, leaving cache in target is almost as "bad" as having no cache at all.

Btw, Checkstyle is considering the config for calculating the hashes: https://github.com/checkstyle/checkstyle/blob/master/src/main/java/com/puppycrawl/tools/checkstyle/PropertyCacheFile.java I haven't checked whether they add the config cache to each file entry or whether they have a separate key for that.

PS: Having said that, it's good that there now is an explicit hint in the plugin doc that you might (will) have to wipe the cache on your own. When I moved the cache out of target in Quarkus in March this year, there wasn't such a hint yet.

ctubbsii commented 3 years ago

The problem is knowing which changes are cache-relevant. Some config changes might be, while others aren't. A version bump is unlikely to change the formatting in most cases. I think it's reasonable to defer to users to manage the lifecycle of their cache when they make changes to their build.

As I see it, there's basically two competing groups here:

  1. The users who make cache-relevant changes and want the plugin to automatically reformat; this group is likely to get upset/annoyed when it doesn't automatically reformat; this group is also less likely to even care about the cache
  2. The users who want high performance who want the plugin to avoid any unnecessary work to get as fast a build as possible; this group is likely to get upset/annoyed when the plugin slows down their build, even a little bit.

As somebody who always does mvn clean to ensure builds from scratch, for reproducibility, I would favor group 1 by including some additional data in the cache key (plugin version, config hash). However, group 2 seems to be the primary target audience for the cache. So, it seems better to favor them slightly and avoid being too aggressive when invalidating the cache.

For me, the safest thing to do seems to just stay out of it and leave it up to the user to be responsible for the cache lifecycle. If they want to leave it in the default location, then it's just a matter of rebuilding with mvn clean package instead of mvn package when they bump the plugin version or change the plugin config, and check in the changes to their SCM. If they store their cache outside of the target directory, they can configure that to be cleaned up at the clean phase with extra config or a profile, or they can manually clean it when they bump the plugin version or change the config. Leaving the decision to the user is the most flexible, and supports both groups of people. To clean or not to clean: user decide.

famod commented 3 years ago

Even the speed focussed folks should be able to agree to "correctness over speed". A reliable build process should never sacrifice correctness to speed.

TBH, I also don't see that there are really two strictly opposing groups. The primary reason to use this plugin is auto-formatting and/or validation (often both, auto-formatting locally and validation in CI). Secondarily to that everthing should be as fast as possible, where the cache comes into play and that cache is way less useful when left in target due to frequent cleans.

If they store their cache outside of the target directory, they can configure that to be cleaned up at the clean phase with extra config or a profile, or they can manually clean it when they bump the plugin version or change the config.

Btw, typically only a very small fraction of the users of a project has in-depth knowledge of build process. Most folks just want to have their stuff compiled, tested and packaged.

ctubbsii commented 3 years ago

Even the speed focussed folks should be able to agree to "correctness over speed". A reliable build process should never sacrifice correctness to speed.

Of course. The problem is that not every version bump or configuration change will result in any change to correctness, and many of them are just a waste. It's hard for this plugin to be able to determine which is the case. It's also possible that there is a correctness issue due to something else in the environment that has changed that this plugin has no way of detecting. I just don't see a lot of value for this plugin trying to maintain all those heuristics, when the user is in a much better position to know when they should clear their cache and when not do... much more so than the developers of this plugin.

  • a profile or another manual approach requires each contributor or team member to do that and there will always be people that don't receive that info which makes it impractical, IMO.

The only person who needs to know to do this would be the person who makes a cache-relevant change in the first place. If they check in their changes to their SCM, then everybody else will see those changes, and their local caches will automatically be invalidated because the file has changed. If a user doesn't know the consequences of making a config change or bumping the plugin version, then they shouldn't do it! if they do know the consequences, then it's easy for them to clear the cache as appropriate.

ctubbsii commented 3 years ago

In any case, all of this is philosophical. If somebody wants to do the work, then they can submit a PR to do it. If it's not so substantial that it creates a maintenance burden, then I wouldn't object to it. If nobody that thinks it is important wants to do the work, then the whole conversation is moot.

famod commented 3 years ago

Fair enough! I'll see if I can find some time to take a stab at this.

Btw, there could also be a property like aggressiveCacheEviction (true|false) to give users a choice.

hazendaz commented 2 years ago

@walles @famod I didn't reread this as it is long but thinking on the subject. It seems like if we just write the configuration used into the caching files it would then be simple enough to check if it was changed and invalidate that way without having to do much more. If someone can produce a PR along these lines that would help.

hazendaz commented 2 years ago

In fact, at some point I'm actually going to remove this cache entirely from this project and make it its own plugin. There are too many other plugins that would benefit from it that are incredibly slow and it doesn't make sense to re-invent the wheel but for now, I too now would like to see this for cache eviction so its basically feature complete when I make a break for it :) Context: I need the caching for license-maven-plugin as it is incredibly slow compared to all others currently but I know a lot in this are that look at physical files would benefit by skipping reads when not needed.

delanym commented 1 year ago

If I instruct the plugin to achieve a goal: format, and it doesn't get round to any formatting [INFO] Processed 49 files in 46ms (Formatted: 0, Unchanged: 49, Failed: 0, Readonly: 0) Can this be logged as a warning rather? Reason is my build log level is set to warning only and I don't see the message. If no change was made to the files, then why am I building? I should get a warning.

On the other hand, if files did change, lets say 10 requiring formatting and 5 that don't, then it can stay at INFO level. [INFO] Processed 64 files in 62ms (New: 5, Formatted: 10, Unchanged: 49, Failed: 0, Readonly: 0) So "New" indicates files the cache has never seen before. It makes sense that I'm building again and no warning is necessary.

delanym commented 1 year ago

The more I think about this the more irritated I get. If caching is such an old feature then why is there no parameter to invalidate the cache, at the very least? And 2 years into this ticket? Reliability > speed.

hazendaz commented 1 year ago

As with most plugins the output is critical to know it did something. Nearly all maven direct plugins are now all writing some info so that part won't change here. It's not a warning.

As far as invalidate cache. By default its in target folder. As such run clean lifecycle as most users due such as mvn clean install and the cache is deleted entirely.

Pull requests for any feature are always welcome to review. What I can say is sure lots of old tickets that are not critical to us to address. Personally I manage the builds for roughly 2k repos at my day job along with an extensive super pom and still some of the tickets here I even raised while important to me are not a significant priority. As such that is why tickets hang around in general in OSS. Its likely wanted just no one felt inclined to do much about it.

It could be worse, we could just close all old tickets with a stale bot. Its age doesn't quite matter.

Feel free to contribute if important enough for you. I'm sure others would appreciate it.

Sent from my Verizon, Samsung Galaxy smartphone Get Outlook for Androidhttps://aka.ms/AAb9ysg


From: Delany @.> Sent: Thursday, April 20, 2023 6:16:00 PM To: revelc/formatter-maven-plugin @.> Cc: Jeremy Landis @.>; Comment @.> Subject: Re: [revelc/formatter-maven-plugin] New formatter config must invalidate cache (#453)

The more I think about this the more irritated I get. If caching is such an old feature then why is there no parameterhttps://code.revelc.net/formatter-maven-plugin/format-mojo.html to invalidate the cache, at the very least? And 2 years into this ticket? Reliability > speed.

— Reply to this email directly, view it on GitHubhttps://github.com/revelc/formatter-maven-plugin/issues/453#issuecomment-1517011362, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAHODI6U23UES3ZMVH3DHZ3XCGYSBANCNFSM42JM2AAQ. You are receiving this because you commented.Message ID: @.***>

ctubbsii commented 1 year ago

The more I think about this the more irritated I get. If caching is such an old feature then why is there no parameter to invalidate the cache, at the very least? And 2 years into this ticket? Reliability > speed.

I would argue that this capability already exists. If you choose to keep the cache in the target directory, you can clear it with a mvn clean, and that is the mechanism we primarily support for invalidation. If you choose to store it elsewhere, then it will be invalidated at your discretion. I would also suggest that we cannot know all the possible circumstances that might cause the cache to be invalid, because we cannot know the totality of the user's environment. Perhaps they've edited their POM dependencies for the plugin to use a different Eclipse JDK, for example? How could we know they've done that? What if they have line endings set to auto, but copy their dev environment across a Windows and Linux environment? There are many many possibilities that could invalidate a cache. We cannot predict all of them. I think some responsibility is on the user to know about their environment, and understand that clearing the cache (using mvn clean or another mechanism) is an option for them to use, at their own discretion.

For your last comment, about Reliability > speed, I think some disagree with you. But, the good news is... if you believe that, then just always build with mvn clean package instead of mvn package. As the user, you have the choice to prefer speed or to prefer reliability by choosing to add clean to either start a build from the beginning or to do some kind of incremental build. We can't make this choice for you. It depends on your priorities, which may differ from somebody else's.

walles commented 1 year ago

Perhaps they've edited their POM dependencies for the plugin to use a different Eclipse JDK, for example? How could we know they've done that?

Unsure here, but would this really affect the formatting? If not, there's no need to invalidate the formatting cache.

What if they have line endings set to auto, but copy their dev environment across a Windows and Linux environment?

Wouldn't this bump the file timestamps so things get reformatted every time they do this anyway?

ctubbsii commented 1 year ago

Perhaps they've edited their POM dependencies for the plugin to use a different Eclipse JDK, for example? How could we know they've done that?

Unsure here, but would this really affect the formatting? If not, there's no need to invalidate the formatting cache.

That was a typo. It should have said Eclipse JDT. In any case, yes, it would. The formatting differs in each Eclipse JDT release.

What if they have line endings set to auto, but copy their dev environment across a Windows and Linux environment?

Wouldn't this bump the file timestamps so things get reformatted every time they do this anyway?

File timestamps can be preserved. You can move across environments by merely mounting an external hard drive or network storage device or by dual-booting. You don't need to change timestamps to change that environment.

In any case, for the basic situation that was initially described, a change was already implemented for this in #734, so this issue should have been closed already. There is a follow up issue at #757 that may also be relevant.