Make ` (ns formatting-stack.kondo-classpath-cache)` optionally opt-out

vemv commented 3 years ago

Problem statement

https://github.com/nedap/formatting-stack/blob/28cc3b933012077c29f00f9bec4e153685281a78/worker/formatting_stack/kondo_classpath_cache.clj#L19-L25 , while generally OK, implies a problem, namely that for large projects it forces a Xmx to be set. This is not problematic per se, but under certain circumstances Xmx can be deemed undesirable to be set or increased.

This is particularly true when f-s can be used simply as a library; formatting-stack.kondo-classpath-cache can sometimes be an unintended, not-used side-effect of requireing f-s.

Proposal

Add a system property, opt-in, if set, wraps the work in delay rather than future.

This way, consuming code remains compatible (as it uses @), while we avoid the side-effect.

Worth noting, the default existing behavior would remain untouched.

Alternatives and comparison

Unconditionally change future to delay
- would be a performance regression, given that future causes the work to be done in advance (as it often there's a time lapse between repl startup and the moment where a programmer/script decides to invoke linting).

thumbnail commented 3 years ago

not-used side-effect of requiring f-s.

I'm curious in what setup the side-effect would be undesired. Could you elaborate? If you could (somehow) prevent loading formatting-stack.linters.kondo, the cache wouldn't be created.

Previous work on this can be found here. It's almost verbatim the mentioned alternative. The "problem" that PR targeted was perceived slowness, which was probably related to stop-the-world garbage collection since the JVM was running out of memory. In that case I worked around the problem by lazy-loading formatting-stack.

Adding another system property might sound cheap, but it's yet another custom configuration option that needs to be supported in the future. It could hinder work on a proper configuration system (which ideally, would be side-effect free). So i'm cautious around that solution

vemv commented 3 years ago

I'm curious in what setup the side-effect would be undesired. Could you elaborate?

It creates (and holds indefinitely in memory) a multi-GB object that may not be used at all.

If you could (somehow) prevent loading formatting-stack.linters.kondo, the cache wouldn't be created.

That's true but also an implicit way of accomplishing something that can be deemed too important to be implicit.

Previous work on this can be found here. It's almost verbatim the mentioned alternative.

Yes, and the tradeoff remains the same. Ultimately consumers should be able to choose the memory and performance characteristics of their environment.

It could hinder work on a proper configuration system (which ideally, would be side-effect free).

Hinder is bit of an overstatement; system props and config are fairly orthogonal. Also, you might be able to foresee how a config-system and require/future/delay interaction could be quite extremely delicate: requires happen before anything else (like running lint!).

So, what you defend surely would involve performing requires dynamically; that can easily go wrong in some of other way (e.g. Eastwood and clj-refactor tend to perform requires as part of their classpath-wide analysis; so we could end up requireing code that thought that we would not require).

tldr, a system prop that is inlined right in the "culprit" LOC is unmistakable; assessing it can be done in isolation. Coupling requires to the config system needs more fine-grained reasoning, and also in practice would mean a factual need (https://github.com/nedap/formatting-stack/issues/175) is deferred indefinitely since we don't yet have the refactored config system.

thumbnail commented 3 years ago

It creates (and holds indefinitely in memory) a multi-GB object that may not be used at all.

I'm not really familiar with the internals of clj-kondo, but as as i understand it the run! only writes analysis to the clj-kondo cache. The clj-kondo docs suggest running the analysis for the cp before the first lint! (in two separate processes). I wonder where that /multi-GB object/ is used for at all.

Hinder is bit of an overstatement; system props and config are fairly orthogonal.

Having system-props to tweak the behaviour (and thus configuration) of a system need to be taken into account when the configuration will be redesigned. Making this internal an explicit option now, makes it public api. Requiring careful work to maintain compatibility in the future.

Another possible workaround is creating the cache manually before, since the second time clj-kondo passes over the same classpath it skips everything in the cache. Possibly mitigating the issue.

vemv commented 3 years ago

I wonder where that /multi-GB object/ is used for at all.

can be queried but I don't think it's unused, since I implemented it quite carefully after a chat with the author. I've never witnessed a false positive/negative, or slowness that can be attributed to an insufficient cache.

Making this internal an explicit option now, makes it public api. Requiring careful work to maintain compatibility in the future.

This is a fair point, although an unpublicized java prop isn't necessarily an API. That was my reasoning with e.g. the parallelize-linters option (which is not really mentioned anywhere user-facing).

The prop might be prefixed with private and eventually removed.

Another possible workaround is creating the cache manually before, since the second time clj-kondo passes over the same classpath it skips everything in the cache. Possibly mitigating the issue.

I also thought of this and it seems promising. I think it boils down to enabling running f-s as a main that only populates this cache, and also allowing f-s to be required without loading the cache.

I think it would do essentially the same as described in my OP though; essentially one is pursuing a conditional.

(note that installing clj-kondo as a binary in CI is bit of an invasive option, for what is essentially a workaround)

nedap / formatting-stack