Cannot have working filename-based rules and avoid .editorconfig loading at the same time.

mgroth0 commented 8 months ago

Expected Behavior

To be able to process code snippets with a virtual file path with the following two behaviors:

To not check the filesystem for .editorconfig files
To have rules that depend on the file path such as standard:filename work correctly

Observed Behavior

If we use Code.fromSnippetWithPath, the standard: filename rule works, but editorconfig files are checked.
If we use Code.fromSnippet, .editorconfig files are ignored but rules such as standard:filename will not work.

Context

I lint kotlin files using the ktlint API. I want to use rules such as standard:filename that depend on the filepath, but I do not want the filesystem to be checked for .editorconfig files. I do use a .editorconfig file, but I handle the IO with it myself and provide it to the engine as a specially constructed EditorConfigDefaults.

Suggested Solution

RuleExecutionContext.createRuleExecutionContext is hardcoded such that Code.filePath is used both for providing the filePath to the psiFile for use in rules (PsiFileFactory.createFileFromText) and for loading .editorconfig files. (EditorConfigLoader.load). Somehow this should be refactored so that API users may provide file paths to the psi file and/or provide file paths for .editorconfig loading separately. They should not be required to use both features in an all-or-nothing way since they are not interdependent.

paul-dingemans commented 8 months ago

I lint kotlin files using the ktlint API. I want to use rules such as standard:filename that depend on the filepath, but I do not want the filesystem to be checked for .editorconfig files. I do use a .editorconfig file, but I handle the IO with it myself and provide it to the engine as a specially constructed EditorConfigDefaults.

Out of curisosity, why do you to handle the IO regarding .editorconfig yourself? One of the key features of Ktlint is that it does respect the .editorconfig when it does find it on the path. The .editorconfig standard offer more functionality than it seems to do at first glance. Implementing this yourselves might lead to deviations from that standard. Of course that is not problem when you are the only user of your API. But if you would expose this API to others, it can lead to confusion, and in the end to complaints about Ktlint not working correctly.

Instead of using EditorConfigDefaults it might be better to the EditorConfigOverride. The latter takes precedence on settings found in real .editorconfig file(s), while the EditorConfigDefaults only is used whenever a setting is not found in the real .editorconfig file(s).

mgroth0 commented 8 months ago

I am using ktlitn in a couple of different ways. One way is a in a gradle plugin. I am the only user, and I want to freeze my .editorconfig settings while I am running a gradle build.

I have custom ktlint gradle tasks. I use the KtLintRuleEngine API to check or autoformat files, and I collect LintError information to integrate into a custom report.

It is important that any configuration files in my project directory are "locked in" as soon as I start my gradle build. That is a guarantee I have with other types of configuration files that I load at gradle configuration time. My concern is that if ktlint is automatically loading .editorconfig files, and I save an edit to the file in the middle of the build, ktlint tasks that run subsequent to my edit might use the updated configuration. I need to ensure the new .editorconfig modifications do not take affect until the next build.

paul-dingemans commented 8 months ago

Thanks for explaining this.

When looking at the timespan of a build (a couple of minutes on your local machine), versus the frequency in which an .editorconfig file is modified, it feels to me like overengineering if Ktlint would facilitate this.

More important is that from a technical perspective this is unneeded. Ktlint only reads an .editorconfig file once, and then stores it into a cache. For bigger projects having a lot of files and/or lots of directories it is just to performace costly to read the .editorconfig for each file/directory being accessed. Ktlint won't clear the cache itself as long as the KtlintRuleEngine instance is kept alive. So as long as the API Consument is not instructing KtlintRuleEngine to clear the cache, this problem will not occur.

mgroth0 commented 8 months ago

For bigger projects having a lot of files and/or lots of directories it is just to performace costly to read the .editorconfig for each file/directory being accessed

Might I suggest API users be offered a minimalistic option to just disable .editorconfig loading altogether? It seems this would solve all of these issues, since an API user can already parse a .editorconfig manually and pass it to the engine in the form of EditorConfigDefaults and therefore don't technically need to rely on the library's built-in configuration file IO.

I still plan to try to get the ktlint repository working for me. I would be happy to create a pull request to show my idea.

mgroth0 commented 8 months ago

Please see my draft PR. It demonstrates my idea. If the idea is acceptable, it still needs refinement.

paul-dingemans commented 8 months ago

Might I suggest API users be offered a minimalistic option to just disable .editorconfig loading altogether? It seems this would solve all of these issues, since an API user can already parse a .editorconfig manually and pass it to the engine in the form of EditorConfigDefaults and therefore don't technically need to rely on the library's built-in configuration file IO.

I am not worried about complexity needed to implement a feature like. However, if I would implement it, I would take a different approach and just implement a flag in the KtlintRuleEngine instead of passing it via the Code Snippets.

The reason that I do not want to implement a functionality like this, is that there is no need for it for a larger group of users. Implementing functionalities that benefits a single user or a small group of users will add up to a lot of complexity in the long term. Once it is added to the public API, it has to be maintained. Usually the developer who has added the specific functionality is by then gone, or not using ktlint anymore. I have seen dozens of functionalities that ended up like that in Ktlint, and it has cost me a lot of time and lots of breaking changes to get rid of it.

So I am not saying that it is a bad idea, but merely that I do not see the value in it for the general user base of Ktlint. Next to that, you just seem to dismiss my remarks about why I think it is not needed in your use case:

More important is that from a technical perspective this is unneeded. Ktlint only reads an .editorconfig file once, and then stores it into a cache. For bigger projects having a lot of files and/or lots of directories it is just to performace costly to read the .editorconfig for each file/directory being accessed. Ktlint won't clear the cache itself as long as the KtlintRuleEngine instance is kept alive. So as long as the API Consument is not instructing KtlintRuleEngine to clear the cache, this problem will not occur.

mgroth0 commented 8 months ago

I see where you are coming from, but I am still thinking differently about this.

Next to that, you just seem to dismiss my remarks about why I think it is not needed in your use case

You're right that I dismissed this point you made, and I'm sorry. I thought that if I showed you how easily we might implement the feature you might not require further justification for it. I guess I had it backward; justification is the main thing you are looking for.

I see your point about the ktlint engine reading the .editorconfig file once and caching it. I can see how this mostly meets my needs, but it is just not a perfect fit.

I have a custom gradle task. I am devoted to developing my gradle plugin to do all of its work lazily as much as possible and to adhering to best practices. So, adhering to idiomatic gradle here means that the .editorconfig file should either be a configuration input or a task input. With configuration caching enabled, all configuration and task inputs must be calculated prior to any tasks executing. Since Gradle instruments IO, it could automatically detect ktlint loading .editorconfig files. But this requires that I am constructing and using the ktlint engine at configuration time, which is bad practice. Especially since the ktlint engine is such a heavy library that uses the embedded Kotlin compiler, it is undesirable for this to be polluting the configuration phase.

So what I do is that I load the .editorconfig file at configuration time manually. I had to jump through hoops and learn how to parse the file and manually convert it so that I could store it and later put it into EditorConfigDefaults. It seems to work perfectly except for one issue: the ktlint engine is still trying to find and load .editorconfig files itself. I can completely avoid this if I use Code.fromSnippet, but then ktlint does not receive a virtual file path, and rules like standard: filename will not work. I want to give the ktlint engine the true virtual path for files so that rules have accurate information, but I do not want ktlint to try to load the .editorconfig file again. If it does, it might override the version of the file that I loaded manually at configuration time.

Now you might be thinking that this is all just a very oddly specific issue that only I will ever have. However, I will make the case that my specific issue is a symptom of a subtle design flaw in ktlint as it currently stands. A software component should be composable. That is, it should play nice with other software components. And one of the ways a component plays nice is by allowing the components that use it to be fully responsible for supplying configuration options. The way that ktlint is restricted to loading configuration from the disk is friendly to plain users but not so friendly to API users who want to use ktlint as a subcomponent. It is unusual for a library to supply itself with configuration through the filesystem. Or at least, in all cases that I have seen such a behavior can be disabled or replaced with input from the caller.

I can't prove that other API users besides me would appreciate this, but it seems intuitive to me that a library that wants to be used as a sub-component would benefit from being directly configurable at runtime and not through any disk IO. The current system makes sense if ktlint is purely being used as a user-facing application or plugin. But if ktlint wants to be used as a flexible subcomponent in other tools, I think this is the right direction to go in. This is why in my PR I went further than a simple flag and demonstrated an ability to pass a raw .editorconfig string to the engine, as this would be the ideal in my opinion. A simple flag to disable IO is sufficient in that we can pass configuration to EditorConfigDefaults, but that currently just felt a bit more hacky. But in principle it would be completely fine.

At the end of the day, I do not want to burden you and will respect your decision. You are maintaining a library for thousands of people and realistically this all might not be worth the effort. I'm guessing you are the main maintainer in which case I really appreciate all that you have done to make this tool. I enjoyed the discussion and I'm happy to have this closed if you do not plan to implement this, as I cannot commit to maintaining anything myself.

paul-dingemans commented 8 months ago

Thanks a lot for elaborating on this.

I am not very familiar with the Gradle eco system. But your explanation did rang a bell.

I have a custom gradle task. I am devoted to developing my gradle plugin to do all of its work lazily as much as possible and to adhering to best practices. So, adhering to idiomatic gradle here means that the .editorconfig file should either be a configuration input or a task input. With configuration caching enabled, all configuration and task inputs must be calculated prior to any tasks executing.

In https://github.com/pinterest/ktlint/issues/1446 you can read more about a similar use case. This was resolved by adding the method below to the KtlintRuleEngine:

    /**
     * Get the list of files which will be accessed by KtLint when linting or formatting the given file or directory.
     * The API consumer can use this list to observe changes in '.editorconfig' files. Whenever such a change is
     * observed, the API consumer should call [reloadEditorConfigFile].
     * To avoid unnecessary access to the file system, it is best to call this method only once for the root of the
     * project which is to be [lint] or [format].
     */
    public fun editorConfigFilePaths(path: Path): List<Path> = EditorConfigFinder(editorConfigLoaderEc4j).findEditorConfigs(path)

I might still not be exactly what you want, but it seems to work for the ktlint Gradle plugin.

mgroth0 commented 8 months ago

Yes, that is helpful, thanks. It's not ideal and I can't call this fully resolved, but it does help us have a relatively cleaner workaround.

Edit: Actually, its not as helpful as I thought. It requires constructing the ktlint engine which I would like to avoid doing at gradle configuration time. And it cannot prevent the engine from reloading the editorconfig files which might be edited during the build. It could help with input tracking on the gradle side, but in my opinion, we need a way to pass .editorconfig as arguments instead of disk IO, or at least an off switch for disabling all editorconfig disk IO.

paul-dingemans commented 7 months ago

Edit: Actually, its not as helpful as I thought. It requires constructing the ktlint engine which I would like to avoid doing at gradle configuration time. And it cannot prevent the engine from reloading the editorconfig files which might be edited during the build. It could help with input tracking on the gradle side, but in my opinion, we need a way to pass .editorconfig as arguments instead of disk IO, or at least an off switch for disabling all editorconfig disk IO.

I can understand this. But still I feel the same objection, I had before:

The reason that I do not want to implement a functionality like this, is that there is no need for it for a larger group of users. Implementing functionalities that benefits a single user or a small group of users will add up to a lot of complexity in the long term. Once it is added to the public API, it has to be maintained. Usually the developer who has added the specific functionality is by then gone, or not using ktlint anymore. I have seen dozens of functionalities that ended up like that in Ktlint, and it has cost me a lot of time and lots of breaking changes to get rid of it.

So I am not saying that it is a bad idea, but merely that I do not see the value in it for the general user base of Ktlint.

I am closing the issue for now. I am willing to reopen it when it would be beneficial and requested by at least one or two other integrators like the ktlint gradle-plugin, or detekt.

pinterest / ktlint