jfmengels commented 3 weeks ago

Recently, @jxxcarlson and @mateusfpleite have been working quite a bit with codemods (scripts whose primary purpose is to change a codebase with consistent changes), with jxxcarlson/elm-review-codeinstaller and jxxcarlson/codeinstaller-rulesets, and @lue-bird has published similar work with lue-bird/elm-review-upgrade.

It seems people want codemods, and elm-review is currently the best tool for that.

Both tools are configured through an elm-review configuration that looks like this:

-- jxxcarlson/codeinstaller-rulesets
RuleSet.Add.pages [ ("Quotes", "quotes"), ("Jokes", "jokes") ]

-- lue-bird/elm-review-upgrade
Upgrade.rule
  [ Upgrade.reference { old = ( "Fuzz", "tuple" ), new = ( "Fuzz", "pair" ) }
  ]

Some of these codemods are meant to be for prepared scripts, such as upgrading packages (elm-explorations/test v1 to v2), or adding new functionality in an existing codebase (https://discourse.elm-lang.org/t/install-new-features-in-old-apps-with-elm-review/9898). To make this functionality available, both tools have had to implement smaller tools to make changes to a codebase: renaming functions, introducing a new function, replacing code by new code, etc.

These primitives could also be really useful to users to realize massive automated changes to their codebase, without writing a new elm-review rule from scratch.

But it feels a bit tedious. Say you want to replace a reference from A.b to C.d:

cd to your elm-review configuration, and install your favorite codemod package through elm install
Add the code (ex: Upgrade.reference { old = ( "A", "b" ), new = ( "C", "d" ) })
Run elm-review in fix mode
Remove the changes from step 1 or 2, as you likely don't want to commit these changes to your project.

elm-review --template is a useful feature of the tool to run preconfigured rules directly from GitHub on your codebase in a one-off manner, skipping all the steps above, except step 2. The problem there is that you can't edit the configuration for that rule.

Another solution would be to create a temporary configuration in a location like scripts/:

elm-review init --template some/template --config scripts/
Add the code (ex: Upgrade.reference { old = ( "A", "b" ), new = ( "C", "d" ) })
elm-review --fix --config scripts/

Now you don't have to undo your changes (especially if you create in a temporary folder), but it doesn't differ much in practice from previously.

The main problem is step 2: changing the configuration. Solving that would remove the need to create a configuration and cleaning it up, allowing everything to be a single command-line instruction. Something like:

elm-review --input '{ "Upgrade.reference": [ { "old": "A.b", "new": "C.d" } ] }' --rules Upgrade.reference

The idea would be that this would (re)configure a rule (Upgrade.reference in this instance) to do something than by default.

Here's a proposal to make this work. Spoiler: I am currently unsatisfied with it, and I don't think it should be implemented, but I'll let you be the judge of it, or suggest improvements that could make it worthwhile.

One thing that I think would be quite useful for this kind of project is a way to pass in arbitrary data to the rule. Something like this:

elm-review --input '{ "module": "Foo", "addType": "A = B | C" }'

(Or another format, doesn't have to be JSON)

This would be a new feature for elm-review, one that I think could be interesting for use cases like this (though it would likely be subpar for caching, maybe?).

Please read this and provide feedback. If you think this feature is a bad idea, could even be harmful to the project/ecosystem (which I could see), or can be replaced by some script, that is valuable feedback as well.

Use cases

Let's talk about use-cases first.

Codemods

The previous section talked about codemods extensively, so I won't get further into it.

Insight scripts

As shown in # Gaining insight into your codebase with elm-review, using elm-review we can gather a lot of information using some custom code. An example of this is generating a graph of all imports, or all values, or pretty much anything else at the granularity that you want.

But if you want to understand a portion of your project, for instance making a graph of where a specific value is referenced (directly or indirectly), then you have to configure your rule to specify which value you want, meaning going into the review configuration and changing values.

For example, if you're trying to figure out the implications of changing a module or function, you might want to know which modules it is used in. You could generate a graph that only includes the files that directly or indirectly reference the module (as a text file, as an image, etc.). That would be a lot more helpful than crawling through the graph of every module (at which point your IDE and a notepad is probably better suited to your exploration).

All of this is already technically possible, but there's this pain of setting up a configuration, which I'd like to improve. Being able to run a script through --template but tailored to your problem would make it much nicer.

IDE actions

I'm not sure yet, but potentially IDEs could provide a UI interface to trigger arbitrary elm-review rules (say a Rename rule, if they didn't support that already out of the box). I am not sure if this feature would enable that, but I think it could at least pave most of the way.

Maybe it wouldn't work well as we'd ideally get data on the fly, rather than as additional configuration to the rule, but it's an idea.

I also don't know or without this should actually be handled by elm-review, or by a similar tool that allows to trigger code transformation tools/scripts written in Elm (inspired by or wrapping elm-review), allowing anyone to write complex IDE snippets in Elm.

Add missing information

This would also allow a few rules to add optional missing information. For instance, in the last release I planned for elm-review-documentation to add a rule checking for and adding a new CHANGELOG entry when cutting a new release. A problem I've realized since then, is that to autofix this issue, I need to insert a line like ## [1.2.3] - 2024-01-01. The problem is that the current date is not a piece of information elm-review has access to. The rule can still be useful, but we probably don't want to apply the autofix without it (at least if we want the date).

Preparing a script like date = date +%Y-%m-%d; elm-review --fix-all-without-prompt --input '{ "currentTime": "$date" }' would be helpful.

Although, this is something we probably would all like to have to do as little as possible.

Target requirements

To be a good solution, I aim for a few target requirements:

1. You should be able to configure multiple rules at the same time

Example: you should be able to write input that allows replacing references from X to Y and inserting a new function Y, that's potentially two separate rules.

2. Each rule should be able to read configuration meant only for itself

From the previous example, a reference replacement rule should read the data that it needs to replace X to Y.

3. Rules should be able to read shared configuration

Some information is specific to a rule, but some could be shared, as one rule could use some of the same information as another rule. In an elm-review configuration, this could be a top-level variable passed as an argument to two different rules.

4. Rules should be able to trigger configuration errors if the input seems incorrect.

5. Rules should be enabled on demand

A lot of codemod or insight rules have no business being enabled by default if they are not meant to be run. In most cases, you probably don't want to have Upgrade.rule enabled if you're not working on an upgrade job.

6. Input should be as short and succinct as possible

If it's more painful to write the command-line input than it is to go edit your ReviewConfig.elm file, then there's little point to this feature.

The solution

Ok, let's get into solving the problem. Let's start with some initial naive solutions, which we can improve afterwards:

For the lack of a better name, I'll go with --input as the name for the flag for now.

In the rule definition, a rule would opt-in to read from the input. Naively, something like a visitor that we're quite used to:

Rule.newProjectRuleSchema "Rename" initialProjectContext
    |> Rule.withInput (\string context -> { context | input = parseString string })
 -- |> more visitors...
    |> Rule.fromProjectRuleSchema

Format of the extra input

A lot of the constraints mentioned above relate to the contents of the input (1, 2 and 3), so what should be the format of this data?

Let's take the example of a Rename rule that renames a function. Naively, I would want to do something like this:

elm-review --input "oldName->newName"

One problem here is that it is not clear for which rule this input is meant for. If multiple rules try to read the input, and one of them doesn't understand this, it should likely create a configuration error.

So we need to find a way to specify the target for a piece of the input.

Targeted inputs

A first option is to specify the name of the rule ahead of the input.

elm-review --input "Rename:oldName->newName" --input "OtherRule:somedata"

elm-review would then strip the rule name from the input and dispatch the rest of the string to the rule.

This solves point 1 and 2, but for point 3 (rules should be able access shared data), I don't think this works super well. We could have a

elm-review \
    --input "Rename:oldName->newName" \
    --input "OtherRule:somedata" \
    --input "shared:someMoreData"

but then we hit the same problem that even among the shared data, some rules might expect to see some data there but not find it.

So maybe the key should not be the name of the rule, but some keys that the rule wants to read from:

Rule.newProjectRuleSchema "Rename" initialProjectContext
    |> Rule.withInput
        (InputReader.read
            (\stringForRename stringForShared context -> ...)
            |> InputReader.key "Rename"
            |> InputReader.key "shared"
        )
 -- |> more visitors...
    |> Rule.fromProjectRuleSchema

JSON format

The solution I thought about the longest was to have the input be JSON:

elm-review --input '{ "Rename": { "from": "oldName", "to": "newName" }, "OtherRule": "someData", "data-for-rule-X-and-Y": "someMoreData" }'

I think this is somewhat the most flexible. A plus of this is that a rule just has to write a JSON decoder for this to work: It grabs the data that it needs, and ignores the rest, without elm-review needing to dispatch anything. It is natively quite composable.

Given sufficiently complex input, the input would likely end up being JSON in the previous solution anyway, so why not make it JSON to start with?

JSON decoders are sometimes perceived as difficult to work with, but it's likely easier than another decoder-like API proposed before (InputReader) where you likely have to do some parsing or string manipulation to understand the input.

Another plus-side of a JSON format, is that you could put the contents in a separate file, and pass the contents to --input: elm-review --input '$(cat input.json)'.

A big drawback is that it is however very verbose, which goes counter to point 6. All the quotes and curly braces makes it hard to type correctly on the first attempt (which makes me want to put it into a separate file...).

Another drawback, is that JSON doesn't have to respect this structure I've mentioned above. You could create a rule that understands the following input just fine.

elm-review --input '"oldName->newName"'

but this rule would then not play well with other rules that expect JSON. But that is not something elm-review could enforce very well.

Configuration pain-points

As I hinted at earlier, this extra input is in practice additional configuration to the rule, through a different format.

The first pain-point is that the configuration is basically (potentially) unstructured data: it's data that we have to parse (and validate) into Elm data structures that the rule can use. For a lot of the current configuration, we can let the Elm compiler report errors if they're of the wrong type. For a dynamic input field, the rule author would have to do those checks themselves.

A lot of rules use a builder pattern for its configuration, and this would be an additional configuration method on top of that, that the rule author has to merge.

It also creates some choices to be resolved. Say the Rename rule has been enabled, and configured to rename something already. If you now run elm-review with --input, does that override the existing configuration, or add to it?

You could argue for both, and I think it would be valuable to know what to expect for most rules without having to look at the documentation of the rule, or to look in that documentation to see if you need to add an "override: true|false" field somewhere in the input.

I don't have a great solution in mind for this at the moment, this would likely have to be a convention, but not one that I can help enforce except through good documentation (but that's understandably not always read as much as it should).

Reporting configuration errors

At the moment, there are two ways for rules to report "general errors": global and configuration errors.

I have a dedicated blog post about them, but to summarize:

Configuration errors can be triggered at the definition of the rule, and prevents rules from running
Global errors can be triggered anywhere during the analysis (meaning analysis has already started)

A configuration error looks somewhat like this:

```elm
rule : Int -> Rule
rule threshold =
  if threshold >= 1 then
    Rule.newModuleRuleSchema "NoNestedCaseExpressions" initialContext
      |> Rule.withSimpleExpressionVisitor (expressionVisitor threshold)
      |> Rule.fromModuleRuleSchema

  else
    Rule.configurationError "NoNestedCaseExpressions"
      { message = "The threshold needs to be strictly positive"
      , details = [ "..." ]
      }

A configuration error is created by creating a Rule (it has type Rule). Values that have been validated generally don't make it to the Context of the rule (the type containing all the accumulated data), but are passed to the visitors directly (though both options are possible).

If we take the naive implementation using visitor I used earlier:

Rule.newProjectRuleSchema "Rename" initialProjectContext
    |> Rule.withInput (\string context -> { context | input = parseString string })
 -- |> more visitors...
    |> Rule.fromProjectRuleSchema

then you might notice that we are already working with a rule and some context. It is too late to create a configuration error, and to stop the rule from running. And if the rule is already running, that means you need to do a lot of defensive programming in every visitor if context.isConfiguredAppropriately then ... else doNothing.

Additionally, having the input data only in the context also means that you have to merge it manually with the configuration values passed to the visitors. In practice, you would likely reference them in the withInput function above, and store all this configuration in the context instead of passing them as arguments, which is not as nice in practice.

In practice, we likely want an andThen-able API, something like this:

-- Review.Rule.elm
requestDataFromInput : (String -> Rule) -> Rule

-- Rename.elm
rule : Rule  
rule =  
    Rule.requestDataFromInput
        (\input ->
            case Json.Decoder.decodeString someDecoder input of
                Ok { from, to } ->
                    Rule.newProjectRuleSchema "Rename" initialProjectContext
                        |> Rule.withModuleVisitor (moduleVisitor moduleName from to)
                        -- ...
                        |> Rule.withFinalProjectEvaluation finalEvaluationForProject
                        |> Rule.fromProjectRuleSchema

                Err error ->
                    -- Either return a configuration error or
                    -- a noop rule, I'm not sure yet.
                    Rule.configurationError "Rename"
                        { message = "Invalid configuration"
                        , details = [ "..." ++ error ]
                        }

        )

While this would make configuration errors available where we want them, and would work with regular rule arguments (still with the problem of which ones takes priority), I don't think this is technically possible in the current API, without a breaking change (or really complicated).

In essence, this makes it so that rules are functions of the shape Input -> Rule instead of just Rule (or something really weird under the hood that I can't yet envision). Maybe it's possible but I'm unsure how without a breaking change.

Enable rules on demand

Another problem is that of enabling rules on demand. As mentioned earlier:

A lot of codemod or insight rules have no business being enabled by default if they are not meant to be run. In most cases, you probably don't want to have Upgrade.rule enabled if you're not working on an upgrade job.

All elm-review rules in your ReviewConfig are enabled (and if you don't want some, you remove them), there is no concept of enabled or disabled rules apart from just their presence. Our configuration system is that simple, but that doesn't work super well here.

Say you want to temporarily use the Rename rule. Either you use it from a remote template GitHub using --template, or you need to enable it by adding it to your ReviewConfig. Another option is also to have a separate configuration folder for these kinds of scripts and use it using --config.

The closest thing I can think of, is to use enable Rule.requestDataFromInput from above, and have the rule disable by default, and enabled using valid input.

-- Review.Rule.elm
requestDataFromInput : (String -> Rule) -> Rule

-- Rename.elm
rule : Rule  
rule =  
    Rule.requestDataFromInput
        (\input ->
            case Json.Decoder.decodeString someDecoder input of
                Ok (Just { from, to }) ->
                    Rule.newProjectRuleSchema "Rename" initialProjectContext
                        |> Rule.withModuleVisitor (moduleVisitor moduleName from to)
                        -- ...
                        |> Rule.withFinalProjectEvaluation finalEvaluationForProject
                        |> Rule.fromProjectRuleSchema

                Ok Nothing ->
                    -- There is input but nothing for "Rename"
                    -- so we return a dummy noop rule
                    someDummyNoopRule

                Err error ->
                    -- "Rename" is in input, but could not be parsed
                    Rule.configurationError "Rename"
                        { message = "Invalid configuration"
                        , details = [ "..." ++ error ]
                        }

        )

The parsing of the input starts to be a bit complex here. The line between what is a configuration error and what is no configuration is starting to be thin. For instance, what if I mistype the name of the rule to Rneame. The rule will likely be disabled when I actually would prefer an explicit configuration error.

In practice, the rule would probably allow for either/or input and regular configuration, because you don't want to have to do everything through --input when it could also be done through ReviewConfig.

This design above is the best I have come up so far. Maybe we could make it somewhat simpler through some convenience helpers, but we'd need some more experience with it.

Separately, we could also have a way to inject a rule in the configuration that is not already there (probably on demand also). That idea quickly goes down the drain because elm-review does not know how to reliably create a rule just from its name (it can be a simple constant or a function, it can have complicated-to-build arguments, its module name can be different than the rule name, etc.).

Read input from a file

I mentioned earlier that if we use JSON, then we could read from a file, which is useful if the contents are large.

If we go that way, then maybe we would want to make it possible so that elm-review reads from that file directly without having to use bash or a script.

In that case, that opens up the question of whether we should watch the contents of that file if we run in watch mode. Probably?

The line between extra input and extra files starts to be a bit blurry too. They're different but there's definitely some possible overlap too where one could choose to implement a rule using one or the other.

We already have the extra files visitor from the last update though, so maybe that should be used instead, which would allow for watching the contents of the file as well though --watch.

This is a separate feature that we could add later, but it's one that could be requested afterwards. Though, maybe this feature is sufficiently used for "one-off" purposes that this would never be requested.

We could even have the rule take a JSON decoder Json.Decode.Decoder Rule and we create an error from that. Although it would likely yield much feedback if we let the rule author craft their own error message.

Alternative option: Leave it out of elm-review

As you probably noticed, I have some reservations about the feasibility, the design and the consequences of some of these designs.

Let's explicit that this feature does not solve a major annoyance, not enables new things that can't be done already, it just makes some things easier (or at least it aims to).

Considering the tradeoffs of the designs I could imagine so far, I think it's better to NOT do this feature, until we can figure out something better.

So if we don't support this feature inside of elm-review, what can we do or imagine instead?

Well, the first solution, for things like Rename scripts, is to have users edit configurations manually. A bit annoying, especially if you have to initialize a new configuration folder, but it's not that bad considering the potential value.

Another solution is to have a wrapper tool around elm-review. One could imagine a new CLI, that under the hood generates an elm-review configuration and execute it in fix mode.

elm-refactor-tool rename Some.Module foo bar

-- ReviewConfig.elm

-- Generated configuration
config =
    [ Rename.rule
        [ { moduleName = [ "Some", "Module" ]
          , from = "foo"
          , to = "bar"
          }
        ]
    ]

And if the arguments are invalid, then there will be an error somewhere in the chain (either by the script itself, by the Elm compiler, or through a configuration error by the rule itself). If that tool so desires, it could at some point decide to use faster tools than elm-review under the hood.

I quite like this solution as it could probably figure out the best UX design for each task. For instance, the rename use-case above for instance is really short, though that's maybe a naive design, depending on the goals of the tool.

This tool would likely be either dedicated to something - like refactoring - and/or support custom scripts. Power that with elm-codegen and elm-pages scripts, and this could be a pretty nice power tool I believe.

And then maybe this could power the IDE as well.

As for the CHANGELOG automatic fixes, maybe that should be a separate, non-elm-review check anyway 🤷

Conclusion

I have been thinking about this feature for a few years now, and it seemed pretty interesting without thinking about it deeply. Now that I've looked into it further, I think it's probably not a great idea, but the idea of a dedicated codemod tool is alluring (and would likely push forward elm-review's autofixing features too).

Here's a summary of the problems:

I can't decide on a good format for the --input
- Targeted --input RuleName:data leads to parsing odd data
- JSON leads to potential selfish rules that don't expect JSON and don't play nice with other rules. And also is extremely verbose and hard to craft
- Any data passed with --input is subpar compared to passing data through Elm constructs (builder pattern, arguments)
Supporting rules that create configuration errors when badly configured with the input flag is likely a breaking change
It's unclear what to do when adding input to rules that are already configured (not a large problem, but could lead to inconsistency across rules)
It's unclear what some dormant rules should do if they await input to but a typo has been made
This doesn't help with enabling rules that are not already enabled (likely the case for codemod/insight rules)

Compare that to the current solution

cd to your elm-review configuration, and install your favorite codemod package through elm install
Add the code (ex: Upgrade.reference { old = ( "A", "b" ), new = ( "C", "d" ) })
Run elm-review in fix mode
Remove the changes from step 1 or 2, as you likely don't want to commit these changes to your project.

and it feels like it's not necessarily that big of a deal.

If you come up with good ideas, then maybe we could still figure out a way to make this work, and that would be great!

PS: Thank you for reading. If we don't come up with a good design, then I might turn publish this as a blog post as a failed feature design idea. Let me know if you think this is interesting.

jakub-nlx commented 3 weeks ago

What about this idea:

Have a separate sub command like elm-review exec --package some-user/some-package 'RuleName.rule { some = "arg" }' 'SomeOther.rule []' .

The arguments are snippets of Elm code that would go into the list of ReviewConfig.config.

The tool works by generating a separate elm-review configuration using elm install to get all of the specified packages and generating the config by parsing out the modules (should be fairly trivial with elm-syntax), then simply populating a template like this:

module ReviewConfig exposing (config)

import Review.Rule exposing (Rule)
{% for import in imports %}
import {%= import %}
{% end %}

config : List Rule 
config =
   [
   {% for snippet in snippets %}
   {%= snipper %},
   {% end %}
   ]

or some such, then just runs elm-review using the generated config. Any other flags can be forwarded (like --watch or --fix).

I think this solves problems 1,2, 4, 5 quite neatly I believe, while not being too shabby about 6 (and also having the benefit that the format is just Elm code which is already familiar to anyone using the tool).

Finally it achieves all of these without requiring any support whatsoever from rule authors, which again makes it much less work on authors to create high quality rules.

Specifically, for instance we defer to the elm compiler for value validation, which is quite neat.

I don't really understand why exactly requirement 3 is so important, since it seems fairly trivially solvable with a tiny bit of shell scripting. Like I could easily do something like `LONG_DATA=$(cat some-input.txt); elm-review exec "Rule1.rule \"$LONG_DATA\"" "Rule2.rule \"$LONG_DATA\"", which isn't the prettiest, but seems not too onerous and fairly straightforward to figure out 🤷

The other nice thing it allows is to use existing elm-review APIs to further customize the rules. For instance I can easily configure the rule ad-hoc to ignore errors in a code-gen folder by the fairly familiar:

elm-review exec "Rule1.someRule |> Review.Rule.ignoreErrorsForDirectories [ "codegen" ]"

jfmengels commented 3 weeks ago

Yeah, I think something like that could work too, especially if the command can understand code quite well (figure out used module aliases from the inserted code, etc.).

It probably requires quite a bit of refinement so if this is a direction we go for, then I think I'd like to see an external proof of concept first before deciding to make it into the tool itself.

I think this would also likely call for a similar init functionality, where instead of executing it, we initialize it. That could help with debugging if you're not getting what you want for instance, or just to continue the work without starting from scratch.

I don't know that it is, but it could be quite nice. I have for instance in mind two CSS rules a work (and in a package that I should go back to at some point) that both need information about CSS files. That is now solved through "extra files", but a similar could happen. As to the "it can be solved through scripting", I agree, but scripts are usually non-portable (Windows vs Unix) and require a bit more knowledge (I have to Google a lot when combining multiple commands even after years, go figure).

jfmengels / elm-review

Add support for extra input from elm-review CLI #176

Use cases

Codemods

Insight scripts

IDE actions

Add missing information

Target requirements

The solution

Format of the extra input

Targeted inputs

JSON format

Configuration pain-points

Reporting configuration errors

Enable rules on demand

Read input from a file

Alternative option: Leave it out of elm-review

Conclusion