Interact with arbitrary files

henriquecbuss commented 2 years ago

It would be nice if we could interact with files other than .elm files.

Example use case

Many apps get their translation texts from JSON files. If we could interact with these files, we could check for some things:

All json files have the same keys
All keys available in the JSON files are used in the Elm app
All keys used in the Elm app are valid keys in the JSON files

For 1. and 2., errors are related to the JSON files, so we could either report errors with globalError, or have some variation like errorForFile and errorForFileWithFix (I'm not sure if reporting errors or offering fixes for arbitrary files is in scope for elm-review).

Item 3. only needs to have the content of the translation files in the context. Other than that, it would look like a regular rule (just check if the key string is a valid entry in the context).

I think the main point of discussion is how to get information from these files into a rule. I really liked the additional visitor idea:

withExtraFiles : (List { path : String, content : String } -> projectContext -> ( List Error, projectContext ) ) -> ProjectRuleSchema -> ProjectRuleSchema

This visitor allows any file format (not only JSON) to be visited, so it's a 👍 from me!

jfmengels commented 2 years ago

This is a very interesting proposal because I think it would allow elm-review to get even more information than what it currently does (and more information: less false positives and negatives, and more potential rules).

I think that is this system works out well, it could replace the README visitors which will end up being a special case for this kind of visitor.

Requesting data

As you say and as discussed outside of the thread, it's quite easy (with the withExtraFiles visitor) for the rule to access the data, but I don't yet know exactly how the rule would be able to tell the tool which files it want.

An API I'm thinking of could be as follows:

-- Definition
withExtraFiles :
    (List { path : String, content : String } -> projectContext -> ( List (Error { useErrorForModule : () }), projectContext ))
    -> ProjectRuleSchema schemaState projectContext moduleContext
    -> ProjectRuleSchema { schemaState | hasAtLeastOneVisitor : () } projectContext moduleContext

-- Usage
ruleSchema
  |> Rule.withExtraVisitor <files to fetch> extraFilesVisitor
  -- ...

extraFilesVisitor : List { path : String, content : String } -> ProjectContext -> ( List Error, ProjectContext )
extraFilesVisitor files context =
   -- ...

For the "files to fetch" part, I think there are many ways we could want things:

Specific files (e.g.: "translations.json")
Arbitrary files that match some patterns (e.g. a glob like "translations.*.json", to match "translations.en.json", "translations.fr.json", ...)

To choose these, we could have an API like

withExtraVisitor [ ExtraFiles.file "translations.json", ExtraFiles.glob "translations.*.json" ] extraFilesVisitor

I feel like this should be something that is sufficient. But maybe this is actually too complex, and [ "translations.json", "translations.*.json" ] is just as good? :thinking:

Fixing

You asked about fixes. I think it could be valuable to have the tool offer fixes for those files in order to automate more things.

That said, I worry about the security aspect. For instance, should a rule that you install from a package be able to write to /etc/ssh/ssh_config? Even if there is a prompt, this can easily go wrong and be disastrous.

I don't know if there are any security risks even for allowing you to read data.

A few options for this. I don't know whether they should help prevent fixes in those files, or even getting access to that data (but I'll just say "look at" to cover either case).

The first option, which would not work, is to have a builder function that sets an allowlist on what files the rule can access

config =
  [ SomeRule.rule
    |> Rule.allowLookingAt [ "translations.json" ]
  ]

The reason why this wouldn't work is because rules would be able to use this function in their definition

rule =
  Rule.newProjectRuleSchema "HarmfulRule" ...
    -- ...
    |> Rule.fromProjectRuleSchema
    |> Rule.allowLookingAt [ "/etc/ssh/ssh_config" ]

So if we have an allowlist, it would have to be defined outside of a rule level (because, that's a bit outside of the reach of the rule)

So here are options that I think could work:

We could limit the rule to look data that is in the same project as the elm.json
- This sounds too limiting, for instance if your project is a monorepo where the Elm project is a subproject. It would then not allow you to access potential important information.
We could have a special thing in the ReviewConfig.elm file, where you'd have a top-level declaration "files" (or some better name) next to "config", which would be an allowlist of all the files to access.
- Problem: rules would not be just drop-in values anymore. You now have to tell people "add this rule to the config AND make sure that 'README.md' and 'translarations.json' are in the files list". If people don't, then they get false positives/negatives or global errors
- Internal anoyance: When compiling the configuration, we would need to compile it differently if there is "files" or not (because the Main file would have to import it, but not import it if it's not there)
A mix of the above, where a top-level declaration indicates which directories (or globs) your project has access to, and by default this is the folder containing the elm.json file.
A mix of the above, but as a separate file (probably not written in Elm, but with like a "gitignore" syntax)
Add options to the CLI to specify the allowlist
- Not great because I'd really like for elm-review to be able to be run as is. Requiring more would be annoying to everybody, and would make editor integration not work
Allow looking at any files but never fixes
Allow looking at any files but only allow fixes for ones in the project
Pray nothing bad comes out of this

Does someone have other ideas?

I think option 6 and 7 are the simplest ones, and option 4 is probably the least intrusive one for the allowlist idea. If we do go with this option of an allowlist, then maybe in a later major version of elm-review we could have a config like config = { rules = [ ... ], allowList = [ ... ] } to ease the compilation work.

With an allowlist, it would give the annoyance where rules would not work as expected, and I think it would still be somewhat be easy for the rule to circumvent things, such as in a following way:

module SomeRule exposing (rule, allowList)

{-| This rule does awesome things.
It does require access to a bunch of files, so
don't forget to import the allowList and use it in your configuration
-}

allowList = [ "/etc/ssh/ssh_config" ]

Please pitch in if you think I'm overthinking this, have some idea or any comment.

gampleman commented 2 years ago

Perhaps just allow stuff in the project, but follow symlinks?

This makes the most common case just work, and gives a reasonably straightforward workaround for the other cases: just symlink your directories...

Another idea to consider: use crazy API design!


type FilePath = Opaque

type FileKey = Opaque

concreteFile : FileKey -> String -> FilePath

glob : FileKey -> String -> FilePath

Rule.withExtraVisitor : List FilePath -> visitorDefinition -> ProjectRuleSchema ...

then config changes from List Rule to FileKey -> List Rule.

Then it can be easily documented that rules should take FilePaths as inputs and definitely not ask for a FileKey...

micahhahn commented 2 years ago

This is cool! I think this will open the door to adding more elm-review rules to help make library maintenance even less painful in Elm.

Personally I would be fine with solution # 1 - it seems reasonable to me that elm-review is scoped to the elm apps directory.

henriquecbuss commented 2 years ago

I feel like this should be something that is sufficient. But maybe this is actually too complex, and [ "translations.json", "translations.*.json" ] is just as good?

I feel like just plain strings are enough. However, if we want an API like you described, perhaps we could get some inspiration from elm-page's DataSource.Glob.

For allow lists, maybe we could consider a mix between your ideas:

Allow reading any file, but don't allow fixes (since rules can't run Cmds, is there an issue in just reading files?)
For files in the project, allow fixes
If the file is in the allow list, allow fixes

Then it's just a matter of defining what a project is, and how to define the allow list.

For example, to support monorepos, elm-review could look up the directory tree until it finds a .git folder (I'm not sure how good this is, as it forces people to use git - but at the same time, elm packages are forced to use github, so is this really an issue?). In case we don't find a .git folder, default to the folder with an elm.json.

For the allow list, I like the idea of it being a separate file (and most importantly, being optional). I think a user should only define an allow list if they need one, as to have the least amount of friction possible when setting up rules. If a rule requires specific files in the allow list for it to work, the rule author can emit an error, telling the user which files the rule needs.

jfmengels commented 2 years ago

@gampleman

Perhaps just allow stuff in the project, but follow symlinks? This makes the most common case just work, and gives a reasonably straightforward workaround for the other cases: just symlink your directories...

I think that would work for some cases, but the case that I think will come most often is when you want to access the data in the parent folder (README.md in example below) or a sibling from the current folder (translations/ for instance).

root

translations/
elm/
- src/
- elm.json
README.md

How would symlinks help here? Would you do something like this?

root

translations/
elm/
- translations/ <symlink to ../translations/ >
- src/
- elm.json
- README.md <symlink to ../README.md > (would conflict with the README of the elm/ project)
README.md

Or would you do something like the following? (this is the part where I may have finally understood you @gampleman :sweat_smile: )

root

translations/
elm/
- parent/ <symlink to ../ >
- src/
- elm.json
README.md

I guess that could work? :thinking:

It would be kind of surprising to tell users that they need to create a symlink for this. But hey, maybe why not :thinking: It would definitely ease the problem for now!

jfmengels commented 11 months ago

Additional use-case that came up: In some cases we don't want to know the contents, but we just want to know which files exist. For instance, if we want to know whether a reference to static/some-image.png is okay, we want to know whether that file exists.

lishaduck commented 2 months ago

This is supported as of 2.14.0! I think it covers all of the use cases outlined here.

henriquecbuss commented 2 months ago

Amazing!!! I'm out of the Elm ecosystem nowadays, but it's great seeing this progress! Great work as always @jfmengels ❤️

jfmengels / elm-review