renovatebot / renovate

Home of the Renovate CLI: Cross-platform Dependency Automation by Mend.io
https://mend.io/renovate
GNU Affero General Public License v3.0
17.38k stars 2.27k forks source link

Consolidate/unify file and path handling - regex vs glob #10582

Open rarkins opened 3 years ago

rarkins commented 3 years ago

What would you like Renovate to be able to do?

Ideally have a consistent approach to file/path pattern handling.

Did you already have any implementation ideas?

Today we have the following configuration options relating to file names and paths:

I think we can agree that the "any string pattern" approach is bad and can be deprecated. Then the decision is: glob, regex, or both?

There is no easy way I'm aware of to know for sure that a string is intended to be a glob or a regex because some strings are valid for both.

I think regex are more powerful, but also harder to get started at than glob. At the same time, glob patterns are not particularly intuitive to write once you go beyond "foo*", and there seems to be some inconsistency between implementations too. I'd want to stick with regex for fileMatch which then leans towards regex for others too.

For ones which are only a glob today, we could maybe migrate them to equivalent regex. There was a glob-to-regex library now archived, so maybe they found it to be too difficult of a problem? Another possibility would be to rename such fields and deprecate the old ones so that people get warning messages, but that is not the most user-friendly if we require them to do the migration.

@danez I noticed that you are active on globs and have raised issuess here in the past. Do you have any ideas?

viceice commented 3 years ago

this is very hard to change without new properties.

But my suggestion is to do normal glob and use regex when string is like a full regex /..../. I think a string starting and ending with / is not a meaningful glob for renovate.

rarkins commented 3 years ago

I thought about the /.../ idea too but it would need to be consistent, meaning fileMatch adopt it too. That introduces even more complexity :)

Another spin on the question: if we deprecate glob support, is it a problem?

rarkins commented 1 year ago

Initial matching

Used for excluding file paths before fileMatch. Usually used at top level but can also be used within manager blocks. In theory this could be combined with fileMatch concepts if we had a first class concept of "and not".

Inverse of ignorePaths - can be used at global level or within manager objects.

Used for matching manager files, so valid within manager objects only. This happens after ignorePaths and includePaths.

Package Rules

Within packageRules only. Has some overlap with matchPaths due to the fact matchPaths was originally implemented with partial string match, which is being removed.

Within packageRules only.

Others

Used within postUpgradeTasks only to filter which files to commit.

Top-level, but also can be in package rules. Maybe this should be combined with the ignorePaths concept?

rarkins commented 1 year ago

We could support glob or regex for all fields as long as we have an intuitive way to distinguish between then. e.g. if it starts and ends with / (e.g. /^abc/) then we know it's regex, otherwise treat it as a glob.

We could also support negative regex matching via a !/ prefix, e.g. !/^abc/.

A challenge is how to handle OR and AND logic when we use a single array. For example if everything in the array is OR then ["/^abc", "!/^abc/"] matches everything. We could:

rarkins commented 1 year ago

I think we can use this & prefix for both glob and regex with the logic "all non anded patterns are first OR'd together, and then collectively AND'd together with each & pattern. e.g. in regex

This negation syntax would allow us to consolidate includePaths and ignorePaths into a single setting, e.g. extractPaths or fileListFilter (the former seems more self-explanatory if it's only used during extract)

rarkins commented 1 year ago

Could we combine fileFilters and excludeCommitPaths into one, e.g. commitPaths and default it to **/*?

Perhaps a problem is with this edge case:

Could we still support this intuitively if the repo commitPaths is **/* while while the postUpgradeTasks.commitPaths is !X?

rarkins commented 1 year ago

Possible consolidations:

rarkins commented 1 year ago

Another possibility: files must match every negated filter and any regular one. This would avoid us needing many &! prefixes for ignorePaths migration

secustor commented 1 year ago

I think this will quickly becoming hard to read.

WDYT about: [["/^abc/", "test/**"], "!/\.js/"] --> ( "/^abc/" OR "test/**" ) AND "!/.js/"

or here a more verbose alternative: Keep the same basic behavior like we have right now but allow to define objects:

[
  {
     operation: "AND",
     elements: [
       "!/\.js/" ,
       {
         operation: "OR",
         elements: ["/^abc/", "test/**"]
       }
     ]
  }
]
rarkins commented 1 year ago

None of these seem very readable to me either.

We should definitely optimize for the normal case of match X or match Y or match Z. Handling "not match" and AND logic is much harder. I'm still interested in the idea of making each ! to be AND'd

reitzig commented 1 year ago

FWIW:

I for one see little use of implementing an expression syntax in JSON (unless a standard one can be reused). If the criteria are that complicated and necessary, maybe we need to implement a dedicated manager? Or provide some scripting interface, which I'd be reluctant to use as well. I'd probably just refactor the repo to get a grokable config.

rarkins commented 1 year ago

FWIW:

  • Using slashes to distinguish globs from regex means no absolute paths in globs. (Not sure if that's a problem.)

Fortunately we don't have a need for absolute paths because these filters are always based on relative paths/files within the repo.

  • If a string parses both as glob and as regex, are the respective semantics the same? (Gut feeling: yes.) If so, you could just let the parsers decide.

No, e.g. the string "abc" in glob means "only the file abc" while in regex that would mean "any file containing the string abc".

For this reason we want to make it so that users have to "signal intent" of which they want.

I for one see little use of implementing an expression syntax in JSON (unless a standard one can be reused). If the criteria are that complicated and necessary, maybe we need to implement a dedicated manager? Or provide some scripting interface, which I'd be reluctant to use as well. I'd probably just refactor the repo to get a grokable config.

The challenge we have is include vs exclude semantics. By default at the repo level, we include every file. It would be more comment for people to exclude certain paths than for them to say "only include these paths". On the other hand, at the package manager level, we exclude every file by default and then define just certain patterns we are interested in. Generally you want to OR your includes and AND your excludes.

rarkins commented 1 year ago

Related: #24695