oxsecurity / megalinter

🦙 MegaLinter analyzes 50 languages, 22 formats, 21 tooling formats, excessive copy-pastes, spelling mistakes and security issues in your repository sources with a GitHub Action, other CI tools or locally.
https://megalinter.io
GNU Affero General Public License v3.0
1.82k stars 214 forks source link

Megalinter 7.x _REGEX path handling seems not fully compatible with 6.x #2744

Closed pfiaux closed 10 months ago

pfiaux commented 1 year ago

Describe the bug We're using regex to limit which paths specific linters run on (monorepo setting):

teamA/
teamB/
teamC/
scripts/

So our linter configs might look like this:

GO_REVIVE_FILTER_REGEX_INCLUDE: ^(teamA|teamC)
PYTHON_FLAKE8_FILTER_REGEX_INCLUDE: ^(teamB|scripts)
JAVA_CHECKSTYLE_REGEX_INCLUDE: ^(teamA|teamB)

We use ^ in front to limit it to root folders, for example:

To Reproduce Assuming there are go file that would trigger a revive lint errors in:

  1. Set GO_REVIVE_FILTER_REGEX_INCLUDE: ^(test)
  2. Run linter for both files test/guilty.go teamA/test/guilty.go

For 6.22 only test/guilty.go shall remain after running the filters.

For 7.x no files remain after running the filters.

Expected behavior The ability to use to filter root of workspace level folders by linter in v7, ideally via _FILTER_REGEX_INCLUDE to be backward compatible with v6. (another solution that achieves the same goal would also work)

Additional context We're running mega-linter in ci (using the image with everything), and passing it a list of files to lint determined by our pipeline.

I feel this could be due to the switch from relative paths to absolute paths, which would cause the ^teamA/... to not match if the path is /tmp/.../teamA/...

nvuillam commented 1 year ago

@pfiaux you're right, it is probably related to relative paths management

Did you try to send the list of files with relative path format ?

Best regards

pfiaux commented 1 year ago

I just checked, looks like I'm already sending the files with relative paths in the config:

MEGALINTER_FILES_TO_LINT:
- teamA/mega-linter.yaml
- scripts/ci-lint.sh
- teamB/product1/main.go

The outputs from megalinter such as MegaLinter now collects the files to analyse or Kept files before applying linter filters: seem to already convert to absolute paths.

pfiaux commented 1 year ago

I'm also fine with another solution even if it's not compatible with the v6 config, as long as we can filter by base path :)

Sticking an absolute path in the config would be problematic tho, for us the absolute path isn't fixed and dependents on which linter instance will pickup the job (we run multiple replicas).

nvuillam commented 1 year ago

I agree that it's an issue...

Would you mind sharing the full log with LOG_LEVEL: DEBUG in .mega-linter.yml ?

If sensitive you can eventually send it to me to nicolas.vuillamy@gmail.com

github-actions[bot] commented 12 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.

pfiaux commented 11 months ago

@nvuillam did you get my email / have a chance to take a look? Let me know if there's something I can look into from my side (we're stuck on v6 until we find a workaround/solution)

pfiaux commented 11 months ago

I retested this with 7.2.0 today, results are still the same:

MegaLinter now collects the files to analyse
All found files before filtering:
- /tmp/lint/linter-build-0/repo/ci/mega-linter.yaml
- /tmp/lint/linter-build-0/repo/tools/notifier/cmd/root.go
- File extensions: , .bash, .bazel, .bzl, .dash, .go, .java, .ksh, .mk, .pl, .pm, .py, .sh, .t, .yaml, .yml
- File names (regex): BUILD, Dockerfile, Makefile, WORKSPACE

...

Kept [2] files on [2] found files
Kept files before applying linter filters:
- /tmp/lint/linter-build-0/repo/ci/mega-linter.yaml
- /tmp/lint/linter-build-0/repo/tools/notifier/cmd/root.go

...

[Filters] {'name': 'GO_REVIVE', 'filter_regex_include': '^(examples|ci|tools)', 'filter_regex_exclude_descriptor': None, 'filter_regex_exclude_linter': '^(tools/test|tools/other)', 'files_sub_directory': None, 'lint_all_files': False, 'lint_all_other_linters_files': False, 'file_extensions': ['.go'], 'file_names_regex': [], 'file_names_not_ends_with': [], 'file_contains_regex': [], 'file_contains_regex_extensions': []}
GO_REVIVE linter kept 0 files after applying linter filters:
[Filters] {'name': 'JAVA_CHECKSTYLE',...
pfiaux commented 11 months ago

The filters for linters are applied in collect_filesfrom what I can tell.

In utils.filter_files it would seem the regex is run on both absolute and relative files: https://github.com/oxsecurity/megalinter/blob/main/megalinter/utils.py#L154-L158

In my case it's using files_to_lint as input: https://github.com/oxsecurity/megalinter/blob/main/megalinter/MegaLinter.py#L633C6-L633C6 I could be wrong but it looks like the files are already made absolute paths before being passed to utils.filter_files

Would it work if collect_files() used as is paths for files_to_lint? Or pass the raw files_to_lint to utils.filter_files() which should handle relative/absolute path?

nvuillam commented 11 months ago

@pfiaux I apologize for the delay, the summer was full and not a lot of time for advanced issues :/

Any change you can reproduce the issue locally ?

Or what if I add a custom debug variable that would display the values of variables in utils.py when filtering ?

pfiaux commented 11 months ago

Well I can reproduce it in our CI, locally I'm on macOS with the virtualization performance it would take me ages to run it locally.

I will bump to v7.3.0 and try again but from the changelog I don't think you change the paths so I expect I can reproduce it. From digging into the code I suspect the linter filters are passed absolute paths and so their relative path handling isn't leveraged at all.

A debug statement might help confirm that, I wonder if there's a faster way to avoid too many loops between you add something and I test it. I need to check if I can add a debug myself or even maybe attempt a patch.

pfiaux commented 10 months ago

I can confirm still happens in v7.3.0.

Kurt-von-Laven commented 10 months ago

It is definitely possible to edit the MegaLinter Python code directly in a MegaLinter Docker container and iterate more quickly that way.

pfiaux commented 10 months ago

I'm having trouble testing in the large image in CI (nonroot image and other CI hardening) but I was able to reproduce it with a small gist locally.

I suggest the following change as it seems to make it work from the perspective of the filtering:

diff --git a/megalinter/MegaLinter.py b/megalinter/MegaLinter.py
index 8bcb894d5e..261394bb0d 100644
--- a/megalinter/MegaLinter.py
+++ b/megalinter/MegaLinter.py
@@ -630,7 +630,7 @@ class Megalinter:
             all_files = list()
             for file_to_lint in files_to_lint:
                 if os.path.isfile(self.workspace + os.path.sep + file_to_lint):
-                    all_files += [self.workspace + os.path.sep + file_to_lint]
+                    all_files += [file_to_lint]
                 else:
                     logging.warning(
                         "[File listing] Input file "

See here for a small example that should be easy to reproduce locally: https://gist.github.com/pfiaux/5b2d6a37831217816551ff74993caf48

nvuillam commented 10 months ago

This part has been tricky when switching from absolute to related paths in files... i'm afraid to break everything ^^

What if I add a config variable that would activate such behavior only if set to true ?

pfiaux commented 10 months ago

I figured as it sits pretty early in the processing the impact could have a large blast radius. A config option would work for our use case. If you're willing to add that we'll be very happy to upgrade to V7 and to provide feedback if we run into other issues because of that option.

The other alternative I thought of of would be to keep a list of both, say:

nvuillam commented 10 months ago

@pfiaux I think I just understood your use case, which is not so usual :)

Do you confirm you send yourself the list of files using variable MEGALINTER_FILES_TO_LINT ?

pfiaux commented 10 months ago

Correct, we set MEGALINTER_FILES_TO_LINT via the config. I detailed it in https://github.com/oxsecurity/megalinter/issues/2701#issuecomment-1597161249, our pipeline runs our own change detection which in some cases differs from the way megalinter does it, so for consistency we use our list of files.

nvuillam commented 10 months ago

@pfiaux I think I just forgot to take this case in account when switching to relative file paths :) ( https://github.com/oxsecurity/megalinter/pull/1877 )

So I think no need for a specific workaround variable... it's just a bug :D

I'll post here once merged so you'll be able to confirm with beta version that it's ok :)

nvuillam commented 10 months ago

@pfiaux please can you check with beta version?

pfiaux commented 10 months ago

Rolling it out now, did a quick test and it was working as expected so I think we're good. Thanks for the help 🙇

nvuillam commented 10 months ago

@pfiaux great, so you'll be able to upgrade to v7 at the next release :) ( should be in 2 weeks maximum )

ashokm commented 10 months ago

Is there an ETA on when the next release is? I'm looking forward to the next release after the current v7.3.0 as we're also seeing that the CLOUDFORMATION_CFN_LINT (cfn-lint) linter keeps including 'cdk.out/' files in the linting, even though this directory is in the .gitignore file and should be ignored. We did not see this problem in v6.

nvuillam commented 10 months ago

@ashokm 2 weeks maximum :)

ashokm commented 10 months ago

I just tested against beta and I get the same issue as mentioned above, with the CLOUDFORMATION_CFN_LINT (cfn-lint) linter still including 'cdk.out/' files in the linting, even though this directory is in the .gitignore file and should be ignored :/

nvuillam commented 10 months ago

@ashokm if this is another problem, you should declare it in another issue :)