Closed pfiaux closed 10 months ago
@pfiaux you're right, it is probably related to relative paths management
Did you try to send the list of files with relative path format ?
Best regards
I just checked, looks like I'm already sending the files with relative paths in the config:
MEGALINTER_FILES_TO_LINT:
- teamA/mega-linter.yaml
- scripts/ci-lint.sh
- teamB/product1/main.go
The outputs from megalinter such as MegaLinter now collects the files to analyse
or Kept files before applying linter filters:
seem to already convert to absolute paths.
I'm also fine with another solution even if it's not compatible with the v6 config, as long as we can filter by base path :)
Sticking an absolute path in the config would be problematic tho, for us the absolute path isn't fixed and dependents on which linter instance will pickup the job (we run multiple replicas).
I agree that it's an issue...
Would you mind sharing the full log with LOG_LEVEL: DEBUG
in .mega-linter.yml ?
If sensitive you can eventually send it to me to nicolas.vuillamy@gmail.com
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
If you think this issue should stay open, please remove the O: stale 🤖
label or comment on the issue.
@nvuillam did you get my email / have a chance to take a look? Let me know if there's something I can look into from my side (we're stuck on v6 until we find a workaround/solution)
I retested this with 7.2.0 today, results are still the same:
MegaLinter now collects the files to analyse
All found files before filtering:
- /tmp/lint/linter-build-0/repo/ci/mega-linter.yaml
- /tmp/lint/linter-build-0/repo/tools/notifier/cmd/root.go
- File extensions: , .bash, .bazel, .bzl, .dash, .go, .java, .ksh, .mk, .pl, .pm, .py, .sh, .t, .yaml, .yml
- File names (regex): BUILD, Dockerfile, Makefile, WORKSPACE
...
Kept [2] files on [2] found files
Kept files before applying linter filters:
- /tmp/lint/linter-build-0/repo/ci/mega-linter.yaml
- /tmp/lint/linter-build-0/repo/tools/notifier/cmd/root.go
...
[Filters] {'name': 'GO_REVIVE', 'filter_regex_include': '^(examples|ci|tools)', 'filter_regex_exclude_descriptor': None, 'filter_regex_exclude_linter': '^(tools/test|tools/other)', 'files_sub_directory': None, 'lint_all_files': False, 'lint_all_other_linters_files': False, 'file_extensions': ['.go'], 'file_names_regex': [], 'file_names_not_ends_with': [], 'file_contains_regex': [], 'file_contains_regex_extensions': []}
GO_REVIVE linter kept 0 files after applying linter filters:
[Filters] {'name': 'JAVA_CHECKSTYLE',...
The filters for linters are applied in collect_files
from what I can tell.
In utils.filter_files
it would seem the regex is run on both absolute and relative files:
https://github.com/oxsecurity/megalinter/blob/main/megalinter/utils.py#L154-L158
In my case it's using files_to_lint
as input:
https://github.com/oxsecurity/megalinter/blob/main/megalinter/MegaLinter.py#L633C6-L633C6
I could be wrong but it looks like the files are already made absolute paths before being passed to utils.filter_files
Would it work if collect_files()
used as is paths for files_to_lint
? Or pass the raw files_to_lint
to utils.filter_files()
which should handle relative/absolute path?
@pfiaux I apologize for the delay, the summer was full and not a lot of time for advanced issues :/
Any change you can reproduce the issue locally ?
Or what if I add a custom debug variable that would display the values of variables in utils.py when filtering ?
Well I can reproduce it in our CI, locally I'm on macOS with the virtualization performance it would take me ages to run it locally.
I will bump to v7.3.0 and try again but from the changelog I don't think you change the paths so I expect I can reproduce it. From digging into the code I suspect the linter filters are passed absolute paths and so their relative path handling isn't leveraged at all.
A debug statement might help confirm that, I wonder if there's a faster way to avoid too many loops between you add something and I test it. I need to check if I can add a debug myself or even maybe attempt a patch.
I can confirm still happens in v7.3.0.
It is definitely possible to edit the MegaLinter Python code directly in a MegaLinter Docker container and iterate more quickly that way.
I'm having trouble testing in the large image in CI (nonroot image and other CI hardening) but I was able to reproduce it with a small gist locally.
I suggest the following change as it seems to make it work from the perspective of the filtering:
diff --git a/megalinter/MegaLinter.py b/megalinter/MegaLinter.py
index 8bcb894d5e..261394bb0d 100644
--- a/megalinter/MegaLinter.py
+++ b/megalinter/MegaLinter.py
@@ -630,7 +630,7 @@ class Megalinter:
all_files = list()
for file_to_lint in files_to_lint:
if os.path.isfile(self.workspace + os.path.sep + file_to_lint):
- all_files += [self.workspace + os.path.sep + file_to_lint]
+ all_files += [file_to_lint]
else:
logging.warning(
"[File listing] Input file "
See here for a small example that should be easy to reproduce locally: https://gist.github.com/pfiaux/5b2d6a37831217816551ff74993caf48
This part has been tricky when switching from absolute to related paths in files... i'm afraid to break everything ^^
What if I add a config variable that would activate such behavior only if set to true ?
I figured as it sits pretty early in the processing the impact could have a large blast radius. A config option would work for our use case. If you're willing to add that we'll be very happy to upgrade to V7 and to provide feedback if we run into other issues because of that option.
The other alternative I thought of of would be to keep a list of both, say:
all_files_absolute
(or keep that one named all_files
for compatibility)all_files_relative
Then in utils.py the filter could check the regex against both. But that's probably messier than a config option and keeping a single list of files. So it's not exactly better.@pfiaux I think I just understood your use case, which is not so usual :)
Do you confirm you send yourself the list of files using variable MEGALINTER_FILES_TO_LINT ?
Correct, we set MEGALINTER_FILES_TO_LINT
via the config.
I detailed it in https://github.com/oxsecurity/megalinter/issues/2701#issuecomment-1597161249, our pipeline runs our own change detection which in some cases differs from the way megalinter does it, so for consistency we use our list of files.
@pfiaux I think I just forgot to take this case in account when switching to relative file paths :) ( https://github.com/oxsecurity/megalinter/pull/1877 )
So I think no need for a specific workaround variable... it's just a bug :D
I'll post here once merged so you'll be able to confirm with beta version that it's ok :)
@pfiaux please can you check with beta version?
Rolling it out now, did a quick test and it was working as expected so I think we're good. Thanks for the help 🙇
@pfiaux great, so you'll be able to upgrade to v7 at the next release :) ( should be in 2 weeks maximum )
Is there an ETA on when the next release is?
I'm looking forward to the next release after the current v7.3.0 as we're also seeing that the CLOUDFORMATION_CFN_LINT (cfn-lint) linter keeps including 'cdk.out/' files in the linting, even though this directory is in the .gitignore
file and should be ignored.
We did not see this problem in v6.
@ashokm 2 weeks maximum :)
I just tested against beta
and I get the same issue as mentioned above, with the CLOUDFORMATION_CFN_LINT (cfn-lint) linter still including 'cdk.out/' files in the linting, even though this directory is in the .gitignore file and should be ignored :/
@ashokm if this is another problem, you should declare it in another issue :)
Describe the bug We're using regex to limit which paths specific linters run on (monorepo setting):
So our linter configs might look like this:
We use
^
in front to limit it to root folders, for example:scripts/*
files will be covered by the python linterteamA/scripts/*
shall not be covered by the python linterTo Reproduce Assuming there are go file that would trigger a
revive
lint errors in:test/guilty.go
teamA/test/guilty.go
GO_REVIVE_FILTER_REGEX_INCLUDE: ^(test)
test/guilty.go teamA/test/guilty.go
For 6.22 only
test/guilty.go
shall remain after running the filters.For 7.x no files remain after running the filters.
Expected behavior The ability to use to filter root of workspace level folders by linter in v7, ideally via
_FILTER_REGEX_INCLUDE
to be backward compatible with v6. (another solution that achieves the same goal would also work)Additional context We're running mega-linter in ci (using the image with everything), and passing it a list of files to lint determined by our pipeline.
I feel this could be due to the switch from relative paths to absolute paths, which would cause the
^teamA/...
to not match if the path is/tmp/.../teamA/...