oxsecurity / megalinter

🦙 MegaLinter analyzes 50 languages, 22 formats, 21 tooling formats, excessive copy-pastes, spelling mistakes and security issues in your repository sources with a GitHub Action, other CI tools or locally.
https://megalinter.io
GNU Affero General Public License v3.0
1.82k stars 214 forks source link

Detect type of files with no extension via the first line's `#!` (if it exists) #464

Closed ottumm closed 2 years ago

ottumm commented 3 years ago

Is your feature request related to a problem? Please describe.

I find it tedious to have to specify e.g. BASH_SHELLCHECK_FILE_NAMES_REGEX for files lacking an extension. At the same time, it would (obviously) be incorrect to assume that every file lacking an extension is a Bash script.

Describe the solution you'd like

I would like mega-linter to read the first line of every file lacking an extension that does not already match a rule specifying the file type, and infer the file type by examining the #! value, if it exists.

Describe alternatives you've considered

Alternatively (or additionally!), I would like to specify one BASH_FILE_NAMES_REGEX (or PYTHON_FILE_NAMES_REGEX, etc) config value, rather than having to repeat the regex for every Bash linter (BASH_EXEC_FILE_NAMES_REGEX, BASH_SHELLCHECK_FILE_NAMES_REGEX, etc).

This would be helpful especially because I wouldn't have to remember all the linters for each file type, and it would apply if new linters are added in the future.

I think this would be useful regardless of whether my original feature proposal is accepted or not, so I'm happy to open a separate issue for it.

Additional context

If this sounds like a good idea, I'm happy to work on it.

nvuillam commented 3 years ago

Good hack for BASH_SHELLCHECK_FILE_NAMES_REGEX , I didn't think of that :)

Linting files with no extension by default is indeed a use case that should be managed by MegaLinter

I think that for performance reasons we may need to add a variable LINT_FILES_WITHOUT_EXTENSIONS (false by default), to avoid to have to read all files every time, whereas it may be not necessary for most repos. But I'm not sure of the cases where we would have a repo with a lot of files without extensions that must not be linted... do you ? Please share your opinion :)

The architecture update could be the following:

The second solution would be easier to develop ,but forcing users to declare all the files they want to lint would be against MegaLinter philosophy to be the more "plug & play" possible, so let's go for your first proposition, your future PR is very welcomed :)

ottumm commented 3 years ago

Awesome, I'll take this on — feel free to assign me to this issue if that's helpful for tracking.

I'll also open a separate issue for the generalized language settings (e.g. BASH_FILTER_REGEX_EXCLUDE), because I think that would still be useful for other things.

Enabling / disabling by default

I think that for performance reasons we may need to add a variable LINT_FILES_WITHOUT_EXTENSIONS (false by default), to avoid to have to read all files every time, whereas it may be not necessary for most repos. But I'm not sure of the cases where we would have a repo with a lot of files without extensions that must not be linted... do you ?

Yeah, this is an important consideration for sure. I think you're right that the safest thing would be to have shebang detection disabled by default for performance, but on the other hand enabling by default makes MegaLinter more plug & play.

I'll do some (probably light) performance testing to try to determine the marginal performance cost of this feature.

Implementation

I like your proposed architecture, and I think I'll go with that, since it seems like it fits in well with how MegaLinter does things now.

An alternative I've been considering is to use libmagic to detect the file type. The advantage of this is that it's a library that already knows how to scan header info and determine file type from it — and I think it's been fairly optimized. However, I'm not sure if that would fit well into the existing MegaLinter architecture. I can investigate, but if you have thoughts I'd love to hear them. =)

nvuillam commented 3 years ago
github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.

ottumm commented 3 years ago

I’ve had to put this aside but I’m still going to work on it!

nvuillam commented 3 years ago

Thats's great news, i'll be happy to validate the PR :) No stress, we all have priorities on our rent-paying jobs 😅

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.

nvuillam commented 3 years ago

Not stale ? :)

ottumm commented 3 years ago

I mean, it’s getting embarrassing but I’m still into it. I’ll find time.

nvuillam commented 3 years ago

Don't worry it's not embarrassing ;)

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.

nvuillam commented 2 years ago

Not stale ? ^^

CAM-Gerlach commented 2 years ago

One other suggestion, in case you aren't already committed— Pre-Commit maintains and uses Identify for its primary file identification, which identifies file formats by both extension and shebang, as well as tags it with other things like file/directory, text/binary, symlink, etc.

EDIT: That pun was completely unintentional, I swear

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.

nvuillam commented 2 years ago

heyyyy not stale ? :) @ottumm ? 👼

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.