Closed b4yuan closed 2 months ago
@gek169 You responded to another PR regarding the automation of the PRs. We've only raised PRs for repositories that we've been able to fork and test CodeQL on on a repo-by-repo basis.
Regarding the automatically generated code, I'll reiterate that this is the default workflow that Github provides you with if you wanted to create a CodeQL Analysis workflow under GitHub actions. It has been modified to support a build script, a script that forces the workflow to fail if there are errors, and filters to remove rules or directories for scanning.
I reiterate that this PR contains neither spam (which by definition is useless, rather than useful) nor malware.
I have replied to the concerns of @gek169 at greater length here.
Do you have a link to a repository which uses CodeQL where I could see how the results look like?
@tuupola You can look under the Security tab of this repo under Code scanning alerts to look at the results of the CodeQL Analysis from this PR if you wanted to see the results on hagl
.
CodeQL results are not visible to anyone but the repo maintainers.
@b4yuan There is nothing there. Apparently this PR first has to be merged to see the output.
While I do believe this PR has been done with good intentions let me tell how it looks for an open source maintainer.
Problem #1. A new tool is introduced without discussing first. Additionally it takes considerable amount of time and effort to actually understand what the tool does. An example output would be a minimum. Instead I am now directed to read a marketing site and documentation site.
https://codeql.github.com/ https://codeql.github.com/docs/
Maybe you could provide an example repository with successful output and failed output of a CodeQL run?
Problem #2. The organisation where the PR originates has forked 275 repositories and anonymous users (without real name) are submitting the same PR to these repositories. This behavior is suspicious and spammy.
GitHub is spammy these days. Bad actors are trying to hijack libraries. You should atleast make the PRs using a non anonymous account.
@tuupola Thanks for your feedback, we appreciate it.
Problem #1: Agree, these could be improved. Though I'm curious if you can explain more about the difference you perceive between opening an issue (for discussion) vs. raising a PR (where discussion can still occur, but in the context of the code contribution)?
Problem #2:
@tuupola To add, here is an image of our forked repo of hagl
and the CodeQL results in the Code Scanning section under Security:
Regarding 'failed output'---as long as CodeQL is configured properly, there really isn't 'failed output'. Maybe 'bad output', like false positives, but there are options to dismiss an error/warning for different reasons:
It seems I forgot to answer to this.
Anonymous account: The student's name is Brian Yuan and his GitHub account is @b4yuan. That doesn't seem any more anonymous than @tuupola?
You might know the students by name. The rest of us do not. Here are screenshots taken when this pr was opened. I think the difference is quite obvious.
vs
Summary
This pull request introduces a CodeQL workflow to enhance the security analysis of this repository.
What is CodeQL
CodeQL is a static analysis tool that helps identify and mitigate security vulnerabilities. It is primarily intra-function but does provide some support for inter-function analysis. By integrating CodeQL into a GitHub Actions workflow, it can proactively identify and address potential issues before they become security threats.
For more information on CodeQL and how to interpret its results, refer to the GitHub documentation and the CodeQL documentation (https://codeql.github.com/ and https://codeql.github.com/docs/).
What this PR does
We added a new CodeQL workflow file (.github/workflows/codeql.yml) that
Validation
To validate the functionality of this workflow, we have run several test scans on the codebase and reviewed the results. The workflow successfully compiles the project, identifies issues, and provides actionable insights while reducing noise by excluding certain queries and third-party code.
Using the workflow results
If this pull request is merged, the CodeQL workflow will be automatically run on every push to the main branch and on every pull request to the main branch. To view the results of these code scans, follow these steps:
Is this a good idea?
We are researchers at Purdue University in the USA. We are studying the potential benefits and costs of using CodeQL on open-source repositories of embedded software.
We wrote up a report of our findings so far. The TL;DR is “CodeQL outperforms the other freely-available static analysis tools, with fairly low false positive rates and lots of real defects”. You can read about the report here: https://arxiv.org/abs/2310.00205
Review of engineering hazards
License: see the license at https://github.com/github/codeql-cli-binaries/blob/main/LICENSE.md:
False positives: We find that around 20% of errors are false positives, but that these FPs are polarized and only a few rules contribute to most FPs. We find that the top rules contributing to FPs are: cpp/uninitialized-local, cpp/missing-check-scanf, cpp/suspicious-pointer-scaling, cpp/unbounded-write, cpp/constant-comparison, and cpp/inconsistent-null-check. Adding a filter to filter out certain rules that contribute to a high FP rate can be done simply in the workflow file.