tuupola / hagl

Hardware Agnostic Graphics Library for embedded
https://www.appelsiini.net/tags/hagl/
MIT License
293 stars 47 forks source link

Add CodeQL Workflow for Code Security Analysis #115

Open b4yuan opened 8 months ago

b4yuan commented 8 months ago

Summary

This pull request introduces a CodeQL workflow to enhance the security analysis of this repository.

What is CodeQL

CodeQL is a static analysis tool that helps identify and mitigate security vulnerabilities. It is primarily intra-function but does provide some support for inter-function analysis. By integrating CodeQL into a GitHub Actions workflow, it can proactively identify and address potential issues before they become security threats.

For more information on CodeQL and how to interpret its results, refer to the GitHub documentation and the CodeQL documentation (https://codeql.github.com/ and https://codeql.github.com/docs/).

What this PR does

We added a new CodeQL workflow file (.github/workflows/codeql.yml) that

Validation

To validate the functionality of this workflow, we have run several test scans on the codebase and reviewed the results. The workflow successfully compiles the project, identifies issues, and provides actionable insights while reducing noise by excluding certain queries and third-party code.

Using the workflow results

If this pull request is merged, the CodeQL workflow will be automatically run on every push to the main branch and on every pull request to the main branch. To view the results of these code scans, follow these steps:

  1. Under the repository name, click on the Security tab.
  2. In the left sidebar, click Code scanning alerts.

Is this a good idea?

We are researchers at Purdue University in the USA. We are studying the potential benefits and costs of using CodeQL on open-source repositories of embedded software.

We wrote up a report of our findings so far. The TL;DR is “CodeQL outperforms the other freely-available static analysis tools, with fairly low false positive rates and lots of real defects”. You can read about the report here: https://arxiv.org/abs/2310.00205

Review of engineering hazards

License: see the license at https://github.com/github/codeql-cli-binaries/blob/main/LICENSE.md:

Here's what you may also do with the Software, but only with an Open Source Codebase and subject to the License Restrictions provisions below:

Perform analysis on the Open Source Codebase.

If the Open Source Codebase is hosted and maintained on GitHub.com, generate CodeQL databases for or during automated analysis, CI, or CD.

False positives: We find that around 20% of errors are false positives, but that these FPs are polarized and only a few rules contribute to most FPs. We find that the top rules contributing to FPs are: cpp/uninitialized-local, cpp/missing-check-scanf, cpp/suspicious-pointer-scaling, cpp/unbounded-write, cpp/constant-comparison, and cpp/inconsistent-null-check. Adding a filter to filter out certain rules that contribute to a high FP rate can be done simply in the workflow file.

b4yuan commented 8 months ago

@gek169 You responded to another PR regarding the automation of the PRs. We've only raised PRs for repositories that we've been able to fork and test CodeQL on on a repo-by-repo basis.

b4yuan commented 8 months ago

Regarding the automatically generated code, I'll reiterate that this is the default workflow that Github provides you with if you wanted to create a CodeQL Analysis workflow under GitHub actions. It has been modified to support a build script, a script that forces the workflow to fail if there are errors, and filters to remove rules or directories for scanning.

davisjam commented 8 months ago

I reiterate that this PR contains neither spam (which by definition is useless, rather than useful) nor malware.

I have replied to the concerns of @gek169 at greater length here.

tuupola commented 8 months ago

Do you have a link to a repository which uses CodeQL where I could see how the results look like?

b4yuan commented 8 months ago

@tuupola You can look under the Security tab of this repo under Code scanning alerts to look at the results of the CodeQL Analysis from this PR if you wanted to see the results on hagl.

CodeQL results are not visible to anyone but the repo maintainers.

tuupola commented 8 months ago

@b4yuan There is nothing there. Apparently this PR first has to be merged to see the output.

image

tuupola commented 8 months ago

While I do believe this PR has been done with good intentions let me tell how it looks for an open source maintainer.

Problem #1. A new tool is introduced without discussing first. Additionally it takes considerable amount of time and effort to actually understand what the tool does. An example output would be a minimum. Instead I am now directed to read a marketing site and documentation site.

https://codeql.github.com/ https://codeql.github.com/docs/

Maybe you could provide an example repository with successful output and failed output of a CodeQL run?

Problem #2. The organisation where the PR originates has forked 275 repositories and anonymous users (without real name) are submitting the same PR to these repositories. This behavior is suspicious and spammy.

https://github.com/IVOES

GitHub is spammy these days. Bad actors are trying to hijack libraries. You should atleast make the PRs using a non anonymous account.

davisjam commented 8 months ago

@tuupola Thanks for your feedback, we appreciate it.

Problem #1: Agree, these could be improved. Though I'm curious if you can explain more about the difference you perceive between opening an issue (for discussion) vs. raising a PR (where discussion can still occur, but in the context of the code contribution)?

Problem #2:

b4yuan commented 8 months ago

@tuupola To add, here is an image of our forked repo of hagl and the CodeQL results in the Code Scanning section under Security:

image

Regarding 'failed output'---as long as CodeQL is configured properly, there really isn't 'failed output'. Maybe 'bad output', like false positives, but there are options to dismiss an error/warning for different reasons:

image

tuupola commented 7 months ago

It seems I forgot to answer to this.

Anonymous account: The student's name is Brian Yuan and his GitHub account is @b4yuan. That doesn't seem any more anonymous than @tuupola?

You might know the students by name. The rest of us do not. Here are screenshots taken when this pr was opened. I think the difference is quite obvious.

Screenshot from 2023-11-07 20-47-40

vs

Screenshot from 2023-11-07 20-47-10