peter-murray / github-security-report-action

The CWE data is not showing up #7

Open amitgupta7 opened 2 years ago

amitgupta7 commented 2 years ago

Hi Peter,

The report generated by the action seems to be missing the CWE data; it looks like the CWE labels were removed from the Code Scanning alerts API. Please see the attached summary PDF that was generated.

summary.pdf

@rohitnb @mohan-the-octocat @amol1717

jorge-abarca commented 2 years ago

It seems that the CWE coverage is derived from the Applied Code Scanning Rules, which are the code scanning rules read from the SARIF files previously generated by CodeQL's analysis action.

However, the expected format of the sarif files used to populate the rules property of the SarifReport is the following:

{
    "file": "/home/runner/work/sample-repository/results/csharp.sarif",
    "payload": {
        "data": {
            "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
            "version": "2.1.0",
            "runs": [
                {
                    "tool" : {
                        "driver" : {
                          "name" : "CodeQL",
                          "organization" : "GitHub",
                          "semanticVersion" : "2.2.5",
                          "rules": [ {  } ]
                        }
                    }
                }
            ]
        }
    }
}

This doesn't quite match the actual output of the SARIF report, which has the following format:

{
    "file": "/home/runner/work/sample-repository/results/csharp.sarif",
    "payload": {
        "data": {
            "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
            "version": "2.1.0",
            "runs": [
                {
                    "tool" : {
                        "driver" : {
                          "name" : "CodeQL",
                          "organization" : "GitHub",
                          "semanticVersion" : "2.2.5",
                          "rules": []
                        },
                        "extensions": [
                            {
                                "name": "codeql/java-queries",
                                "semanticVersion": "0.0.13+4551af90f61a8d5f5c1c88a036595b5919a6c98e",
                                "locations": [ { } ],
                                "notifications": [ { } ],
                                "rules": [ {  } ]
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Although the rules property is still present inside the driver property, it will always be empty. To fix this issue, it is necessary to collect the rules of all extensions, keeping in mind that only extensions that applied rules will have a rules property. This can be done in the getRules function of the SarifReport class.
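A minimal sketch of that idea follows. It is not the code from the PR: the type names and the `collectRules` helper are hypothetical, only the SARIF fields shown above are modeled, and the rule IDs and tags in the sample data are illustrative.

```typescript
// Hypothetical, trimmed-down SARIF types: only the fields used in this sketch.
interface SarifRule {
  id: string;
  properties?: { tags?: string[] };
}

interface SarifToolComponent {
  name: string;
  rules?: SarifRule[]; // may be absent on extensions that applied no rules
}

interface SarifRun {
  tool: {
    driver: SarifToolComponent;
    extensions?: SarifToolComponent[];
  };
}

// Gather rules from the driver and from every extension of every run.
// A missing `rules` or `extensions` property is treated as an empty list.
function collectRules(runs: SarifRun[]): SarifRule[] {
  const rules: SarifRule[] = [];
  for (const run of runs) {
    const components = [run.tool.driver, ...(run.tool.extensions ?? [])];
    for (const component of components) {
      rules.push(...(component.rules ?? []));
    }
  }
  return rules;
}

// Illustrative data shaped like the second SARIF sample above.
const sampleRuns: SarifRun[] = [
  {
    tool: {
      driver: { name: "CodeQL", rules: [] },
      extensions: [
        {
          name: "codeql/java-queries",
          rules: [
            { id: "java/sql-injection", properties: { tags: ["external/cwe/cwe-089"] } },
            { id: "java/path-injection", properties: { tags: ["external/cwe/cwe-022"] } },
          ],
        },
        { name: "extension-without-rules" }, // no `rules` property at all
      ],
    },
  },
];

const collectedRules = collectRules(sampleRuns);
console.log(collectedRules.map((rule) => rule.id));
```

Including the driver in the list of components keeps the sketch compatible with the older format, where the rules lived directly under `driver`.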

I created a PR that makes the changes described above, but keep in mind that I believe this approach works well for reports covering only one language, not for repositories with more than one language.

The reasoning behind that is that the action uses GitHub's API to retrieve alerts, dependencies and vulnerabilities, which will have all the information of all the languages when the last language finishes its analysis; however, the sarif files will only have information of the rules related to the last language being analyzed.

There could potentially be a few ways to address that, but I am wondering if leveraging the rule associated with every alert in the alerts method of the API would suffice.
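As a rough sketch of that alternative, the distinct rules could be derived from the alerts themselves. The assumptions here are that each alert object exposes a `rule` with an `id` and a `tags` array, and that CWE tags follow the CodeQL convention `external/cwe/cwe-NNN`; the type and function names are hypothetical and the sample alerts are made up.

```typescript
// Hypothetical, trimmed-down alert shape: only the fields used in this sketch.
interface AlertRule {
  id: string;
  tags?: string[];
}

interface CodeScanningAlert {
  number: number;
  rule: AlertRule;
}

// Deduplicate the rules referenced by a list of alerts, keyed by rule id.
function rulesFromAlerts(alerts: CodeScanningAlert[]): Map<string, AlertRule> {
  const byId = new Map<string, AlertRule>();
  for (const alert of alerts) {
    if (!byId.has(alert.rule.id)) {
      byId.set(alert.rule.id, alert.rule);
    }
  }
  return byId;
}

// Extract CWE identifiers (e.g. "cwe-089") from a rule's tags.
function cweIds(rule: AlertRule): string[] {
  const prefix = "external/cwe/";
  return (rule.tags ?? [])
    .filter((tag) => tag.startsWith(prefix))
    .map((tag) => tag.slice(prefix.length));
}

// Illustrative alerts: two share a rule, and the languages differ.
const sampleAlerts: CodeScanningAlert[] = [
  { number: 1, rule: { id: "java/sql-injection", tags: ["security", "external/cwe/cwe-089"] } },
  { number: 2, rule: { id: "java/sql-injection", tags: ["security", "external/cwe/cwe-089"] } },
  { number: 3, rule: { id: "cs/path-injection", tags: ["security", "external/cwe/cwe-022"] } },
];

const alertRules = rulesFromAlerts(sampleAlerts);
const alertCwes = Array.from(alertRules.values()).flatMap(cweIds);
console.log(alertRules.size, alertCwes);
```

Note that this only sees rules that produced at least one alert, so it cannot report rules that ran without finding anything.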

I might create another PR based on that, but I am not sure if the intent of the number of rules applied was the number of rules that were applied. If that is the case, then the fix would be easy.