squizlabs / PHP_CodeSniffer

PHP_CodeSniffer tokenizes PHP files and detects violations of a defined set of coding standards.
BSD 3-Clause "New" or "Revised" License
10.62k stars 1.48k forks source link

Feature request: SARIF Output #3496

Open nvuillam opened 2 years ago

nvuillam commented 2 years ago

Hi, is it in the roadmap to make PHP_CodeSniffer provide SARIF output ? (SARIF is the OASIS common format for all analysis tools )

It would help improve its integration within MegaLinter :)

Best regards

gsherwood commented 2 years ago

No, it's not on the roadmap. I hadn't heard of the format before and I couldn't devote any time to this unless the demand was high. I also wouldn't accept a PR to add the report format to the core unless the demand was there.

nvuillam commented 2 years ago

The format is widely used, especially since GitHub reads it to display lint errors directly in Pull Requests UI, so if a good samaritan would submit a PR, you may be wise to accept it :)

As you can see in this table (orange badges), the format is more and more used by many code analyzers, including PHP ones like PHP_Stan ^^

https://megalinter.github.io/v6-alpha/supported-linters/

jrfnl commented 2 years ago

@nvuillam Just FYI: external repositories/standards can use their own reports.

For more information on how to do so, see these issues/PRs:

jrfnl commented 2 years ago

Also just looked at the specs and ... jeezs louise.... who's got the time to even read that, let alone implement it ? This would require a study of that standard which could easily take several weeks by the looks of it and does not align with any of the other report formats provided. This won't be straight-forward to implement.

nvuillam commented 2 years ago

The specs are long, but I think a first version with tool and results should not be so hard :) (and also not sure anyone has read them, people just follow the examples :D )

Example of SARIF output from python bandit (snippets are not mandatory i think -most of other SARIF output linters do not have it- , just message, artifactLocation and region )

Building such format with only the information you currently should be reasonable ^^

{
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "Bandit",
          "rules": [
            {
              "id": "B404",
              "name": "blacklist",
              "helpUri": "https://bandit.readthedocs.io/en/latest/blacklists/blacklist_imports.html#b404-import-subprocess"
            },
            {
              "id": "B506",
              "name": "yaml_load",
              "helpUri": "https://bandit.readthedocs.io/en/latest/plugins/b506_yaml_load.html"
            },
            {
              "id": "B602",
              "name": "subprocess_popen_with_shell_equals_true",
              "helpUri": "https://bandit.readthedocs.io/en/latest/plugins/b602_subprocess_popen_with_shell_equals_true.html"
            },
            {
              "id": "B607",
              "name": "start_process_with_partial_path",
              "helpUri": "https://bandit.readthedocs.io/en/latest/plugins/b607_start_process_with_partial_path.html"
            },
            {
              "id": "B110",
              "name": "try_except_pass",
              "helpUri": "https://bandit.readthedocs.io/en/latest/plugins/b110_try_except_pass.html"
            }
          ]
        }
      },
      "invocations": [
        {
          "executionSuccessful": true,
          "endTimeUtc": "2021-12-09T17:03:57Z"
        }
      ],
      "results": [
        {
          "message": {
            "text": "Consider possible security implications associated with the subprocess module."
          },
          "level": "note",
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "snippet": {
                    "text": "import subprocess\n"
                  },
                  "startLine": 10
                },
                "artifactLocation": {
                  "uri": "file:///github/workspace/.github/workflows/build.py"
                },
                "contextRegion": {
                  "snippet": {
                    "text": "import re\nimport subprocess\nimport sys\n"
                  },
                  "endLine": 11,
                  "startLine": 9
                }
              }
            }
          ],
          "properties": {
            "issue_confidence": "HIGH",
            "issue_severity": "LOW"
          },
          "ruleId": "B404",
          "ruleIndex": 0
        },
        {
          "message": {
            "text": "Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load()."
          },
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "snippet": {
                    "text": "            descriptor = yaml.load(f, Loader=yaml.FullLoader)\n"
                  },
                  "startLine": 121
                },
                "artifactLocation": {
                  "uri": "file:///github/workspace/.github/workflows/build.py"
                },
                "contextRegion": {
                  "snippet": {
                    "text": "        with open(descriptor_file, \"r\", encoding=\"utf-8\") as f:\n            descriptor = yaml.load(f, Loader=yaml.FullLoader)\n            if (\n"
                  },
                  "endLine": 122,
                  "startLine": 120
                }
              }
            }
          ],
          "properties": {
            "issue_confidence": "HIGH",
            "issue_severity": "MEDIUM"
          },
          "ruleId": "B506",
          "ruleIndex": 1
        },
        {
          "message": {
            "text": "Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load()."
          },
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "snippet": {
                    "text": "                instance=yaml.load(mega_linter_config, Loader=yaml.FullLoader),\n"
                  },
                  "startLine": 1732
                },
                "artifactLocation": {
                  "uri": "file:///github/workspace/.github/workflows/build.py"
                },
                "contextRegion": {
                  "snippet": {
                    "text": "            jsonschema.validate(\n                instance=yaml.load(mega_linter_config, Loader=yaml.FullLoader),\n                schema=yaml.load(descriptor_schema, Loader=yaml.FullLoader),\n"
                  },
                  "endLine": 1733,
                  "startLine": 1731
                }
              }
            }
          ],
          "properties": {
            "issue_confidence": "HIGH",
            "issue_severity": "MEDIUM"
          },
          "ruleId": "B506",
          "ruleIndex": 1
        },
        {
          "message": {
            "text": "Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load()."
          },
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "snippet": {
                    "text": "                schema=yaml.load(descriptor_schema, Loader=yaml.FullLoader),\n"
                  },
                  "startLine": 1733
                },
                "artifactLocation": {
                  "uri": "file:///github/workspace/.github/workflows/build.py"
                },
                "contextRegion": {
                  "snippet": {
                    "text": "                instance=yaml.load(mega_linter_config, Loader=yaml.FullLoader),\n                schema=yaml.load(descriptor_schema, Loader=yaml.FullLoader),\n            )\n"
                  },
                  "endLine": 1734,
                  "startLine": 1732
                }
              }
            }
          ],
          "properties": {
            "issue_confidence": "HIGH",
            "issue_severity": "MEDIUM"
          },
          "ruleId": "B506",
          "ruleIndex": 1
        },
        {
          "message": {
            "text": "Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load()."
          },
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "snippet": {
                    "text": "                        instance=yaml.load(descriptor, Loader=yaml.FullLoader),\n"
                  },
                  "startLine": 1749
                },
                "artifactLocation": {
                  "uri": "file:///github/workspace/.github/workflows/build.py"
                },
                "contextRegion": {
                  "snippet": {
                    "text": "                    jsonschema.validate(\n                        instance=yaml.load(descriptor, Loader=yaml.FullLoader),\n                        schema=yaml.load(descriptor_schema, Loader=yaml.FullLoader),\n"
                  },
                  "endLine": 1750,
                  "startLine": 1748
                }
              }
            }
          ],
          "properties": {
            "issue_confidence": "HIGH",
            "issue_severity": "MEDIUM"
          },
          "ruleId": "B506",
          "ruleIndex": 1
        },
        {
          "message": {
            "text": "Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load()."
          },
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "snippet": {
                    "text": "                        schema=yaml.load(descriptor_schema, Loader=yaml.FullLoader),\n"
                  },
                  "startLine": 1750
                },
                "artifactLocation": {
                  "uri": "file:///github/workspace/.github/workflows/build.py"
                },
                "contextRegion": {
                  "snippet": {
                    "text": "                        instance=yaml.load(descriptor, Loader=yaml.FullLoader),\n                        schema=yaml.load(descriptor_schema, Loader=yaml.FullLoader),\n                    )\n"
                  },
                  "endLine": 1751,
                  "startLine": 1749
                }
              }
            }
          ],
          "properties": {
            "issue_confidence": "HIGH",
            "issue_severity": "MEDIUM"
          },
          "ruleId": "B506",
          "ruleIndex": 1
        },
        {
          "message": {
            "text": "subprocess call with shell=True identified, security issue."
          },
          "level": "error",
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "snippet": {
                    "text": "        shell=True,\n"
                  },
                  "startLine": 2220
                },
                "artifactLocation": {
                  "uri": "file:///github/workspace/.github/workflows/build.py"
                },
                "contextRegion": {
                  "snippet": {
                    "text": "        cwd=os.getcwd() + \"/.automation\",\n        shell=True,\n    )\n    print(process.stdout)\n    print(process.stderr)\n\n\ndef generate_version():\n"
                  },
                  "endLine": 2226,
                  "startLine": 2219
                }
              }
            }
          ],
          "properties": {
            "issue_confidence": "HIGH",
            "issue_severity": "HIGH"
          },
          "ruleId": "B602",
          "ruleIndex": 2
        },
        {
          "message": {
            "text": "Starting a process with a partial executable path"
          },
          "level": "note",
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "snippet": {
                    "text": "    process = subprocess.run(\n"
                  },
                  "startLine": 2230
                },
                "artifactLocation": {
                  "uri": "file:///github/workspace/.github/workflows/build.py"
                },
                "contextRegion": {
                  "snippet": {
                    "text": "    cwd_to_use = os.getcwd() + \"/mega-linter-runner\"\n    process = subprocess.run(\n        [\n            \"npm\",\n            \"version\",\n            \"--newversion\",\n            RELEASE_TAG,\n            \"-no-git-tag-version\",\n            \"--no-commit-hooks\",\n        ],\n        stdout=subprocess.PIPE,\n        universal_newlines=True,\n        cwd=cwd_to_use,\n        shell=True,\n    )\n"
                  },
                  "endLine": 2243,
                  "startLine": 2229
                }
              }
            }
          ],
          "properties": {
            "issue_confidence": "HIGH",
            "issue_severity": "LOW"
          },
          "ruleId": "B607",
          "ruleIndex": 3
        },
        {
          "message": {
            "text": "subprocess call with shell=True identified, security issue."
          },
          "level": "error",
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "snippet": {
                    "text": "        shell=True,\n"
                  },
                  "startLine": 2242
                },
                "artifactLocation": {
                  "uri": "file:///github/workspace/.github/workflows/build.py"
                },
                "contextRegion": {
                  "snippet": {
                    "text": "        cwd=cwd_to_use,\n        shell=True,\n    )\n    print(process.stdout)\n    print(process.stderr)\n    # Update changelog\n    changelog_file = f\"{REPO_HOME}/CHANGELOG.md\"\n\n    with open(changelog_file, \"r\", encoding=\"utf-8\") as md_file:\n        changelog_content = md_file.read()\n    changelog_content = changelog_content.replace(\"<!-- linter-versions-end -->\", \"\")\n    new_release_lines = [\n        \",\" \"<!-- unreleased-content-marker -->\",\n        \"\",\n        \"- Linter versions upgrades\",\n"
                  },
                  "endLine": 2255,
                  "startLine": 2241
                }
              }
            }
          ],
          "properties": {
            "issue_confidence": "HIGH",
            "issue_severity": "HIGH"
          },
          "ruleId": "B602",
          "ruleIndex": 2
        },
        {
          "message": {
            "text": "Try, Except, Pass detected."
          },
          "level": "note",
          "locations": [
            {
              "physicalLocation": {
                "region": {
                  "snippet": {
                    "text": "except:\n"
                  },
                  "startLine": 3
                },
                "artifactLocation": {
                  "uri": "file:///github/workspace/python_bad_1.py"
                },
                "contextRegion": {
                  "snippet": {
                    "text": "    pass\nexcept:\n    pass\n"
                  },
                  "endLine": 4,
                  "startLine": 2
                }
              }
            }
          ],
          "properties": {
            "issue_confidence": "HIGH",
            "issue_severity": "LOW"
          },
          "ruleId": "B110",
          "ruleIndex": 4
        }
      ]
    }
  ],
  "version": "2.1.0",
  "$schema": "https://schemastore.azurewebsites.net/schemas/json/sarif-2.1.0-rtm.4.json"
}
pennyarcade commented 1 year ago

+1 on this.

Maybe one could build on this template: https://github.com/jbelien/phpstan-sarif-formatter

jrfnl commented 1 year ago

To be honest, as PHPCS allows for using external reports, in my opinion this is more suited to an external repo providing that report format as I don't see enough demand for this format to justify maintaining it within PHPCS itself.

XVilka commented 1 year ago

It's necessary for GitLab adopting SARIF: https://gitlab.com/gitlab-org/gitlab/-/issues/118496

jrfnl commented 1 year ago

@XVilka Good to know, but that doesn't negate the possibility to handle this via a PHPCS addon which provides the report format.

soderlind commented 5 months ago

CodeQL, as in GitHub Advanced Security, also like SARIF

soderlind commented 5 months ago

Typical, found an alternative using cs2pr

- name: Setup PHP
  uses: shivammathur/setup-php@v2
  with:
    php-version: '8.3'
    tools: cs2pr, phpcs

- name: Run phpcs
  run: phpcs -q --report=checkstyle src | cs2pr 
llaville commented 2 months ago

@nvuillam I've a good new for you. As recommanded by @jrfnl, I've started to write a single repository with only the SARIF report support for PHP CodeSniffer 3.3 or higher.

As I did in past with PHPStan (see my old request : https://github.com/phpstan/phpstan/issues/5973) that was not accepted and lead to an alternative solution https://github.com/jbelien/phpstan-sarif-formatter, my PHP_CodeSniffer SARIF support is based on my package (https://github.com/llaville/sarif-php-sdk) that implement the full specification of SARIF 2.1.0

My first attempt is already pretty good, but I need more days to polish code before to propose a new package to Community.

nvuillam commented 2 months ago

This will be a great addition yo MegaLinter :)

llaville commented 2 months ago

@nvuillam Integration with MegaLinter will continue on its own thread (just opened 3515)

llaville commented 2 months ago

For all, the solution exists now with https://github.com/llaville/sarif-php-sdk/releases/tag/1.2.0 Demo and explains may be found on https://github.com/llaville/sarif-php-sdk/tree/1.2/examples/converters/phpcs