Provide necessary support for SpotBugs SARIF

michaelcfanning commented 4 years ago

e.g., can we help provide a schema-driven Java OM?

https://github.com/spotbugs/discuss/issues/95

KengoTODA commented 4 years ago

Biggest problem is that our descriptor is written in HTML like this so we cannot provide it in TEXT and MD.

It's also nice if we can generate necessary classes/interfaces from JSON schema. But it's not mandatory; we can code classes manually.

KengoTODA commented 4 years ago

Biggest problem is that our descriptor is written in HTML like this so we cannot provide it in TEXT and MD.

I remember that markdown can contain HTML elements, I'll directly put our descriptors into text.markdown property as an experiment.

KengoTODA commented 4 years ago

You can find the .sarif report generated by the latest SpotBugs (not released yet) at here: https://github.com/spotbugs/spotbugs/pull/1184#issuecomment-652959161

When you find problems, please come https://github.com/spotbugs/spotbugs/pull/1185 to comment, or mention me at here. Thanks!

KengoTODA commented 4 years ago

I remember that markdown can contain HTML elements, I'll directly put our descriptors into text.markdown property as an experiment.

I want to confirm this really work or not. However, even when I put fullDescription.markdown property, the SARIF viewer plugin for VS Code doesn't display it. How I can verify that the current JSON file is valid or not?

Here is viewers' screenshot from my local:

Here is the JSON data in tool.driver.rules:

{"messageStrings":{"default":{"text":"{0} is Serializable; consider declaring a serialVersionUID"}},"id":"SE_NO_SERIALVERSIONID","shortDescription":{"text":"Class is Serializable, but doesn't define serialVersionUID"},"fullDescription":{"markdown":"\n\n  <p> This class implements the <code>Serializable<\/code> interface, but does\n  not define a <code>serialVersionUID<\/code> field.&nbsp;\n  A change as simple as adding a reference to a .class object\n    will add synthetic fields to the class,\n   which will unfortunately change the implicit\n   serialVersionUID (e.g., adding a reference to <code>String.class<\/code>\n   will generate a static field <code>class$java$lang$String<\/code>).\n   Also, different source code to bytecode compilers may use different\n   naming conventions for synthetic variables generated for\n   references to class objects or inner classes.\n   To ensure interoperability of Serializable across versions,\n   consider adding an explicit serialVersionUID.<\/p>\n\n    "},"helpUri":"https://spotbugs.readthedocs.io/en/latest/bugDescriptions.html#SE_NO_SERIALVERSIONID","properties":{"tags":["BAD_PRACTICE"]}}

Or you may check the while main.sarif.txt. Thanks in advance!

KengoTODA commented 4 years ago

What is the difference between tool.driver.notifications and invocations.toolConfigurationNotifications? I'm using driver.notifications to put errors and exceptions like below, is it OK?

{
  "version":"2.1.0",
  "$schema":"https://schemastore.azurewebsites.net/schemas/json/sarif-2.1.0-rtm.4.json",
  "runs":[
    {
      "tool":{
        "driver":{
          "name":"SpotBugs",
          "version":"Development",
          "language":"en",
          "notifications":[
            {
              "level":"error",
              "descriptor":{
                "id":"spotbugs-missing-classes"
              },
              "message":{
                "text":"Classes needed for analysis were missing: [com.github.spotbugs.MissingClass]"
              }
            }
          ]
        }
      }
    }
  ]
}

michaelcfanning commented 4 years ago

tool.driver.notifications are a set of reporting descriptors that provide metadata on the notifications a component might raise.

invocations.toolConfigurationNotifications and invocations.toolExecutionNotifications are instances of those notifications that actually occurred at runtime. We call the analysis equivalents rules and results. It's confusing here because we use the same term for notification metadata as well as the notifications.

toolConfigurationNotifications are errors that resolve to some sort of configuration problem. That is, they reflect a problem that can be resolved by the user properly configuring the tool (in your example, by ensuring that all classes relevant to analysis are present. toolExecutionNotifications are errors in the engine itself (i.e., bugs that need to be addressed by the tool developer). You'd use this property to store an unhandled exception or other engine correctness problem.

KengoTODA commented 4 years ago

OK thanks. I'm still not sure when I need to use tool.driver.notifications, I know that toolConfigurationNotifications is better place to put information about missing classes.

michaelcfanning commented 4 years ago

When you want to report an actual notification that occurred to a user, such as when classes are missing, use toolConfigurationNotifications to send those messages. tool.driver.notifications is a place to add metadata about each class of notification, for example, this is where you would store a full description of the notification.

@lgolding, you are an effective explainer of the standard, want to jump in here? :)

ghost commented 4 years ago

@KengoTODA, @michaelcfanning is correct (except that, looking at your file, I see that you have toolExecutionNotifications, not toolConfigurationNotifications.)

The SARIF Tutorials has a section on Notifications that might help. Here is an example (which I will add to the tutorial):

{
  "version": "2.1.0",
  "runs": [
    {                                          // A run object.
      "tool": {
        ...
      },
      "results": [
        ...
      ],
      "invocations": [
        {                                      // An invocation object
          "toolExecutionNotifications": [
            {                                  // A notification object.
              "level": "error",
              "descriptor": {
                "id": "spotbugs-missing-classes"
              },
              "message": {
                "text": "Classes needed for analysis were missing: [apply, iterator, ...]"
              }
            },
          ]
          "toolConfigurationNotifications": [
            {                                  // A notification object.
              ...
            },
          ]
        }
      ]
    }
  ]
}

Now, the SARIF toolComponent object does have a notification property, but it doesn't hold notifications: it holds descriptions of (metadata about) notifications. So if, for example, you wanted to provide a full description of what the notification with id NOT10001 meant, you could put it in tool.driver.notifications.

The property tool.driver.notifications used to be called ...notificationDescriptors. Towards the end of the standardization process, we went through an exercise of shortening many of the property names. In this case, it seems that the shortened name caused confusion.

ghost commented 4 years ago

@KengoTODA I added a Notifications sample to the SARIF Tutorials. I hope you find it helpful.

I'm going through your sample SARIF output in detail and will provide more feedback later today. Thank you for doing this!

KengoTODA commented 4 years ago

Thanks. This is the latest example spotbugs.sarif, that was made for this PR. At least it passed the SARIF validator.

I will check your comments carefully, to try to understand which property I should use for this case.

ghost commented 4 years ago

Kengo, thank you for the latest spotbugs.sarif. It saved me a lot of work :-)

This file is almost valid according to the schema. The only error is that the location.logicalLocation property is actually named logicalLocations (plural). Note that its value is an array.

The spec (§3.28.4) explains the reason for this:

NOTE: logicalLocations is an array because some logical locations can be expressed in more than one way. For example, the logical location of an element in an HTML document might be expressed by an XML Path expression such as /html/body/img[1] or by a CSS selector such as #logo.

After I fixed that, the latest version of the SARIF validator tool (which we will publish very soon) finds only two problems:

The $schema property should have the value https://schemastore.azurewebsites.net/schemas/json/sarif-2.1.0-rtm.5.json, which is the final official version. The only difference between -rtm.4 and -rtm.5 was to correct a typo in the name of one property that you are not using.
Rule messages should be expressed in complete sentences, so they should end with a period (for example, "{0} may expose internal representation by returning {1}."

I will now look by hand to see if I can make any more recommendations.

ghost commented 4 years ago

Hi @KengoTODA, here are some more comments on the SpotBugs SARIF output.

NOTE: Many of the things that I mention here, we will enhance our validator to report automatically.

Do you plan to include codeFlows in your output? spotbugs/discuss#69 mentioned potential improvements to SpotBugs output related to code flows, and SARIF's code flow feature would make it easier to do.
In your previous sample output, you included tool notifications, including exception objects. That was very helpful. Do you plan to put them back?
Your logicalLocations.fullyQualifiedName (for example, edu/umd/cs/findbugs/workflow/RejarClassesForAnalysis.java:[line 220]) is actually a physical location: a file name plus a line number. You already have that information in physicalLocation. The spec defines "logical location" like this:

logical location

location specified by reference to a programmatic construct, without specifying the artifact within which that construct occurs

Example: A class name, a method name, a namespace.

The idea of logicalLocation.fullyQualifiedName is to have (for example) a namespace + a class + a method name, like System.Collections.Generic.Dictionary.Keys. In that case, you might use Keys as the name property, and the whole string as the fullyQualifiedName property.
The spec describes a rule id as a "stable", "opaque" identifier (see §3.49.3):

In the case of a rule, id SHALL contain a stable identifier for the rule and SHOULD be opaque.

By "opaque" the spec means a purely symbolic name, like SPOT1001. We suggest you assign ids such as these to your rules, and place your existing human-readable rule names in the name property:
```
"tool": {
    "driver": {
        ...
        "rules": [
            {
                "id": "SPOT1001",
                "name": "EI_EXPOSE_REP",
                ...
            }
        ]
    }
}
```
The property names in originalUriBaseIds are numbers. The spec (§3.3.4) does say that this is legal:

The uriBaseId property can be any string; it does not need to have any particular syntax or follow any particular naming convention.

However, it is helpful to use a symbolic name that is meaningful to the reader. In examples, we frequently use names like SOURCE_ROOT, BIN_ROOT, or REPO_ROOT.
Since tool.driver.notifications is an empty array, you can omit it.
In a region object, if endLine is the same as startLine, you don't have to mention endLine (it defaults to startLine).
We recommend enclosing "dynamic content" (the placeholders like {0} in your message strings) in single quotes. The quotes make the variable content easier to spot, and we've found single quotes give a nice clean appearance. To be clear, the spec does not require this, but we did add a rule to our SARIF validator tool to recommend it.

ghost commented 4 years ago

@KengoTODA, I want to say a little more about SARIF "opaque" rule ids. When the spec says that a rule id must be "opaque", it does not mean that it must be incomprehensible to human readers, like "SPOT1001". It just means that a generic SARIF consumer -- that is, a consumer with no special knowledge of the tool -- is not allowed to assume anything about the internal structure of the identifier. For example, a generic SARIF consumer is not allowed to assume that because the rule id starts with "EI_", it has to do with "exposing internal representation" (as I see from the documentation of EI_EXPOSE_REP, EI_EXPOSE_REP2, etc.)

SARIF opaque rule identifiers should be short (to take up less UI space) and search-engine-friendly (to make it easy for users to learn more about the rule). Your strings like EI_EXPOSE_REP are fine in those regards. I should not have encouraged you to invent additional "purely symbolic" identifiers like "SPOT1001".

If your rules have additional, human-readable names, you could put them in the rule's name property, for example:

"rules": [
  {
    "id": "EI_EXPOSE_REP",
    "name": "ExposesInternalRepresentation",

@michaelcfanning FYI

ghost commented 4 years ago

@KengoTODA, we have now published the latest SARIF validator, which you can use to assess your SARIF output. To use the validator:

Install the .NET Core SDK 3.1: https://dotnet.microsoft.com/download/dotnet-core/3.1

This adds the .NET Core command line tool dotnet to your path.
In a new terminal window, run the command:
```
dotnet tool install --global Sarif.Multitool
```
This adds the SARIF Multitool command line tool sarif to your path.
In a new terminal window, run the validate command on your SpotBugs SARIF output:
```
sarif validate SpotBugs.sarif
```
This produces a list of results on the console. You can save those results as a SARIF log file:
```
sarif validate SpotBugs.sarif --output SpotBugs-validation.sarif
```

The SARIF validator checks for:

JSON syntax errors.
Violations of the SARIF schema.
Violations of certain conditions that can't be expressed in the schema, including:
- Actual spec requirements.
- Recommended best practices.

We continue to add new rules to the validator and to refine existing rules. To get the latest version:

dotnet tool update --global Sarif.Multitool

Hope this helps!

KengoTODA commented 4 years ago

Thanks, I've tried it. It seems that it needs not SDK 3.1 but SDK 2.1, so I installed it from https://dotnet.microsoft.com/download/dotnet-core/2.1

Here is the log which was reported when I run sarif tool with SDK 3.1:

$ sarif validate Desktop/spotbugs.sarif                                                                                                    
It was not possible to find any compatible framework version
The framework 'Microsoft.NETCore.App', version '2.1.0' was not found.
  - The following frameworks were found:
      3.1.6 at [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

You can resolve the problem by installing the specified framework and/or SDK.

The specified framework can be found at:
  - https://aka.ms/dotnet-core-applaunch?framework=Microsoft.NETCore.App&framework_version=2.1.0&arch=x64&rid=osx.10.15-x64

KengoTODA commented 4 years ago

Sorry for my lazy reply, I'll check your comments one by one.

Do you plan to include codeFlows in your output? spotbugs/discuss#69 mentioned potential improvements to SpotBugs output related to code flows, and SARIF's code flow feature would make it easier to do.

Issued as https://github.com/spotbugs/spotbugs/issues/1241

In your previous sample output, you included tool notifications, including exception objects. That was very helpful. Do you plan to put them back?

It should exist even in the current version (SpotBugs 4.1.1). I need to improve the sample output to cover features in the format.

KengoTODA commented 4 years ago

Trying to generate POJO from JSON schema. It marks fields in MessageStrings object 'additional properties' so the generated JSON file keeps nothing. So we cannot keep the message template with placeholder explained as 3.49.11. Here is generated Java code: https://gist.github.com/KengoTODA/357e483984c8ed5442da7ed4b7331907

Still not sure where this problem comes from.

eddynaka commented 3 years ago

@michaelcfanning , i think we can close this, since we were able to solve the issues in the sarif version from spotbugs. What do you think?

KengoTODA commented 3 years ago

@eddynaka in my case, in which project I should make an issue? is it https://github.com/oasis-tcs/sarif-spec/ ?

eddynaka commented 3 years ago

@KengoTODA I thought this was the issue that we closed in spotbugs

microsoft / sarif-sdk

Provide necessary support for SpotBugs SARIF #1952