mercedes-benz / sechub

SecHub provides a central API to test software with different security tools.
https://mercedes-benz.github.io/sechub/
MIT License

[Sechub Client] Hints on misspelled words in the configuration #1310

Open Jeeppler opened 2 years ago

Jeeppler commented 2 years ago

Problem

With more and more scan types and tools added, the configuration becomes slightly more complex. A configuration like this:

{
  "apiVersion" : "1.0",
  "data" : {
    "sources" : [ {
      "name" : "code",
      "fileSystem" : {
        "folders" : [ "myProject/source" ]
      }
    } ]
  },
  "licenseScan" : {
    "use" : [ "code" ]
  }
}

contains several words that can be misspelled by accident: "sources", "apiVersion", "fileSystem", "folders", "licenseScan" and "use". For example, a user wrote "source" without the trailing s, or "flesystem" without the i and with a lowercase s. A single misspelled word leads to an error from the server, which parses the configuration strictly, as it should.

Solution

Instead of making the server more forgiving or less strict, the client should help the user to create a well-formed configuration file. The client should point out mistakes and provide hints to the user. For example, let's assume the user forgets the trailing s and writes "source" instead. The client should tell the user that something is wrong and what the user could have meant, for example: "Unknown element source. Did you mean sources?"

This could be achieved by having a list of all allowed keywords (or a JSON schema) and calculating which keyword is closest to the user's misspelled word. One way to do this is to compute the edit distance between each keyword and the misspelled word. The best-known edit distance is probably the Levenshtein distance.
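To make the idea concrete, here is a minimal sketch in Go of a Levenshtein-based hint. The keyword list is an assumption, hand-copied from the example configuration above; `levenshtein`, `suggest` and the maximum distance of 2 are hypothetical names and values for illustration, not part of the existing client. The comparison ignores case so that a wrong capital letter still counts as a near miss.

```go
package main

import (
	"fmt"
	"strings"
)

// levenshtein returns the edit distance between a and b, i.e. the number of
// single-character insertions, deletions and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr := make([]int, len(rb)+1)
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev = curr
	}
	return prev[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// suggest returns the allowed keyword closest to word (ignoring case).
// The boolean is false if no keyword is within maxDistance edits.
func suggest(word string, allowed []string, maxDistance int) (string, bool) {
	best, bestDist := "", maxDistance+1
	for _, k := range allowed {
		d := levenshtein(strings.ToLower(word), strings.ToLower(k))
		if d < bestDist {
			best, bestDist = k, d
		}
	}
	return best, bestDist <= maxDistance
}

func main() {
	// Assumption: keywords taken from the example configuration, not a complete list.
	allowed := []string{"apiVersion", "data", "sources", "binaries",
		"fileSystem", "folders", "licenseScan", "use"}
	for _, w := range []string{"source", "flesystem"} {
		if s, ok := suggest(w, allowed, 2); ok {
			fmt.Printf("Unknown element %q. Did you mean %q?\n", w, s)
		}
	}
}
```

The distance threshold keeps the client from suggesting a keyword for words that are not close to anything in the list.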

sven-dmlr commented 1 year ago

I won't step into the complex area of spell correction.

But we could add functionality to the client like generating an initial sechub.json. I can imagine that this could be helpful. You then only have to edit the folder and then you are done.

Jeeppler commented 1 year ago

Helping with the generation of an initial sechub.json is a good idea. However, one should create a separate issue for that.

Regarding the issue, I did not mean to really do spell checking like a word processor does.

My idea is to create a JSON Schema of our config file. Once we have a JSON Schema, we can validate the user's config file against the schema before the user sends it to the server.

There are several tools written in Go which can do that: https://json-schema.org/implementations.html#validator-go.

For many keys in the config file there is only one way to write them, or, in the case of enums, only a very limited number of possibilities. Let's assume somebody writes source instead of sources. A custom validator could check whether source is closer to sources or binaries (e.g. using edit distance), the only two possible options, and tell the user that source should probably be sources.

Something like:

L34: You wrote `source` instead of `sources`. Did you mean `sources`?

In summary, the main value would come from having a JSON Schema of the SecHub config that the SecHub Client can validate against, which would prevent most errors before the config is sent to the server. In addition, it would be nice to give the user more helpful error messages, so that issues in the config can be found even faster.