About this issue

This is a concept issue, where we gather thoughts before the concept is finally documented and implemented.

If suitable, the markdown can be converted to asciidoc and used directly (or as a starting point) for the final documentation.
(This can be automated, e.g. the eclipse asciidoc editor supports a converter action for markdown->asciidoc.)

It shall also serve as a kind of epic, so we reference it in related issues.

Analytics

Analytics runs before the "real" scan and collects meta information which will be available to the "normal" scan phases later and also to post processing.

Features/Capabilities

Every analyzer will provide some kind of "features" or "capabilities". For example:

     ... able to count lines of code (LOC) ...
     ... able to detect which languages are used inside the given sources...
     ... able to detect which languages are used inside binaries...
     ... able to inspect comments and look for false-positive markers...
     ... more...

It shall be possible to call an analyzer by PDS. Which features an analyzer supports is defined in its output result, which is JSON.

JSON format for feature based meta data

Every analytics scan will return JSON. The exact format still needs to be defined, this is just a draft/example/idea:

{
    "analyze.feature.loc" : {
        "status" : {
            "exitCode" : 0
        },
        "all" : {
            "loc" : 25233
        },
        "language-specific" : [
            { "language" : "java", "loc" : 24994 },
            { "language" : "bash", "loc" : 241 },
            { "language" : "groovy", "loc" : 1341 }
        ]
    },
    "analyze.feature.languagedetection" : {
        "status" : {
            "exitCode" : 0
        },
        "languages" : [
            {
                "language" : "java",
                "description" : "Java source files",
                "fileEndings" : [".java"]
            },
            {
                "language" : "bash",
                "description" : "Bash scripts",
                "fileEndings" : [".sh", ""]
            }
        ]
    },
    "analyzer.feature.falsepositive.markers" : {
        "status" : {
            "exitCode" : 0,
            "message" : "Did not find any markers"
        }
    }
}

When nothing is found but the feature is supported, at least the status part will be returned with exitCode = 0. A message is optional.
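To illustrate how a consumer could read such a result, here is a minimal Python sketch. It assumes the draft format above stays as shown; the function names `feature_exit_codes` and `has_data` are hypothetical and only serve the example.

```python
import json

# Shortened copy of the draft result from this issue.
RESULT = """
{
    "analyze.feature.loc": {
        "status": {"exitCode": 0},
        "all": {"loc": 25233}
    },
    "analyzer.feature.falsepositive.markers": {
        "status": {"exitCode": 0, "message": "Did not find any markers"}
    }
}
"""


def feature_exit_codes(result_json):
    """Map every feature key contained in the result to its exit code."""
    result = json.loads(result_json)
    return {feature: data["status"]["exitCode"] for feature, data in result.items()}


def has_data(result_json, feature):
    """True when the feature is present and carries more than its status part."""
    data = json.loads(result_json).get(feature)
    if data is None:
        return False  # feature is not supported by this analyzer
    return any(key != "status" for key in data)
```

A feature that is supported but found nothing shows up in `feature_exit_codes` with exit code 0, while `has_data` returns `False` for it.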

Metadata

The metadata is the aforementioned JSON. When there are multiple analyzers, their results must be merged (duplicates shall be removed where possible). The merged metadata will be passed on to the product executors in the further process.
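One possible merge strategy, sketched in Python. The merge rules are not defined in this issue, so the following is an assumption: list entries of the same feature are united without duplicates, and for all other values the first analyzer wins. The name `merge_results` is hypothetical.

```python
def merge_results(results):
    """Merge several analyzer results (already parsed JSON dicts) into one.

    Assumption: lists are united without duplicates; for any other value
    the first analyzer that reported it wins.
    """
    merged = {}
    for result in results:
        for feature, data in result.items():
            target = merged.setdefault(feature, {})
            for key, value in data.items():
                if isinstance(value, list):
                    # A fresh list is created here, so inputs are never mutated.
                    existing = target.setdefault(key, [])
                    if not isinstance(existing, list):
                        continue  # conflicting types: keep the first value
                    for entry in value:
                        if entry not in existing:
                            existing.append(entry)
                else:
                    target.setdefault(key, value)
    return merged
```

How conflicting scalar values (e.g. two different LOC counts) should really be resolved is still open and would need a decision in the final concept.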

Storage of collected meta data + post processing

We have different stages for meta data:

on scan
on project
global

It would be nice to collect the information inside the application in extra database tables.

on scan:
The metadata JSON is already stored as a product result (as usual for any executed product). The merged result is normally only of interest at runtime, so no additional storage should be necessary.
on project:
A preprocessor step should handle the project specific cumulation.
on global:
A preprocessor step should handle the global cumulation.

post/pre-processing

We must decide if we do processing directly after the metadata gathering or after doing the scans.

Processing after the scans would also make it possible to count failing tests etc. So maybe we should do it after the scans are done and the report was written.

postprocessing project cumulation

number of scans per project
languages (set)
number of failed scans, done scans etc.

postprocessing global cumulation

total number of scans
languages (set)
number of failed scans, done scans etc.
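The project and the global cumulation collect the same kind of values, so both could share one structure. A minimal Python sketch follows; the names `Cumulation` and `cumulate` are hypothetical, and it assumes that "done" means a scan which finished without failure.

```python
from dataclasses import dataclass, field


@dataclass
class Cumulation:
    """Counters shared by the project and the global cumulation."""
    scans: int = 0
    failed_scans: int = 0
    done_scans: int = 0  # assumption: finished without failure
    languages: set = field(default_factory=set)

    def add_scan(self, failed, languages):
        self.scans += 1
        if failed:
            self.failed_scans += 1
        else:
            self.done_scans += 1
        self.languages.update(languages)


def cumulate(scans):
    """scans: iterable of (project_id, failed, languages) tuples."""
    global_stats = Cumulation()
    per_project = {}
    for project_id, failed, languages in scans:
        per_project.setdefault(project_id, Cumulation()).add_scan(failed, languages)
        global_stats.add_scan(failed, languages)
    return per_project, global_stats
```

In the application these counters would live in the extra database tables mentioned above rather than in memory.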

Validation

After the analyzing phase we should validate/assert that the mandatory meta data (e.g. LOC, languages? TBD) was fetched.
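Such a validation could look like the following Python sketch. Which features are mandatory is still TBD, so the content of `MANDATORY_FEATURES` is an assumption, as is the rule that a feature only counts as fetched when its status has exitCode 0.

```python
# Assumption: the mandatory set is still TBD in this concept.
MANDATORY_FEATURES = ("analyze.feature.loc", "analyze.feature.languagedetection")


def missing_mandatory_features(metadata, mandatory=MANDATORY_FEATURES):
    """Return the mandatory features which are absent or did not exit with 0."""
    missing = []
    for feature in mandatory:
        status = metadata.get(feature, {}).get("status", {})
        if status.get("exitCode") != 0:
            missing.append(feature)
    return missing
```

An empty list means the merged metadata is complete enough to continue with the normal scan phases.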

Integration inside execution profiles

As for any other existing scan type (CodeScan, WebScan, InfraScan and also Reporting), it will be possible to configure the corresponding Analytics product executors inside an execution profile.

For the Report scan type (which gathers/collects reports into the Sereco format) we have a fallback/default implementation when nothing is configured (Sereco Reporting). For this scan type, however, only ONE implementation is configurable at all (or at least only the first one will be used). The fallback was only possible because Sereco Reporting is always available, as it is embedded inside sechub-server.

For the new Analytics scan type we will have no (real) embedded implementation inside sechub-server itself, so a fallback will either not be supported out of the box or will always return fallback values only (e.g. LOC is always 0). But maybe we could at least provide some suggestions or default executor configurations?

The aim should be:

make analytics product execution configurable and therefore modular
validate that a minimum set of features is supported
otherwise the product execution will fail because there is not enough information for execution
keep it easy to configure

Reuse analyzer execution profile

Admins should NOT end up in configuration hell when using different execution profiles. It could become cumbersome to always add e.g. CLOC analytics and more to every execution profile. On the other hand, a special profile could have a different configuration using another analyzer product (e.g. for testing, better performance handling for specific languages etc.).

So one option could be to provide default-analyzer-profile, a global product execution profile which is always available and cannot be deleted.
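The idea can be sketched in Python as follows. This is only an illustration of the option described above: the class `ProfileRegistry`, its methods and the executor names are hypothetical, and it assumes that a profile's own analytics executors take precedence over the default profile.

```python
DEFAULT_ANALYZER_PROFILE = "default-analyzer-profile"


class ProfileRegistry:
    """Keeps execution profiles and resolves their analytics executors."""

    def __init__(self, default_analytics_executors):
        # The default profile exists from the start.
        self._profiles = {
            DEFAULT_ANALYZER_PROFILE: {"analytics": list(default_analytics_executors)}
        }

    def put(self, profile_id, executors_by_scan_type):
        self._profiles[profile_id] = executors_by_scan_type

    def delete(self, profile_id):
        if profile_id == DEFAULT_ANALYZER_PROFILE:
            raise ValueError("the default analyzer profile cannot be deleted")
        self._profiles.pop(profile_id, None)

    def analytics_executors(self, profile_id):
        """Use the profile's own analytics executors, else the default ones."""
        own = self._profiles.get(profile_id, {}).get("analytics")
        if own:
            return own
        return self._profiles[DEFAULT_ANALYZER_PROFILE]["analytics"]
```

With this approach admins only configure analytics in a profile when it should differ from the default.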

Further thoughts about the concept

Execution phase

The Analytics scan type will provide an interface of features which will be provided, as usual, by adapters. So we can use a bunch of those adapters.

Future possibilities

This type of adapter will always be executed before the other types, so in future it will be possible to make decisions about further execution handling.

Situation specific usage of configured product executors

When a product executor introspects the given analytics meta data, it could decide on further steps/treatment.

For example: when a CodeScan product executor is configured inside the executed execution profile, but its adapter can only scan the Go language and there are no Go files inside the scanned sources, the product executor/adapter should not be started at all.
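The Go example could be decided like this. The Python sketch relies on the draft `analyze.feature.languagedetection` format from this issue; the function names are hypothetical. It assumes that an executor without a language restriction is always started, and that missing language information never causes a skip.

```python
LANGUAGE_FEATURE = "analyze.feature.languagedetection"


def detected_languages(metadata):
    """Collect the language names reported by the language detection feature."""
    feature = metadata.get(LANGUAGE_FEATURE, {})
    return {entry["language"] for entry in feature.get("languages", [])}


def should_start(supported_languages, metadata):
    """Decide whether a product executor/adapter should be started at all."""
    if not supported_languages:
        return True  # assumption: no restriction means "always start"
    if LANGUAGE_FEATURE not in metadata:
        return True  # assumption: without information we do not skip
    return bool(set(supported_languages) & detected_languages(metadata))
```

For sources containing only Java and Bash files, `should_start({"go"}, metadata)` returns `False`, so the Go-only executor would be skipped.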

Integrate `sechub-analyzer-cli`

We already have a gradle sub module sechub-analyzer-cli. By integrating it into the analyzer phase, we are able to support false positive handling by comments.

mercedes-benz / sechub

Concept `Analytics` #684
