oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

FossID API for matched line needs to be improved. #7028

Open nnobelis opened 1 year ago

nnobelis commented 1 year ago

This ticket follows up on a discussion at https://github.com/oss-review-toolkit/ort/pull/7022#discussion_r1200165856.

When listing the snippets through the FossID API, FossID returns each snippet with a payload like this (in the highlighting property):

{
  "id": "410668b1f35f8b27ff9ce345998448b6",
  "local_coverage": 0.9754,
  "local_highlight": {
    "blocks": [
      {
        "byte_range": {
          "begin": 0,
          "end": 712
        },
        "char_range": {
          "begin": 0,
          "end": 712
        },
        "id": "abdc2b929a1b84f24155c27b752944ab"
      },
      {
        "byte_range": {
          "begin": 1395,
          "end": 28013
        },
        "char_range": {
          "begin": 1395,
          "end": 28013
        },
        "id": "a11cb7ad7af8d915193131a92d514ed7"
      }
    ],
    "encoding": "UTF-8",
    "id": "3673e848c2d349e2f054691c952b3f2f",
    "pfm_format": 2
  },
  "local_size": 475,
  "remote_coverage": 1,
  "remote_highlight": {
    "blocks": [
      {
        "byte_range": {
          "begin": 0,
          "end": 27333
        },
        "char_range": {
          "begin": 0,
          "end": 27333
        },
        "id": "871d76314c0e746c1b33d63e6c05a909"
      }
    ],
    "encoding": "UTF-8",
    "id": "410668b1f35f8b27ff9ce345998448b6",
    "pfm_format": 2
  },
  "remote_size": 475
}

This should make it possible to determine the matched lines between the source file and the snippet. Unfortunately, the payload only contains character range information, not line ranges. To get the matched lines, one has to call files_and_folders/get_matched_lines with the source file name and the snippet id. FossID then returns the corresponding matched lines.
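To illustrate the gap between a character range and a line range, here is a minimal Python sketch that maps one onto the other when the file content is available. It is not part of ORT or the FossID API: the function name is hypothetical, and it assumes `begin` is inclusive and `end` is exclusive, which the payload does not state.

```python
from bisect import bisect_right
from typing import Tuple


def char_range_to_line_range(text: str, begin: int, end: int) -> Tuple[int, int]:
    """Map a character range onto 1-based, inclusive line numbers.

    Assumption: `begin` is inclusive and `end` is exclusive. The FossID
    payload does not document which convention it uses.
    """
    if not 0 <= begin < end <= len(text):
        raise ValueError("character range lies outside of the text")

    # Offsets at which a line starts: 0, then the character after each newline.
    line_starts = [0] + [i + 1 for i, ch in enumerate(text) if ch == "\n"]

    # The number of line starts at or before an offset is its 1-based line number.
    first_line = bisect_right(line_starts, begin)
    last_line = bisect_right(line_starts, end - 1)
    return first_line, last_line


sample = "line one\nline two\nline three\n"
print(char_range_to_line_range(sample, 0, 8))   # first line only
print(char_range_to_line_range(sample, 9, 22))  # second and third line
```

Such a conversion needs the decoded text of the file, so it could at best cover the local side (`local_highlight`). For the remote side, the snippet content would have to be fetched as well.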

Indeed, the FossID API is designed in such a way that getting the matched lines of a snippet requires a separate query to the API server.

Therefore the workflow is:

This is far too many requests, as we have scans with 2000 pending files! For such scans, we need more than 10000 requests to fetch all snippet data (snippets plus matched lines).
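As a rough back-of-the-envelope check of that figure, the following Python sketch counts requests under the assumption that each pending file needs one request to list its snippets plus one files_and_folders/get_matched_lines request per snippet. The function name and the average of four snippets per file are hypothetical, chosen only to show how quickly the total reaches the order of magnitude mentioned above.

```python
def total_requests(pending_files: int, avg_snippets_per_file: float) -> int:
    """Estimate the number of API requests for one scan.

    Assumption: one snippet-listing request per pending file, plus one
    get_matched_lines request per snippet returned for that file.
    """
    listing_requests = pending_files
    matched_lines_requests = round(pending_files * avg_snippets_per_file)
    return listing_requests + matched_lines_requests


# 2000 pending files with a hypothetical average of 4 snippets each.
print(total_requests(2000, 4))
```

The per-snippet term dominates, which is why batching the matched-lines lookup would reduce the request count the most.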

FossID should provide an API to batch these operations. For instance:

Note: the proprietary fossid-cli tool seems to perform better at this, with the --sensitivity option. Is there an unofficial API to do what we want?

For what it's worth, the ticket FOSSIDSC-3099 has been opened at FossID support (access requires account).

nnobelis commented 1 year ago

We received an answer from FossID:

We are grateful for this input, it has become a roadmap candidate for our roadmap planning.

sschuberth commented 2 weeks ago

@nnobelis are you aware of any updates on the FOSSID side?

nnobelis commented 2 weeks ago

No, none at all :(