scanoss / engine

SCANOSS Open Source Inventory Engine
GNU General Public License v2.0
file-based matches appear to prefer 3rd-party github origin over authoritative nuget repo #70

Open wildejoe opened 1 week ago

wildejoe commented 1 week ago

This may be operator error, or may be an opportunity for matching algorithm improvement? In testing SCANOSS Workbench (1.12.3) as well as scanoss.js & scanoss.py (multiple recent versions, e.g. via docker ghcr.io/scanoss/scanoss-py:latest), primarily with projects using .net core & framework libs, I found scanoss 'found' 100% match on common .net lib files which various 3rd-parties (possibly inadvertently) included in their github project releases rather than matching on the authoritative .net package - often with src on github, but binary first published on nuget.

Examples:

Azure.Security.KeyVault.Secrets.dll - from Azure .NET SDK v4.1.0 / file v4.100.20.41103
  md5:5a048b36e0402c521fe93272a2f9aab2
  sha256:0fcdcd4b78955ffcbfa34ef181322c1fcfd983ce8612c335f10dced61452213e
  Src published 8/11/202: https://github.com/Azure/azure-sdk-for-net/releases/tag/Azure.Security.KeyVault.Secrets_4.1.0
  Bin published 8/12/202: https://www.nuget.org/packages/Azure.Security.KeyVault.Secrets/4.1.0
  3rd-party "matched" pkg published 11/23/2020: https://github.com/folkehelseinstituttet/Fhi.Smittestopp.Verification/releases/tag/v0.1.0-alpha

Microsoft.Extensions.Primitives.dll - from MS .NET v8.0.0 / file v8.0.23.53103
  md5:bb3af05bc071ceba027d1dc5fa255ec4
  sha256:f9f47907a041067d9bc1b9a9f516009b6ca0a9c6325ce83527883cc0d053dfa5
  Src published 11/14/2023: https://github.com/dotnet/core/releases/tag/v8.0.0
  Bin published 11/14/2023@13:23z: https://www.nuget.org/packages/Microsoft.Extensions.Primitives/8.0.0
  3rd-party "matched" pkg published 11/14/2023@19:00z: https://github.com/danielpalme/ReportGenerator/releases/tag/v5.2.0

This may be a byproduct of how MS proj maintainers have handled src releases on github and bin releases via nuget, but when this occurs for a dozen or more files in a project, it takes notable extra time to straighten out the sboms and ensure lic/cve issues are recognized and addressed. If this isn't something scanoss engine changes can address, any suggestion on how to avoid redoing the straightening on scans for subsequent project builds and/or on other projects using the same lib would be helpful, as would suggestions on where to ask upstream or downstream of the scanoss engine project. I'm sure others will encounter this as the scanoss user base grows. Thanks.
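
For anyone who wants to check whether a local copy of one of these DLLs is byte-identical to the files described above, a minimal Python sketch follows. It uses only the standard library's hashlib. The names file_digests and matches_reported are illustrative helpers, not part of any SCANOSS client, and the file path in the closing comment is an assumed local path.

```python
import hashlib
from pathlib import Path


def file_digests(path, chunk_size=1 << 20):
    """Return the md5 and sha256 hex digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large binaries are not loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


def matches_reported(path, reported_md5=None, reported_sha256=None):
    """True only if every digest supplied matches the file (case-insensitive)."""
    digests = file_digests(path)
    checks = []
    if reported_md5 is not None:
        checks.append(digests["md5"] == reported_md5.lower())
    if reported_sha256 is not None:
        checks.append(digests["sha256"] == reported_sha256.lower())
    # With no reported digest there is nothing to confirm, so return False.
    return bool(checks) and all(checks)


# Example call (assumed local path; md5 value quoted from the report above):
# matches_reported("Azure.Security.KeyVault.Secrets.dll",
#                  reported_md5="5a048b36e0402c521fe93272a2f9aab2")
```

A True result would only confirm that the local file has the same digests as those quoted in the report; it says nothing about which origin the engine will choose.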

mscasso-scanoss commented 1 week ago

Hi @wildejoe, thank you very much for your message. We are aware of these issues. The most effective solution to resolve them is by utilizing the "identify" option in our clients. This option allows users to send package PURLs as hints to the engine, enabling specific components to be prioritized.

For the example you provided, here is a potential solution:

scanoss-py scan -i az_sbom.json project_path

Where az_sbom.json contains the following content:

{
  "components": [
    {
      "purl": "pkg:nuget/Microsoft.ServiceFabric.CollectSFData"
    }
  ]
}

During the scanning process, the engine will prioritize this purl (or purls) over other potential matches, if more than one option is available. Please let me know if this helps, and feel free to close the issue if you believe this is the solution you were looking for.
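
When a dozen or more libraries need hints, the SBOM file can be generated rather than written by hand. The following is a rough Python sketch under stated assumptions: it emits the same {"components": [{"purl": ...}]} shape shown above, the function names build_identify_sbom and write_identify_sbom are hypothetical, and the package names are the two from the original report. Whether the engine takes a version suffix in the purl into account is not confirmed in this thread, so the version is optional and left off by default.

```python
import json


def build_identify_sbom(packages, purl_type="nuget"):
    """Build an SBOM dict in the shape shown above.

    Each entry in `packages` is "Name" or "Name@version".
    The "@version" suffix is optional; its effect on matching is an assumption.
    """
    components = []
    for pkg in packages:
        name, _, version = pkg.partition("@")
        purl = f"pkg:{purl_type}/{name}"
        if version:
            purl += f"@{version}"
        components.append({"purl": purl})
    return {"components": components}


def write_identify_sbom(packages, out_path, purl_type="nuget"):
    """Write the SBOM as JSON so it can be passed to `scanoss-py scan -i`."""
    sbom = build_identify_sbom(packages, purl_type=purl_type)
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(sbom, fh, indent=2)
    return sbom


# Example call (package names taken from the report, output file name assumed):
# write_identify_sbom(
#     ["Azure.Security.KeyVault.Secrets", "Microsoft.Extensions.Primitives"],
#     "az_sbom.json",
# )
```

The resulting file can then be reused across builds and across projects that share the same libraries, with the same scanoss-py scan -i az_sbom.json project_path invocation.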