ossf / security-insights-spec

OPENSSF SECURITY INSIGHTS: Repository for development of the draft standard, where requests for modification should be made via GitHub Issues.

`third-party-packages` will never be false #18

Closed JLLeitschuh closed 1 year ago

JLLeitschuh commented 1 year ago

Depending upon your definition of "dependency", you'll always depend upon third-party-packages, thus dependencies-lists will always be required.

Writing a Python package, you depend upon the Python stdlib. Writing a C library, you depend upon the C stdlib.

https://github.com/ossf/security-insights-spec/blob/4e783e0397b2c4bbf3a35144e985dde8f8b9f93e/security-insights-schema-1.0.0.yaml#L474-L528

The problem with requiring the list of dependencies is that many tools don't list the full set of dependencies, just the explicit ones. Let's take Gradle as an example. You can explicitly declare dependencies, but dependencies can also come from plugins that bring in external dependencies. You can also have transitive dependencies. Usually repositories don't have a central location listing all of these. It's impractical to require that such a list be kept up-to-date per commit.

I think that, to be adopted, the constraints/expectations on dependencies need to be reduced.

luigigubello commented 1 year ago

Thanks for this feedback @JLLeitschuh :) The idea would be to make it easier for developers, contributors, or other people to find the dependency files - that have different names according to the ecosystem (and the same programming language can have multiple dependency file formats). In projects "big enough", where there are multiple languages and folders, it may be good to have a sort of index that points to the dependency files. Do you think this section should be optional?

JLLeitschuh commented 1 year ago

> The idea would be to make it easier for developers, contributors, or other people to find the dependency files - that have different names according to the ecosystem (and the same programming language can have multiple dependency file formats).

Yes, but each ecosystem has a "general" norm, right? Maven = pom.xml, Gradle = build.gradle[.kts].

Are you wanting a file with all dependencies, including transitives, or just the direct dependencies? If you only want direct dependencies, then that should be explicitly stated IMHO.

JLLeitschuh commented 1 year ago

> Do you think this section should be optional?

Probably? I also don't think that a field should be present, and certainly not required, without a clearly defined problem that it solves. That problem should be included in the documentation for the field.

As someone creating an instance of this document, it should be clear to me both how to construct each field and WHY I'm being asked to provide each bit of data, and what problem I'm solving by doing so.

> In projects "big enough", where there are multiple languages and folders, it may be good to have a sort of index that points to the dependency files.

Can you elaborate on why this could be useful?

JLLeitschuh commented 1 year ago

I also want to keep in mind that this security-insights-spec document will be much like documentation. And documentation is often out-of-date from the source it represents. Unless you also plan to add validation to ensure that a repository's structure is in sync with its declared security insights, the file is going to drift from the moment it is introduced.

If the file can be automatically generated from looking at the source tree, then why hard-code it into this file? Just write a tree walker to generate this data, and don't require a maintainer to keep the data in-sync within their repository.
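
A minimal sketch of such a tree walker, assuming dependency files can be recognized by their conventional names. The name table and the skipped directories are illustrative assumptions and far from exhaustive; nothing here is defined by the spec.

```python
import os
from pathlib import Path

# Conventional manifest file names for a few ecosystems (illustrative, not exhaustive).
MANIFEST_NAMES = {
    "pom.xml": "maven",
    "build.gradle": "gradle",
    "build.gradle.kts": "gradle",
    "package.json": "npm",
    "requirements.txt": "pip",
    "go.mod": "go",
    "Cargo.toml": "cargo",
}

# Directories that usually hold vendored or generated content (assumption).
SKIP_DIRS = {".git", "node_modules", "build", "target"}


def find_manifests(root):
    """Return sorted (relative_path, ecosystem) pairs for manifests under root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            ecosystem = MANIFEST_NAMES.get(name)
            if ecosystem is not None:
                rel = Path(dirpath, name).relative_to(root).as_posix()
                found.append((rel, ecosystem))
    return sorted(found)


if __name__ == "__main__":
    for path, ecosystem in find_manifests("."):
        print(f"{ecosystem}: {path}")
```

Note that this only locates the declaration files; it says nothing about which dependencies those files resolve to.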

luigigubello commented 1 year ago

> Yes, but each ecosystem has a "general" norm, right? Maven = pom.xml, Gradle = build.gradle[.kts].

Yep. It is easy to identify the programming languages of a project, but it might be less easy to recognize the ecosystem, since one programming language can have more than one ecosystem. The matrix can become complex quickly (e.g. GitHub's supported package ecosystems). My 2c, but I don't have a strong opinion about this.

> Are you wanting a file with all dependencies, including transitives, or just the direct dependencies? If you only want direct dependencies, then that should be explicitly stated IMHO.

Good question, and a hard problem to solve. Should we add a key-value pair? Something like `dependency-level: [direct, transitive]`.

> Can you elaborate on why this could be useful?

To improve automation and reduce false negatives or errors. Scanners try to auto-discover dependency files, but sometimes they fail (e.g. Snyk has a Q&A for this). Having a list of dependency files can help improve automation, especially for big repos or monorepos. If I already have the list of dependency files, I can just parse this list and scan these files.
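
A minimal sketch of that scanner-side step, assuming SECURITY-INSIGHTS.yml has already been parsed into a dictionary and that `dependencies-lists` holds plain strings (paths or URLs). The example content and the helper name `dependency_files` are hypothetical.

```python
def dependency_files(insights):
    """Return the declared dependency-file locations, or [] if none are declared."""
    deps = insights.get("dependencies") or {}
    return list(deps.get("dependencies-lists") or [])


# Hypothetical content of a parsed SECURITY-INSIGHTS.yml.
example = {
    "dependencies": {
        "third-party-packages": True,
        "dependencies-lists": ["pom.xml", "frontend/package.json"],
    }
}

for location in dependency_files(example):
    # A real scanner would fetch and analyze the file here.
    print("scan:", location)
```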

> And documentation is often out-of-date from the source it represents.

100%. Do you suggest making it optional?

JLLeitschuh commented 1 year ago

> I can just parse this list and scan these files.

You can't. Most build tools have to be executed to determine the resolved set of dependencies. Declared dependencies are most often not the same as the resolved dependencies.

Tools like Gradle operate as constraint solvers. Gradle takes a bunch of dependencies, and transitive dependencies, and all declared versions, and then figures out what version to actually resolve.
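
A toy illustration of why declared and resolved versions differ. It uses a simple "highest requested version wins" rule purely for illustration; real build tools apply far richer rules (constraints, exclusions, plugin-contributed dependencies), and the module `com.example:helper` is hypothetical.

```python
def parse_version(text):
    """Turn '3.12.0' into (3, 12, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(requests):
    """Pick one version per module from (module, version) requests.

    Toy rule: the highest requested version wins.
    """
    resolved = {}
    for module, version in requests:
        current = resolved.get(module)
        if current is None or parse_version(version) > parse_version(current):
            resolved[module] = version
    return resolved


# Direct declaration in the build file (hypothetical project).
declared = [("org.apache.commons:commons-lang3", "3.9")]
# Requests contributed by a plugin or a transitive dependency (hypothetical).
transitive = [
    ("org.apache.commons:commons-lang3", "3.12.0"),
    ("com.example:helper", "1.0.0"),
]

resolved = resolve(declared + transitive)
print(resolved["org.apache.commons:commons-lang3"])  # 3.12.0
```

The build file says 3.9, yet 3.12.0 is what ends up on the classpath, and `com.example:helper` never appears in the build file at all.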

> 100%. Do you suggest to make it optional?

I propose that this spec be defined to do a few things, and do those few things very well. If there is a desire and a use case to add this functionality, then we can discuss it and add it later. For now, I would suggest removing this field completely.

JLLeitschuh commented 1 year ago

I'd propose that the OSSF encourage more repository hosting platforms (like Bitbucket and GitLab) to offer support for a dependency-graph API, similar to GitHub's.

We don't need to support it in a file if it's already being captured and hosted as metadata on the repository.

https://docs.github.com/en/rest/dependency-graph/dependency-review?apiVersion=2022-11-28

luigigubello commented 1 year ago

> I propose that this spec be defined to do a few things, and do those few things very well. If there is a desire and a use case to add this functionality, then we can discuss it and add it later. For now, I would suggest removing this field completely.

You convinced me, but - just to be sure we are aligned - you are proposing to remove third-party-packages (boolean) and dependencies-lists (array) from dependencies in SECURITY-INSIGHTS.yml, right?

I would keep sbom (array) for a good reason: SBOM is still an evolving standard, important but not yet widely adopted, and apparently there is still no standard folder, URL, or other place where organizations store it. It can definitely help scanners and humans to find it.

JLLeitschuh commented 1 year ago

> You convinced me, but - just to be sure we are aligned - you are proposing to remove third-party-packages (boolean) and dependencies-lists (array) from dependencies in SECURITY-INSIGHTS.yml, right?

Yes. That is what I'm proposing.

> I would keep sbom (array) for a good reason: SBOM is still an evolving standard, important but not yet widely adopted, and apparently there is still no standard folder, URL, or other place where organizations store it. It can definitely help scanners and humans to find it.

What value would you expect for the sbom parameter? As a challenge, I'd suggest you find a real-world use case of what you would set that value to for a real repository. That way you can determine whether or not it's actually possible for others to do the same.

luigigubello commented 1 year ago

> What value would you expect for the sbom parameter?

Its purpose is to help find the SBOM. As far as I know, there is still no standard for serving SBOM files. Searching on Google, I found these results:

"Finally, you can use the SBOM for your needs - sign it, serve your customers, check for vulnerabilities and licenses, etc." - legitsecurity.com

"An SBOM document should be generated in a timely manner after changes to software, and it should be stored in a place where anyone who needs to read it has permission to do so." - apiiro.com

"The GitLab DevSecOps platform is comprehensive as it provides Dependency SBOM and Container SBOM insights." - gitlab.com

"A BOM repository server for distributing CycloneDX BOMs." - CycloneDX/cyclonedx-bom-repo-server

"Store the SBOMs in your Nexus Repository (or other package managers)" - sonatype.com

If you store the link, URL, or source of the SBOM file in SECURITY-INSIGHTS.yml, every scanner or person can easily find it. Can this be a real-world use case?

JLLeitschuh commented 1 year ago

Are there any projects publishing SBOMs in a consistent location currently?

If they are build artifacts, then the URL will be versioned, meaning that direct links to the SBOM will be impossible, unless you expect the maintainer to update this document every time they publish a new version of their library.

Do you have any real-world examples of actual URLs that could be used to fill in that value today? Not just documentation about SBOMs, but actual projects that are publishing them?

JLLeitschuh commented 1 year ago

Let's take as an example the following PURL: pkg:maven/org.apache.commons/commons-lang3@3.12.0

They publish their artifacts at the following url:

What value would you expect an end-user to specify for where they host their SBOMs?

  1. https://repo.maven.apache.org/maven2/org/apache/commons/commons-lang3/3.9/commons-lang3-3.9-SBOM.spdx.yaml
  2. https://repo.maven.apache.org/maven2/org/apache/commons/commons-lang3/3.9/
  3. https://repo.maven.apache.org/maven2/org/apache/commons/commons-lang3

If you say 1 or 2, then you require that the maintainer update their security-insights-spec file for every release. If you say 3, then it's some sub-path under that URL, meaning that tools still have to search sub-paths for the SBOM.

Still for 3: AFAIK, there is no defined "search algorithm" for finding SBOM documents under a given path. Thus, tooling will still need to search for the SBOMs.

It looks like, given the current specification, you're looking for users to supply 1:

https://github.com/ossf/security-insights-spec/blob/4e783e0397b2c4bbf3a35144e985dde8f8b9f93e/security-insights-schema-1.0.0.yaml#L493C13-L518

However, this adds significant overhead for maintainers, who have to keep this file up-to-date on every release of their software. It is yet another thing they need to remember to update, or it requires them to write tooling that keeps this file up-to-date during or after every release.
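
To make the versioning problem concrete, here is a small sketch that builds option 1 from Maven coordinates using the standard Maven repository layout. The `-SBOM.spdx.yaml` suffix is only the hypothetical file name from the list above, not a published convention, and `sbom_url` is an illustrative helper.

```python
MAVEN_CENTRAL = "https://repo.maven.apache.org/maven2"


def sbom_url(group, artifact, version, suffix="-SBOM.spdx.yaml"):
    """Build a per-version SBOM URL following the Maven repository layout.

    The suffix is a hypothetical naming choice; no standard defines it.
    """
    group_path = group.replace(".", "/")
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact}/{version}/{artifact}-{version}{suffix}"


old = sbom_url("org.apache.commons", "commons-lang3", "3.9")
new = sbom_url("org.apache.commons", "commons-lang3", "3.12.0")
print(old)
print(new)
```

The two URLs differ in both the directory and the file name, so any value hard-coded in the insights file goes stale with the next release.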

I would suggest removing the sbom field as well, until there is a better, standardized publishing location for SBOMs.

luigigubello commented 1 year ago

Yes, I see your point. What if we leave it in, so that we can use SECURITY-INSIGHTS.yml to collect data on where or how maintainers store or share the SBOM file? Just an idea. Otherwise, we can just temporarily remove it from the schema :)

JLLeitschuh commented 1 year ago

I'd suggest removing it until the industry centralizes around at least a semi-unified way of publishing SBOMs.

eddie-knight commented 1 year ago

I believe some of the concepts from this discussion have been applied in recent changes, but some things may have been missed.

If there are any specific suggestions for consideration in a future release, please bring this up in a community meeting or open a narrowly scoped issue that references this discussion.