Open merks opened 1 year ago
I guess that I should make the Dash License Tool's core library an OSGi bundle. I should probably do that anyway (and factoring out the CLI stuff is overdue). FWIW, SPDX doesn't provide license information about content; it's more about defining how to specify license information. The Dash License Tool already knows how to handle our rules with regard to ClearlyDefined and the IPLab database.
At least internally, we've latched onto CycloneDX. My recommendation is that we focus on that, leaving SPDX SBOM generation as a nice-to-have for later consideration.
I don't have an opinion regarding whether or not content should be added to Orbit. If you feel that it should be, then we can try to find some resources to help.
I did some prototyping how one might generate an SBOM from a p2 repository
What we'd like to produce first is an SBOM for an Eclipse IDE. I'm quite disconnected from the p2 technology... can we think of the configuration of a particular IDE product as a p2 repository? Or is this a different problem?
My sense is that one can produce some quite good information relatively quickly, but then one starts to slide down the slippery slope of complexity.
My thinking is that if we can produce something that is quite good relatively quickly, we should do that. When we actually have something, it will be easier to get help (or allocate funds) to improve it.
Are you able to create "quite good information relatively quickly" within your current mandate?
The contribution of the needed library to Orbit is effectively already prepared and only needs to be committed to main.
I picked the Eclipse Installer repository as a small illustrative example. It is a product (like the Eclipse SDK and like each of the EPP packages). Its p2 repository is transitively complete with respect to its dependencies/requirements, which is typically the case for product repositories as built by Tycho. Such a p2 repository is effectively an alternative representation of an actual product installation; in fact, it is like the union of all the different OSes and architectures supported by the product. The current prototype can already be applied to any p2 repository; if the repository is not transitively complete with respect to requirements, then there will be missing dependency links, but all else would look the same...
Given there appears to be significant interest, I can spend some more time, as time permits, to flesh this out further. As 2023-12 draws to a close, there aren't currently so many spare cycles...
Hi, from my perspective, there are indeed two complexities:
The first one is to precisely determine the dependencies integrated into a generated product. This means identifying the components and versions precisely integrated into an Eclipse IDE bundle. Apparently, the proposed prototype is a very good starting point to extract this precise information (Is it possible to push the pom.xml and the necessary resources for the execution of this prototype? I would like to test it as part of internal experimentation).
The second complexity is to precisely extract the licenses of each of these components, and here it's very complicated for two reasons. (1) There is no standardization of PURL for p2 or OSGi packages. Therefore, it's nearly impossible to correlate a list of dependencies with a license database. (2) There is no standardization or best practice for declaring a license in OSGi bundle manifests. There are various methods, but there isn't really a control metric or verification step for the correct presence of this license. Dash is a good solution for finding licenses for some OSGi bundles, but without a PURL, or without clear rules on what the groupId and artifactId of a PURL should be, its possibilities are limited.
Would it not be possible to establish a control rule or best practice, already at the foundation's product level, to facilitate the adoption and standardization of this practice?
As an example, the bundle org.eclipse.equinox.security.linux stores its license in the header of fragment.properties or in about.html, while the bundle org.eclipse.sdk stores its license in the headers of other files.
Perhaps the right path initially is to require/encourage the use of a dedicated property in the manifest to input license information in a format understandable by license detection tools like Dash or SPDX.
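For what it's worth, the OSGi Core specification already defines a `Bundle-License` manifest header, so a tool could look there first. Here is a minimal sketch in Python of reading it, assuming the header is actually present and populated. The sample manifest text, the `org.example.bundle` name, and the `parse_manifest` / `read_bundle_license` helpers are made up for illustration; none of this is taken from Dash or from a real bundle.

```python
def parse_manifest(text):
    """Parse the main section of a JAR manifest, joining continuation lines."""
    headers = {}
    current = None
    for line in text.splitlines():
        if line.startswith(" ") and current is not None:
            # A line starting with a single space continues the previous header.
            headers[current] += line[1:]
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            headers[current] = value.strip()
        elif not line.strip():
            break  # a blank line ends the main section
    return headers


def read_bundle_license(headers):
    """Return the license names from Bundle-License, ignoring attributes."""
    raw = headers.get("Bundle-License")
    if not raw:
        return []
    # Naive split: clauses are comma-separated and attributes follow ';'.
    # Quoted attribute values that contain commas are not handled here.
    return [clause.split(";")[0].strip() for clause in raw.split(",")]


# Hypothetical manifest; the long header is wrapped onto a continuation line.
SAMPLE = (
    "Manifest-Version: 1.0\n"
    "Bundle-SymbolicName: org.example.bundle\n"
    "Bundle-Version: 1.0.0\n"
    'Bundle-License: EPL-2.0;link="https://www.eclipse.org/leg\n'
    ' al/epl-2.0/"\n'
)

licenses = read_bundle_license(parse_manifest(SAMPLE))
print(licenses)
```

A check like this could serve as the "control metric" mentioned above: a build could flag any bundle for which the returned list is empty.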
It's on my TODO list to make this stuff available in a reusable form, i.e., in a form that one can build and then test it with an arbitrary p2 update site as input.
But with SimRel winding to a close for the December 6th release, I am swamped with other priorities.
https://github.com/eclipse-simrel
If you stay tuned here I will update you on progress and availability on this issue.
Yes, the prototype resolves requirements against capabilities to provide precise dependencies between components.
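Roughly speaking, resolving requirements against capabilities means matching each unit's requirements to whichever units provide a capability with the same namespace and name. The following is a simplified sketch in Python, not the prototype's actual code: the `Unit` type, the `wire` function, and the sample data are hypothetical, and version ranges, filters, and optional requirements are ignored entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    id: str
    version: str
    provides: set = field(default_factory=set)  # (namespace, name) pairs
    requires: set = field(default_factory=set)  # (namespace, name) pairs


def wire(units):
    """Map each unit id to the ids of the units satisfying its requirements."""
    providers = {}
    for unit in units:
        for capability in unit.provides:
            providers.setdefault(capability, []).append(unit.id)

    dependencies, missing = {}, {}
    for unit in units:
        deps = set()
        for requirement in unit.requires:
            matches = providers.get(requirement)
            if matches:
                # A unit satisfying its own requirement is not a dependency.
                deps.update(m for m in matches if m != unit.id)
            else:
                # Nothing in the repository provides this capability.
                missing.setdefault(unit.id, set()).add(requirement)
        dependencies[unit.id] = sorted(deps)
    return dependencies, missing


# Illustrative data only; the namespaces and names are assumptions.
units = [
    Unit("org.example.app", "1.0.0",
         requires={("java.package", "org.example.util"),
                   ("osgi.bundle", "org.example.absent")}),
    Unit("org.example.util", "2.1.0",
         provides={("java.package", "org.example.util")}),
]

dependencies, missing = wire(units)
print(dependencies)
print(missing)
```

The `missing` map corresponds to what happens when a repository is not transitively complete: the unit is still listed, but some of its dependency links cannot be drawn.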
The lack of any type of standard PURL is kind of a problem and the group/name thing is just not really applicable in a space where the bundle symbolic name / version are the unique identifier.
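To make the naming problem concrete: a purl has the general shape `pkg:type/namespace/name@version`, where the namespace segment is optional. Below is a sketch in Python of what an identifier built only from the bundle symbolic name and version might look like. The `p2` type is purely an assumption for illustration (whether and how such a type gets standardized is exactly the open question), and the `bundle_purl` helper and the version number are made up.

```python
from urllib.parse import quote


def bundle_purl(symbolic_name, version, purl_type="p2"):
    """Build a purl-shaped string with no namespace segment.

    The default type "p2" is a hypothetical placeholder, not a
    confirmed purl type.
    """
    # Percent-encode anything outside the unreserved character set.
    name = quote(symbolic_name, safe="")
    ver = quote(version, safe="")
    return f"pkg:{purl_type}/{name}@{ver}"


purl = bundle_purl("org.eclipse.equinox.security.linux", "1.0.0")
print(purl)  # pkg:p2/org.eclipse.equinox.security.linux@1.0.0
```

Dropping the namespace avoids inventing a group for something that does not have one, but it also means such an identifier cannot be matched directly against Maven-style coordinates in a license database.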
The fact that license information can be sprinkled anywhere and everywhere in general is also a problem. Even if the information is supposed to be in a standard place in the pom, that doesn't mean folks (Maven artifacts) actually populate it properly, at all, or correctly. At Eclipse, the standard approach is for each bundle to have an about.html that is also included in the binary plugin:
Features do it somewhat differently, but also follow standard rules and conventions...
Perhaps we could simplify the license metadata by including information in the MANIFEST.MF as you suggest, but hell might freeze over before all the projects actually conform to such a new approach. I speak from experience with SimRel where it's a challenge to get projects to do anything. 😱
Thank you for your interest in this.
Note to self and others, this is related information:
https://github.com/package-url/purl-spec/pull/272
https://github.com/eclipse/jbom
I have committed a relatively complete functional prototype to this public GitHub repo, pending any expressed interest in going further with this approach or simply reusing any parts for implementing a different approach:
https://github.com/merks/p2repo-sbom/
The repository provides an Oomph setup for creating a development environment automatically:
https://github.com/merks/p2repo-sbom/blob/main/CONTRIBUTING.md
A CI job to build the prototype product is available here:
https://ci.eclipse.org/cbi/view/p2RepoRelated/job/cbi.p2repo.sbom-build/
Product downloads are available here:
https://download.eclipse.org/cbi/updates/p2-sbom/products/nightly/latest
The following Jenkinsfile provides an example of how to use the prototype:
https://github.com/merks/p2repo-sbom/blob/main/SBOMGenerator.jenkinsfile
It's used by this job:
https://ci.eclipse.org/cbi/view/p2RepoRelated/job/cbi.p2repo.sbom-generator/
That job takes the Eclipse SDK 4.30 release as input:
https://download.eclipse.org/eclipse/updates/4.30/R-4.30-202312010110
and generates SBOMs in both XML and JSON formats:
The license generation is a less-than-ideal hack...
I did some prototyping how one might generate an SBOM from a p2 repository.
The prototype creates a bom.xml from the Eclipse Installer self-contained repository:
https://download.eclipse.org/oomph/products/repository
It's implemented by this simple (very rough prototype) p2 application which loads the repository and analyzes the metadata and actual artifacts to produce a CycloneDX representation:
SBOMApplication.java
You can see that the bom.xml includes dependency information by wiring requirements to capabilities.
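For readers unfamiliar with CycloneDX: the dependency information lives in a top-level list of dependencies that refer to components by their `bom-ref`. Here is a minimal sketch in Python of that shape, using the JSON flavor of the format rather than the XML the prototype writes. The `make_bom` helper, the component names, and the versions are hypothetical, and a real BOM carries far more detail (licenses, hashes, and so on).

```python
import json


def make_bom(components, dependencies):
    """Build a minimal CycloneDX-shaped dict.

    components:   {bom_ref: (name, version)}
    dependencies: {bom_ref: iterable of bom_refs it depends on}
    """
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": [
            {"type": "library", "bom-ref": ref, "name": name, "version": version}
            for ref, (name, version) in components.items()
        ],
        "dependencies": [
            {"ref": ref, "dependsOn": sorted(targets)}
            for ref, targets in dependencies.items()
        ],
    }


# Hypothetical components, wired the way requirements map to capabilities.
bom = make_bom(
    components={
        "app": ("org.example.app", "1.0.0"),
        "util": ("org.example.util", "2.1.0"),
    },
    dependencies={"app": ["util"], "util": []},
)
print(json.dumps(bom, indent=2))
```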
I've done some extracting of license information too, e.g., what I could find in the pom in the artifact jar, but that information is not always easy to track down in a consistent way or place. Also, I don't want to duplicate work that has already been done, e.g., by SPDX or Dash.
My sense is that one can produce some quite good information relatively quickly, but then one starts to slide down the slippery slope of complexity. Also many questions remain/arise about the form of the representation and how well these concepts apply to the OSGi world of bundles as well as the Equinox world of features.
The IDE WG's budget can't accommodate what could turn into weeks or months of work, and it's not entirely clear yet in which form folks want to do the analysis. I.e., there appears to be mention of producing an SBOM as part of the build; implementing something that runs in Maven versus something that runs as an OSGi application is quite different technically.
So before investing more time, it would be good to discuss the strategy and plans around this...
Below is more background information and details.
I am reusing CycloneDX's core library, which has quite a few dependencies:
To use this stuff in an OSGi environment, i.e., in the above application where p2 functions, one needs quite a few new dependencies that could be added to Orbit, which I did locally:
I also looked at SPDX's dependencies, but that's a massive list and is clearly designed to be used within a Maven build: