tern-tools / tern

Tern is a software composition analysis tool and Python library that generates a Software Bill of Materials for container images and Dockerfiles. The SBOM that Tern generates will give you a layer-by-layer view of what's inside your container in a variety of formats including human-readable, JSON, HTML, SPDX and more.
BSD 2-Clause "Simplified" License

Is it possible to get the source package name (and source version) in the report? #1083

Closed sameer1046 closed 2 years ago

sameer1046 commented 3 years ago

Currently tern reports the binary package info for Debian packages. It would be great if tern could also provide the source package information in the report, so that the report can be fed directly into license scanning tools.

nishakm commented 3 years ago

@rnjudge what do you think?

rnjudge commented 3 years ago

I think this is technically possible for some package managers but not all. For example, I don't think Python packages have the concept of a "source" package vs. a binary package like rpm or deb packages do. I do worry about cluttering the default table with another column for the source package. I wonder if this could be a command line flag? @nishakm thoughts?

nishakm commented 3 years ago

@rnjudge a long time ago, tern included some ability to retrieve corresponding sources, particularly for debian packages as it was the easiest one to implement. This is different from "source package name" though.

I agree with the cluttering of the default table. We may want to direct folks looking for more information to the JSON or HTML report format.

@sameer1046 can you provide an example of "source package name"?

rnjudge commented 3 years ago

I think what he means by source package name is in reference to rpm or deb packages where there's one source package that contains the source code which builds/produces the associated binary packages (the source package is not typically installed). For example, the systemd source package produces systemd, udev, libudev1 binaries and more (https://packages.debian.org/source/sid/systemd). These binary packages are what Tern reports. The source is important because CVEs are reported by source package name, which I think is what @sameer1046 is getting at. Is this an accurate summary, Sameer?

sameer1046 commented 3 years ago

@rnjudge Exactly. In Linux distributions using deb or rpm packages, there is one source package from which several binary packages are built. It would be great if this were available in the CycloneDX format.

nishakm commented 3 years ago

Thanks for the explanation and clarification! I suppose this is possible for deb and rpm. I am more familiar with deb than rpm so I'll work through what is needed to implement that:

  1. Read the /etc/apt/sources.list file and files existing in /etc/apt/sources.list.d
  2. Write back to the files the URLs except modify deb to deb-src
  3. Run apt-get update
  4. Run apt-cache showsrc <package name> and parse the output to get the package name

I think it's possible to add this for the distros that support it. It requires adding a new property in Package called source_package_name and a script that does all the above steps. @rnjudge WDYT?
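
A minimal sketch of the text-handling parts of steps 1, 2 and 4, in Python. The function names are made up for illustration and are not part of tern. It assumes the classic one-line `sources.list` format (not the deb822 `.sources` files) and that the source name is the value of the `Package:` field in `apt-cache showsrc` output.

```python
from typing import List


def to_deb_src(sources_text: str) -> str:
    """Derive deb-src entries from the active `deb` entries of a sources.list."""
    out = []
    for line in sources_text.splitlines():
        parts = line.strip().split(None, 1)
        # Comments, blank lines and existing deb-src entries are skipped.
        if len(parts) == 2 and parts[0] == "deb":
            out.append("deb-src " + parts[1])
    return "\n".join(out)


def source_names_from_showsrc(showsrc_output: str) -> List[str]:
    """Collect the unique `Package:` values from `apt-cache showsrc` output."""
    names: List[str] = []
    for line in showsrc_output.splitlines():
        # `Package-List:` does not match because of the colon position.
        if line.startswith("Package:"):
            name = line.split(":", 1)[1].strip()
            if name and name not in names:
                names.append(name)
    return names
```

Running `apt-get update` and `apt-cache showsrc` themselves would still have to happen inside the container; the sketch only covers rewriting the entries and reading the result.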

gernot-h commented 2 years ago

As a colleague of @sameer1046 at Siemens, I've been involved in license scanning topics for a couple of years, so some additional bits:

nishakm commented 2 years ago

I think the question we are grappling with right now is whether "source package name" should be included in our summary report. It's already becoming less summarizing ;). @sameer1046 @gernot-h if you are OK using a combination of tern's JSON format and jq, this is totally doable.

gernot-h commented 2 years ago

> I think the question we are grappling with right now is whether "source package name" should be included in our summary report. It's already becoming less summarizing ;). @sameer1046 @gernot-h if you are OK using a combination of tern's JSON format and jq, this is totally doable.

For the Siemens use-case, we just need this information in any machine-readable format, so JSON sounds perfect! :)

sameer1046 commented 2 years ago

I would suggest producing a CycloneDX BOM in source package format when a command line flag is set. It would list all source packages in the BOM using the purl spec, e.g. pkg:deb/debian/gnupg2@2.2.12-1?packaging=sources. This would produce a BOM that contains only source packages and no binary packages.
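
As a rough illustration of what such a source-only list could look like, here is a small Python sketch. The `source_purls` helper is hypothetical, the `packaging=sources` qualifier is simply copied from the example above, and no percent-encoding is applied, so a real implementation would probably want a proper purl library.

```python
from typing import Iterable, List, Tuple


def source_purls(distro: str, sources: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one purl per unique (source name, source version) pair.

    Several binary packages usually map to the same source package,
    so duplicates are collapsed before formatting.
    """
    unique = sorted(set(sources))
    return [
        f"pkg:deb/{distro}/{name}@{version}?packaging=sources"
        for name, version in unique
    ]
```

Passing the same `("gnupg2", "2.2.12-1")` pair twice yields a single `pkg:deb/debian/gnupg2@2.2.12-1?packaging=sources` entry.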

rnjudge commented 2 years ago

> I would suggest producing a CycloneDX BOM in source package format when a command line flag is set. It would list all source packages in the BOM using the purl spec, e.g. pkg:deb/debian/gnupg2@2.2.12-1?packaging=sources. This would produce a BOM that contains only source packages and no binary packages.

We might need @coderpatros's help with this one after we add source package info to the data model as he is the CycloneDX format wizard :)

Ranjit-Kumar-Nayak commented 2 years ago

@rnjudge @nishakm Hey, I am new here and would also like to contribute to this. Can you point me to some resources?

rnjudge commented 2 years ago

Hi @Ranjit-Kumar-Nayak, thanks for your interest! If you know of, or can come up with, a way to list the source packages installed on a system using the dpkg or rpm package managers (or in a bash script), that would be a good starting point! Bonus points if you can find the source given its binary package name.

rnjudge commented 2 years ago

@sameer1046 @gernot-h may I ask which vulnerability scanner you are using that requires sources?

gernot-h commented 2 years ago

> Thanks for the explanation and clarification! I suppose this is possible for deb and rpm. I am more familiar with deb than rpm so I'll work through what is needed to implement that:
>
> 1. Read the `/etc/apt/sources.list` file and files existing in `/etc/apt/sources.list.d`
> 2. Write back to the files the URLs except modify `deb` to `deb-src`
> 3. Run `apt-get update`
> 4. Run `apt-cache showsrc <package name>` and parse the output to get the package name

Sorry, @nishakm, for coming back to this so late. I'm not exactly sure what the environment is here (I don't know how tern works internally), but in case you are running on the system with the packages in question, you don't need additional apt sources. All you need is already known by dpkg once your package is installed, a simple

dpkg-query -f '${source:Package} ${source:Version}' -W <pkg>

will show you the source information for an installed package. The procedure you described might however make perfect sense if you want to run it in a distinct environment where the package you analyze is not installed.

By the way, there's an important thing to note: the source version might also differ from the binary version in rare cases. So you should also query for it and add it to the BOM.
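
In case it helps, a hedged Python sketch of how the output could be collected for all installed packages at once. `SourceInfo`, `parse_dpkg_output` and `query_installed` are names invented here, not tern code; the tab-separated format string is my own choice so that each line can be split reliably, and it adds the standard `${Package}` field to the two `source:` fields mentioned above.

```python
import subprocess
from typing import List, NamedTuple


class SourceInfo(NamedTuple):
    binary: str
    source: str
    source_version: str


# Literal tab and newline characters are passed through by dpkg-query as-is.
DPKG_FORMAT = "${Package}\t${source:Package}\t${source:Version}\n"


def parse_dpkg_output(output: str) -> List[SourceInfo]:
    """Turn tab-separated dpkg-query lines into SourceInfo records."""
    rows = []
    for line in output.splitlines():
        fields = line.split("\t")
        # Ignore blank or malformed lines instead of failing.
        if len(fields) == 3 and all(fields):
            rows.append(SourceInfo(*fields))
    return rows


def query_installed() -> List[SourceInfo]:
    """Run dpkg-query for every package known to dpkg (needs a dpkg system)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f", DPKG_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_dpkg_output(result.stdout)
```

Keeping the parser separate from the subprocess call means it can be tried on captured output without a Debian system at hand.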

gernot-h commented 2 years ago

> @sameer1046 @gernot-h may I ask which vulnerability scanner you are using that requires sources?

Sure. :) This is not about security scanning, but about license clearing (legal compliance task...). We use (and maintain ;) ) https://github.com/fossology/ for this.

rnjudge commented 2 years ago

> dpkg-query -f '${source:Package} ${source:Version}' -W <pkg>

Thank you!! This is super helpful. Do you have the command to do this using RPM handy?

> By the way, there's an important thing to note: the source version might also differ from the binary version in rare cases. So you should also query for it and add it to the BOM.

Noted.

gernot-h commented 2 years ago

@sameer1046, could you please update the issue title and initial description to also include "source version", so for example "get the source package name (and source version) in the report". I put this in brackets as I'm unsure if this is relevant for other distributions than Debian and Ubuntu, though.

gernot-h commented 2 years ago

> Thank you!! This is super helpful. Do you have the command to do this using RPM handy?

This should be `rpm -q --qf '%{SOURCERPM}' <pkg>`. The SOURCERPM value also contains the source version number.

And on my openSUSE system, the source version can also differ from the binary version in rare cases, so this seems to be a common concept:

> rpm -q --qf "%{NAME} %{VERSION}-%{RELEASE} %{SOURCERPM}\n" cron
cron 4.2-70.14.4.1 cronie-1.5.1-70.14.4.1.src.rpm
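
A small Python sketch of how the SOURCERPM value could be split into name, version and release. The `split_source_rpm` helper is hypothetical. It relies on rpm versions and releases not containing hyphens, and returns `None` for values such as `(none)`, which rpm prints when a package has no source rpm recorded.

```python
from typing import Optional, Tuple


def split_source_rpm(sourcerpm: str) -> Optional[Tuple[str, str, str]]:
    """Split e.g. 'cronie-1.5.1-70.14.4.1.src.rpm' into (name, version, release)."""
    for suffix in (".src.rpm", ".nosrc.rpm"):
        if sourcerpm.endswith(suffix):
            stem = sourcerpm[: -len(suffix)]
            break
    else:
        # Not a source rpm file name, e.g. "(none)".
        return None
    # The name may contain hyphens, so split from the right.
    parts = stem.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        return None
    return parts[0], parts[1], parts[2]
```

For the cron example above this gives `("cronie", "1.5.1", "70.14.4.1")`, i.e. a source name and version that both differ from the binary package.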

rnjudge commented 2 years ago

Thanks again @gernot-h. I think we can get this feature merged and data available in the JSON report before our next release (planned for next week).