nexB / purldb

Tools to create and expose a database of purls (Package URLs). This project is sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and nexB for https://www.aboutcode.org/ Chat is at https://gitter.im/aboutcode-org/discuss
https://purldb.readthedocs.io/

Add debian ".udeb" support #345

Open AyanSinhaMahapatra opened 4 months ago

AyanSinhaMahapatra commented 4 months ago

From @armijnhemel and @pombredanne : https://github.com/nexB/purldb/pull/300#discussion_r1505832133

This is not always correct. There are some packages that have the extension .udeb, instead of .deb. An example is libzstd, which includes both .deb and .udeb versions.

For an explanation of .udeb: https://wiki.debian.org/udeb

An udeb is a stripped down deb file for use by the DebianInstaller. It removes the naughty bits (the documentation, as well as some checksums, etc.) to save space.
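To make the extension issue concrete, here is a minimal sketch of splitting a binary archive filename into its parts and recording whether it is a .deb or a .udeb. The helper name `parse_debian_archive_filename` is hypothetical and not part of purldb; it assumes the usual `<name>_<version>_<arch>.<ext>` layout seen in the pool directories.

```python
from pathlib import PurePosixPath


def parse_debian_archive_filename(filename):
    """Split '<name>_<version>_<arch>.<ext>' into its parts.

    Hypothetical helper for illustration only; it accepts both the
    .deb and the .udeb extension instead of assuming .deb.
    """
    path = PurePosixPath(filename)
    ext = path.suffix.lstrip(".")
    if ext not in ("deb", "udeb"):
        raise ValueError(f"not a .deb or .udeb archive: {filename}")

    # Drop the extension, then split on underscores. Debian package
    # names and versions cannot contain underscores.
    stem = path.name[: -len(path.suffix)]
    parts = stem.split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected archive filename: {filename}")

    name, version, arch = parts
    return {"name": name, "version": version, "arch": arch, "package_type": ext}


info = parse_debian_archive_filename("busybox-udeb_1.21.0-1ubuntu1.4_amd64.udeb")
print(info["name"], info["package_type"])
```

The filename only tells us the extension; the authoritative value is the Package-Type field in the control file.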

  1. Are there cases where there is only a .udeb file and no .deb files for a package?

Yes, there are. A few examples I could find (both from Ubuntu):

https://ftp.nluug.nl/pub/os/Linux/distr/ubuntu/pool/main/d/debian-installer-utils/
https://ftp.nluug.nl/pub/os/Linux/distr/ubuntu/pool/main/u/udpkg/

  1. If both .udeb and .deb files are present, is it not better to return the .deb file as the main binary archive?

I am not sure. Both are instances derived from the same source code, so both would likely be valid matches. As these packages are strongly tied to the installer and unlikely to be present on a running system like a container, I could imagine that the .deb is a better match almost always, but perhaps not always. Also, the packages seem to have different names according to the purl syntax. Let's look at:

https://ftp.nluug.nl/pub/os/Linux/distr/ubuntu/pool/main/b/busybox/

There are a few packages:

busybox
busybox-static
busybox-udeb

These would all get different purls, but would have been derived from the same source code (also see https://github.com/nexB/purldb/issues/308 which is somewhat related to this).

  1. Currently I am only using packages detected in the latest Debian docker images at docker://debian:bullseye and docker://ubuntu:devel as test cases for the purls. (Note that older versions are sometimes yanked from Debian, so we need to support these from the archives too.) Is there anything else that would be nice to test?

I am not sure if all the old versions can even be found in the archives. I cannot think of anything else to test.

AFAIK, you cannot be both a deb and a udeb at once. There is a field in the control file that drives this: Package-Type

A udeb package and a deb package can be built from the same source, as for busybox, but they are different packages:

https://ftp.nluug.nl/pub/os/Linux/distr/ubuntu/pool/main/b/busybox/busybox_1.21.0-1ubuntu1.4.dsc is for the source
busybox-udeb_1.21.0-1ubuntu1.4_amd64.udeb is the one built as udeb
the control file in https://ftp.nluug.nl/pub/os/Linux/distr/ubuntu/pool/main/b/busybox/busybox_1.21.0-1ubuntu1.4.debian.tar.gz has:

Package: busybox-udeb
Package-Type: udeb

We could use a "package_type" as a qualifier. We should track the implementation and support for udeb in a separate issue IMHO.
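As a rough sketch of the qualifier idea, the snippet below reads the Package-Type field from a control stanza and appends it to the purl. The function names, the "ubuntu" namespace default, and the choice to omit the qualifier for plain deb packages are assumptions for illustration; `package_type` is only the qualifier name proposed above, not an established purl qualifier. The purl string is built by hand here and does no percent-encoding.

```python
def parse_control_stanza(text):
    """Parse simple 'Key: value' lines of one control stanza into a dict."""
    fields = {}
    for line in text.splitlines():
        # Skip blank lines, continuation lines and comments.
        if not line.strip() or line[0] in " \t#":
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def debian_purl(fields, version, namespace="ubuntu"):
    """Build a purl string with a hypothetical 'package_type' qualifier."""
    # Package-Type is assumed to be "deb" when the field is absent.
    package_type = fields.get("Package-Type", "deb")
    purl = f"pkg:deb/{namespace}/{fields['Package']}@{version}"
    if package_type != "deb":
        purl += f"?package_type={package_type}"
    return purl


stanza = "Package: busybox-udeb\nPackage-Type: udeb\n"
print(debian_purl(parse_control_stanza(stanza), "1.21.0-1ubuntu1.4"))
```

With this approach busybox and busybox-udeb still get different purls because their names differ; the qualifier only makes the udeb nature explicit.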

I am not sure if all the old versions can even be found in the archives. I cannot think of anything else to test.

They are in https://snapshot.debian.org/ FWIW

Note also that it would be nicer to fetch only the metadata URL and take the archive names from there, if these are correct. That would also tell us whether a package is a .deb or a .udeb. We could possibly improve getting the other URLs in a similar way, by reducing network calls to Debian.
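A sketch of what that could look like, assuming the metadata is a Packages-style index with blank-line-separated stanzas and a Filename field. The function name and the sample stanzas are illustrative only (the paths are modeled on the busybox examples above, not fetched from a real index), and no network call is made here.

```python
# Illustrative index text; not copied from a real Packages file.
SAMPLE_INDEX = """\
Package: busybox
Version: 1.21.0-1ubuntu1.4
Filename: pool/main/b/busybox/busybox_1.21.0-1ubuntu1.4_amd64.deb

Package: busybox-udeb
Version: 1.21.0-1ubuntu1.4
Filename: pool/main/b/busybox/busybox-udeb_1.21.0-1ubuntu1.4_amd64.udeb
"""


def archives_from_index(index_text):
    """Return (package, filename, extension) for each stanza with a Filename."""
    archives = []
    for stanza in index_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Ignore continuation lines and anything without a colon.
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        filename = fields.get("Filename")
        if filename:
            extension = filename.rpartition(".")[2]
            archives.append((fields.get("Package"), filename, extension))
    return archives


for package, filename, extension in archives_from_index(SAMPLE_INDEX):
    print(package, extension, filename)
```

One index download would then give the archive name and type for every package in it, instead of one request per package.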