rpm-software-management / rpm

The RPM package manager
http://rpm.org
Other
508 stars 367 forks source link

RFE: read sources checksums from the SPEC file and verify them #463

Open gsauthof opened 6 years ago

gsauthof commented 6 years ago

In SPEC files there is the Source0/SourceX directive to specify the sources. Example:

Source0: http://releases.nixos.org/patchelf/patchelf-%{version}//%{name}-%{version}.tar.bz2

cf. https://github.com/gsauthof/copr-epel/blob/master/patchelf/patchelf.spec#L9

Other software then uses those keys to download the referenced files. For example, one can tell rpmbuild to auto-download all sources, Copr auto-downloads the sources (by default) or the user evaluates the SPEC file with rpmspec -P and copy'n'pastes into a curl/wget command line.

Either way, as-is, there is no way to specify cryptographic checksum inside the SPEC file for the sources specified there. For example like this:

Source0-sha256: a0f65c1ba148890e9f2f7823f4bedf7ecad5417772f64f994004f59a39014f83

With such syntax available it would be great if rpm would verify each source against the specified checksum and fail if there is a mismatch.

cf. https://bugzilla.redhat.com/show_bug.cgi?id=1536846#c7 for an example where this came up

As-is, it seems, one has to maintain checksums outside the .spec file, be cautious to manually check checksums or just ignore the problem - which is all tedious and error-prone or dangerous.

For example, because of this I had to roll my own verification logic for my Copr packages (as described in https://bugzilla.redhat.com/show_bug.cgi?id=1536846#c7).

Thus, adding this capability directly to rpm would reliably protect against man-in-the-middle (MITM) attacks.

Note that restricting auto-downloads to secure protocols like https isn't a substitute for this feature as a) not all sources are available via https and b) the infrastructure behind the https endpoint can have security issues (e.g. a temporary intrusion where files are getting replaced).

See also the discussion around https://pagure.io/copr/copr/pull-request/211#comment-43965 how Copr disabled insecure protocols for auto-downloads.

Other package formats support specifying checksums for the referenced source files (e.g. Arch/Gentoo/Homebrew source packages).

gsauthof commented 6 years ago

I'm perfectly fine with the rpm5 syntax and probably with most possible alternatives - as long as they are usable.

The rpm5 description I could find in the changelog mentions:

digest([<algorithm>:]/path) = hex

I verify checksums all the time. The vast majority of open-source upstream also cares as you can see how much effort is put into publishing checksums for releases (e.g. published via independent channels, checksum files/release notes with checksums signed with a GPG key that is trusted by well-known project members etc.). I only have to deal with software with missing checksums when I have to get software from clueless corporations like Oracle (e.g. the download is only provided over http and often there is no checksum available at all, or just a md5 one or even just a CRC one).

I would argue that updating a checksum to the SPEC file is less tedious as the status-quo where you have to implement some outside checksum verification process/scripting. Also, with a new upstream release you already have to edit the spec file to adjust the version information.

Regarding 'SRPMs in rpm' - the SRPM isn't always the starting point. For example, for the quite popular Copr service it is not.

I can't see how adding checksums to a source downloads cache in a service like Copr would improve security. Because then the file is just downloaded (and checksummed) on the first request. And the source file could be very well comprised on that first download. Depending on the cache size and rebuild-frequency there are no more re-downloads such that such a compromise wouldn't be detected.

berolinux commented 4 years ago

Would be really useful to have (of course should be optional to keep old spec files working).

Conan-Kudo commented 4 years ago

In Fedora, we're considering having sources auto-fetched and uploaded to the lookaside cache as part of accepting PRs for package updates, but to make this straightforward, we'd want this to be in the spec file so that we know we're downloading what we're supposed to.

Something along the lines of SourceN(<checksum-type>): <checksum-hash> (and similar for PatchN, of course) was what I'm thinking of.

There's some prior art here in other package management solutions that influenced this idea. For example, Arch and Alpine both let you specify a source URL and specify checksums to validate this, as does Solus' ypkg YAML format.

For example, the Arch PKGBUILD for rpm-tools has the following snippet:

source=(http://ftp.rpm.org/releases/rpm-$_base_pkgver/rpm-$pkgver.tar.bz2
    rpmextract.sh
        rpmlib-filesystem-check.patch)
sha256sums=('ddef45f9601cd12042edfc9b6e37efcca32814e1e0f4bb8682d08144a3e2d230'
            '3e5bf450d4628366ba35469ec0530a99cd09ab2616a3d261a3f68270f481f777'
            'bd0e6dbd458f990268c60324190c6825b234647ecdde08296d2b453dc4bce27a')

And Solus' ypkg YAML for rpm does something similar:

source     :
- https://github.com/rpm-software-management/rpm/archive/rpm-4.14.2.1-release.tar.gz : 92cab9da7524cf4e4abf33f160cdfb9e63fe021bc7133f97735f70cdec777400
voxik commented 4 years ago

I know I had my own #570 where I already proposed 3 possibilities:

Source0: ftp://ftp.example.com/pub/foo/%{name}-%{version}.tar.gz
Checksum0: SHA512  = 2c8211ae5f1578502dc9b29babe7d03ec61f500b3c2dd309be2bbd34fd194abba29d95812e7dab4bfacda13e342323921663464bab4cbf4af0a198e8437233f4

Source0: SHA512 (ftp://ftp.example.com/pub/foo/%{name}-%{version}.tar.gz) = 31bacf58469953282cd5d8b51862dcf4b84dedb927c1871bc3fca32fc157fe49187631575a70838705fe246f4555647577a7ecc26894445a7d64de5503dc11b4

Provides: checksum(SHA512 (%{name}-%{version}.tar.gz) = 31bacf58469953282cd5d8b51862dcf4b

but now thinking about this again, what is the problem with the sources file in dist-git? It already contains checksums.

Conan-Kudo commented 4 years ago

but now thinking about this again, what is the problem with the sources file in dist-git? It already contains checksums.

We cannot rely on this file if we want rpm to be able to auto-download sources with any degree of confidence.

Per the comment in the macros.in file:

#
# Should rpm try to download missing sources at build-time?
# Enabling this is dangerous as long as rpm has no means to validate
# the integrity of the download with a digest or signature.
%_disable_source_fetch 1

This was the rationale for my filing #1126...

praiskup commented 4 years ago

We cannot rely on this file if we want rpm to be able to auto-download sources with any degree of confidence.

Why? Technically there shouldn't be a problem.

nphilipp commented 4 years ago

@praiskup I think what @Conan-Kudo meant is (correct me if I'm wrong), that if we have the hashes only in sources and not in the spec file, a plain local rpmbuild can't (safely) download missing sources.

Conan-Kudo commented 4 years ago

@praiskup I think what @Conan-Kudo meant is (correct me if I'm wrong), that if we have the hashes only in sources and not in the spec file, a plain local rpmbuild can't (safely) download missing sources.

This is correct.

clime commented 4 years ago

Currently, fedpkg downloading and uploading functionality is independent of rpm which brings an advantage that we can use it to upload/download custom binary stuff for other "package" types (e.g. containers).

So placing the source hashes into spec file itself would limit the lookaside-cache functionality to rpms only.

There is another factor that currently when I call fedpkg sources I know I am safe because it doesn't parse spec file.

If we put the hashes into spec file, it would mean fedpkg sources would suddenly become a different beast with different security requirements (which should be considered when doing this automatically on production systems).

TomaszGasior commented 4 years ago

For me, causal user of OS sometimes building small packages, current way of building rpms is very confusing. There are various tools which duplicate functionality: rpmbuild, mock, rpkg, fedpkg etc. It's much harder to understand than PKGBUILDs on Arch Linux for example. For me having ability to declare checksums in rpmspec in standardized way would make UX much better.

This is the reason of huge success of AUR and PKGBUILDs in Arch Linux world — simplicity.

clime commented 4 years ago

For me, causal user of OS sometimes building small packages, current way of building rpms is very confusing. There are various tools which duplicate functionality: rpmbuild, mock, rpkg, fedpkg etc. It's much harder to understand than PKGBUILDs on Arch Linux for example. For me having ability to declare checksums in rpmspec in standardized way would make UX much better.

I am not sure what it would actually bring. It is good that you can make %_disable_source_fetch 0 trustworthy but one could also do e.g.

Source0: https://github.com/release-engineering/dist-git/archive/dist-git-1.12-1.tar.gz#sha256

with the current rpm syntax and then have a macro that verifies the tarball checksum based on the #sha256. Is it too hacky? :)

It might not even be a hack because https://tools.ietf.org/html/rfc2396#section-4.1 - the fragment identifier is in the realm of client.

So imho, new rpm syntax is not even needed.

I think we could extend rpm macros to support this and it would be nice e.g. for COPR.

It's a pity that url fragment already seems to be used for something: https://fedoraproject.org/wiki/Packaging:SourceURL#Troublesome_URLs

but that maybe it doesn't necessarily mean we couldn't extend its use...? Btw. there is a note:

"Sometimes this does not work because the upstream cgi tries to parse the fragment "

that would be a server-side error according to the RFC.

But for Fedora this has still a limited use because we would like to be independent of any of github/gitlab/bitbucket stuff.

We could use that information (the hash) to automatically fetch the sources from upstream the first time when they are not yet present in our (Fedora) lookaside cache and then further rely on the sources file, the checksum there and the lookaside cache but I think this is a problematic solution. It would mean build system would need to commit into dist-git, which would mess up automatic versioning (if there is any). I haven't thought about this too much, there might be other issues.

I think the fedpkg upload way is quite fine and it can be automated too (outside of Fedora infra) if one trusts 100% to the tarball source/provider. It's hard to tell if such automation should be recommended.

This is the reason of huge success of AUR and PKGBUILDs in Arch Linux world — simplicity.

clime commented 4 years ago

...Actually the fragment part shouldn't even be sent to the server so that note should probably be fixed...

berolinux commented 4 years ago

One problem with the sources file is that it is distro specific -- Fedora uses sources, OpenMandriva uses a similar file (though with slightly different syntax) called .abf.yml, probably other distributions have yet other workarounds.

Another problem is that it's not the spec file -- I don't think we want to end up with a mess similar to what dpkg has in those debian directories.

clime commented 4 years ago

One problem with the sources file is that it is distro specific -- Fedora uses sources, OpenMandriva uses a similar file (though with slightly different syntax) called .abf.yml, probably other distributions have yet other workarounds.

Another problem is that it's not the spec file -- I don't think we want to end up with a mess similar to what dpkg has in those debian directories.

I think it is distro-specific because it is tight down to particular distribution-git that some higher-level tools work with (fedpkg, centpkg, rpkg, ...), i.e. where the packages are stored.

But the SourceX: in rpm spec file typically does not reference a tarball in that particular dist-git but it instead references sources from upstream (at least in Fedora/CentOS and for OpenMandriva that seems to be true as well). But that's not from where the tarballs are downloaded when they are going to be built.

clime commented 4 years ago

Anyway, what about something like %_verify_fetched_source_checksums macro with values 0/1/2 where

0: do not check source checksums even if present
1: check source checksums if present as `#(<hashtype>)<checksum>` url suffix by invoking `<hashtype>sum` command from coreutils to do the check
2: check source checksums and return false when some fetched source does not have a checksum attached

I imagine the verification would only apply to files that were fetched by rpmbuild. Those that were already present before the build started wouldn't be checked with assumption that user gave to rpmbuild valid sources. What do you think?

voxik commented 4 years ago

Thinking further about this, do we actually need something really fancy as special tag?

For example, it is quite easy to check if the checksums in the dist-git sources file are correct during rpmbuild -bs. It is enough to put %(sha512sum -c sources) somewhere into specfile preamble. If the checksums are not correct then the rpmbuild -bs fails.

Looking at Verifying Signatures Fedora guidelines, I actually wonder, why the check is done in %prep. Maybe something like this macro should be placed right bellow the sources in preamble.

voxik commented 4 years ago

I actually wonder, why the check is done in %prep.

This might be due to BR: gnupg2, have not investigated this.

Conan-Kudo commented 4 years ago

@voxik That is for verifying GPG signatures if they exist, and yes, it requires gnupg2 as a build dependency in that scenario.

voxik commented 4 years ago

@voxik That is for verifying GPG signatures if they exist, and yes, it requires gnupg2 as a build dependency in that scenario.

But if that was macro included in RPM, RPM would probably grow the dependency, so it would not need BR. So I still think this could be way forward.

clime commented 4 years ago

Hello @voxik, sha256sum etc. are in coreutils, which I bet rpm already requires...i mean coreutils should be present on any system anyway.

An interesting idea with %(sha512sum -c sources) but I wouldn't bring the sources file into the picture because it is used to fetch files from dist-git before rpmbuild even happens and checksums are checked at that stage. All urls that are now pointing to upstream would need to change to point to dist-git lookaside cache if the rpm mechanism for downloading should be used instead of the fedpkg one.

We could use a bit of bash code %([ "$(sha256sum <path_to_source_filename> | cut -d " " -f 1)" = <checksum> ]) to do the verification per downloaded source but i think <path_to_source_filename> might be slightly tricky unless rpm exposes enviroment variable like 'SOURCES'. Also maybe it would be more pleasant to have the support for this in rpm than to put those snippets into spec.

Conan-Kudo commented 4 years ago

RPM does not specifically require GNU coreutils. And sha256sum and friends are not specifically available on all platforms RPM is used on.

clime commented 4 years ago

Hmm, on Fedora:

$ dnf repoquery --requires rpm
Last metadata expiration check: 0:00:08 ago on Tue 24 Mar 2020 01:34:47 PM CET.
/usr/bin/bash
/usr/bin/db_stat
/usr/bin/sh
coreutils
curl
libarchive.so.13()(64bit)
libc.so.6(GLIBC_2.4)(64bit)
libpopt.so.0()(64bit)
libpopt.so.0(LIBPOPT_0)(64bit)
libpthread.so.0()(64bit)
libpthread.so.0(GLIBC_2.2.5)(64bit)
librpm.so.9()(64bit)
librpmio.so.9()(64bit)
popt(x86-64) >= 1.10.2.1
rtld(GNU_HASH)

I can see coreutils there. So I guess you mean on other distributions...if you can name those and where sha512sum and similar checksum checking utils can be found, that would be great!

Generally speaking....Is adding dependency to rpm for checksum checking utilities a big deal? It seems to me that it is something that rpm could support.

Conan-Kudo commented 4 years ago

On macOS, there is not a consistent interface for doing checksums via CLI. I'm unsure if AIX and other platforms would also have similar problems.

voxik commented 4 years ago

We could use a bit of bash code %([ "$(sha256sum <path_to_source_filename> | cut -d " " -f 1)" = <checksum> ]) to do the verification per downloaded source

Yep, something like this is where I was heading to.

but i think <path_to_source_filename> might be slightly tricky unless rpm exposes enviroment variable like 'SOURCES'

RPM exposes %{SOURCE1} right after it appears in the specfile.

All urls that are now pointing to upstream would need to change to point to dist-git lookaside cache if the rpm mechanism for downloading should be used instead of the fedpkg one.

No, the %{SOURCE1} points to the rpmbuild directory, not to the URL. I don't think that fetching sources by RPM is really in question here, but it could be used.

clime commented 4 years ago

We could use a bit of bash code %([ "$(sha256sum <path_to_source_filename> | cut -d " " -f 1)" = <checksum> ]) to do the verification per downloaded source

Yep, something like this is where I was heading to.

but i think <path_to_source_filename> might be slightly tricky unless rpm exposes enviroment variable like 'SOURCES'

RPM exposes %{SOURCE1} right after it appears in the specfile.

All urls that are now pointing to upstream would need to change to point to dist-git lookaside cache if the rpm mechanism for downloading should be used instead of the fedpkg one.

No, the %{SOURCE1} points to the rpmbuild directory, not to the URL. I don't think that fetching sources by RPM is really in question here, but it could be used.

Alright but I think we are maybe tracking different goals then :). I thought the issue was about making %_disable_source_fetch 0 reliable. It's what the original post suggested to me.

Conan-Kudo commented 4 years ago

Alright but I think we are maybe tracking different goals then :). I thought the issue was about making %_disable_source_fetch 0 reliable. It's what the original post suggested to me.

It is about that. We don't want to operate on sources unless they've been validated to be good. That means no doing %prep, %generate_buildrequires, etc. until validated if rpm is downloading the sources.

clime commented 4 years ago

Cool, ok, I admit I am not sure how big deal is the macOS compatibility. I think if somebody complains that the feature doesn't work there, it would be a reason to instead use some c library for the checksum validation. Maybe that could be done immediately? I would still consider relying on coreutils a good start.

clime commented 4 years ago

But, yeah, maybe you guys have some other/better solution in mind. Idk.

gsauthof commented 4 years ago

In reply to https://github.com/rpm-software-management/rpm/issues/463#issuecomment-601922444:

Source0: https://github.com/release-engineering/dist-git/archive/dist-git-1.12-1.tar.gz#sha256

with the current rpm syntax and then have a macro that verifies the tarball checksum based on the #sha256. Is it too hacky? :)

Yes, this looks ultra hacky to me.

I can see a lot of potential for confusion with this #hashsuffix syntax.

You can't expect that a package contributor would have that part of that RFC present which is necessary to understand the mechanism. And even then, it's far from obvious or similar to any other existing mechanism for declaring checksums of dowloadable files.

clime commented 4 years ago

Well, on the other hand, it is a minimalistic solution that doesn't introduce any new rpm constructs. Also, you don't need to know or read the RFC. You just need to read the rpm documentation where it is described that source checksums are attached to source URLs as #(<checksum-type>)<checksum> suffixes. With an example, I think, people would get it immediately.

nphilipp commented 4 years ago

Unfortunately, the minimalistic solution is used for something else already: URL fragments are used to have a different file name from what a download from the URL would normally yield, e.g. if an on-the-fly generated upstream archive doesn't contain the package name.

praiskup commented 3 years ago

Could RPM hook in a check right before executing %prep section if e.g. macro like %global source1_chksm sha512_<sum> was defined? Older RPM implementations would just ignore such macro.

praiskup commented 3 years ago

Lemme know if you think that some PoC macro in /usr/lib/rpm/macros.d doing exactly this would be useful (as first %prep instruction).

Conan-Kudo commented 3 years ago

Implementing it that way would require making changes to %prep initialization in RPM, so that it would run before anything is executed...

praiskup commented 3 years ago

Yes, that's what I meant. Some implicit hook in %prep implementation, or before.

socketpair commented 3 years ago

I want rpmlint to fight against packagers who unintentionally put wrong archives. Yes, I trust github, so I want rpmlint to download from github, i.e.

  1. unpack .srpm
  2. calc checksums of archives from .srpm that are expected to be from github
  3. run something like rpmbuild --undefine=_disable_source_fetch --define '_sourcedir XXX' -ba *.spec
  4. compare sha256sum($file_in_srpm) and sha256sum($downloaded_file)

Github is trusted because of in git (i.e. block-chain) it's impossible to have different content under the same hash-tag.

I don't see any real cases where specifying sha256sum inside spec-file could help. Just suppose, stupid developer has got .tar.gz from wrong branch or so, calculated it and put to the .spec file. Comparing checksums will show everything is OK, but actually data does not correspond to URL. I think, silent data corruption is not a case we should care of.

Conan-Kudo commented 3 years ago

Github is trusted because of in git (i.e. block-chain) it's impossible to have different content under the same hash-tag.

That's not actually true. Git uses SHA1 and collisions have been done to Git with it. That also said, GitHub archives may or may not be reproducible. And Git refs are not guaranteed to be stable either, as people can rewrite or modify them without any notice. The checksums make it so such changes are detected.

socketpair commented 3 years ago

OKAY. so rpmlint should check three checksums match:

  1. downloaded file from github
  2. attached archive in srpm
  3. sha-256 sum in .spec file
dmach commented 1 year ago

@dmnks @ffesti @pmatilai Could you consider adding this to your backlog(s)? There seem to be a demand for this feature in multiple distros (see all the comments above).

dirkmueller commented 1 year ago

I like the idea and I came up with a similar syntax some time ago (just in my head):


Source0:      https://host/path/to/tarball.tar.gz
Source0(sha512):   <sha512sum>
Source0(sha1): <sha1>

if any of those are set, rpm should (always, optionally?) check the checksum before starting with %prep. potentially it could also be simply implemented inside %setup/%autosetup, that might be sufficient.

if the general idea is liked, I might give it a try to implement that.

Conan-Kudo commented 1 year ago

I like that and would love to see an implementation for it.

voxik commented 1 year ago

Source0:      https://host/path/to/tarball.tar.gz
Source0(sha512):   <sha512sum>
Source0(sha1): <sha1>

I might be wrong, but I thought that Source without any number is the modern/future/preferred way of specifying sources. Therefore I am not sure how Source(sha512) would play together with that syntax.

Conan-Kudo commented 1 year ago

It should be compatible with that too, since it's effectively Tag(qualifier) format.

voxik commented 1 year ago

It should be compatible with that too, since it's effectively Tag(qualifier) format.

But how you match them? By order?

Conan-Kudo commented 1 year ago

I think so, yes.

szpak commented 1 year ago

It would be useful. When an OpenPGP upstream signature is not available and one wants to have the checksum verification, it needs to be performed manually in the %prep section (similarly to the things people do in Dockerfiles).

dirkmueller commented 1 year ago

Unfortunately the suggested format of Source(sha256): format is not backward compatible with older rpm releases, and having the checksum as an extra tag (with autonumbering) and if conditions could be error prone and tricky. so @mlschroe came up with an alternative proposal:

Source sha256(<checksum>):         https://files.pythonhosted.org/packages/source/.../%{name}-%{version}.tar.gz
Source42   sha256(<checksum>)  :         https://files.pythonhosted.org/packages/source/.../%{name}-%{version}.tar.gz

This works with old rpms and can be parsed easily with a patch (working on it at the moment). The only downside I see is that with sha256 the source lines get relatively long (at least 80 characters), but I personally can live with that..

An alternative syntax that builds upon another exploitable trick we use in SUSE spec files for a while already is this:

Source:  https://files.pythonhosted.org/packages/source/.../%{name}-%{version}.tar.gz#sha256:<checksum>/%{name}-%{version}.tar.gz

This works as well because rpm parses only after the last '/', and the download code is ignoring the fragment part.

any opinions on which way to go?

Conan-Kudo commented 1 year ago

Unfortunately the suggested format of Source(sha256): format is not backward compatible with older rpm releases, and having the checksum as an extra tag (with autonumbering) and if conditions could be error prone and tricky.

Is backward compatibility really an issue if we're talking about a new feature? This wasn't an okay reason when we added weak dependencies (which also broke on old rpm too).

Adding a new tag to do anything is always going to cause this problem, and if we're not willing to own up to that and bite that bullet, then we can't add any new tags ever. 😦

The alternative proposal relies on RPM having broken syntax parsing, because I don't see a reason that it shouldn't choke on Source sha256(<checksum>): URL. And the "alternative syntax" breaks the ability to use SourceURL downloading (which most distributions actually do rely on).

kloczek commented 1 year ago

I would like to see something more like pk11 format

SourceCSum: sha256://<checksum>

Which would allow use different checksums algorithms. Or as second argument of the Source:

Source: <url/><filename> sha256://<checksum>
dirkmueller commented 1 year ago

Unfortunately the suggested format of Source(sha256): format is not backward compatible with older rpm releases, and having the checksum as an extra tag (with autonumbering) and if conditions could be error prone and tricky. Is backward compatibility really an issue if we're talking about a new feature?

Yes, because it breaks parsing of spec files with older releases, and thats a common/supported scenario in openSUSE (where packages are maintained in tumbleweed, but are also built for older releases). it would be really nasty having to %if that out all the time.

This wasn't an okay reason when we added weak dependencies (which also broke on old rpm too).

Sure, but that was a much more fundamental change, breaking it there is a desirable behavior.

The alternative proposal relies on RPM having broken syntax parsing, because I don't see a reason that it shouldn't choke on Source sha256(<checksum>): URL.

Well, it doesn't, and I don't think we'll change the behavior of old releases. and new releases can parse this then with a patch.

And the "alternative syntax" breaks the ability to use SourceURL downloading (which most distributions actually do rely on).

no, absolutely not. it works just fine (except for in the open build service, but that has been fixed today).