rpm-software-management / librepo

A library providing C and Python (libcURL like) API for downloading packages and linux repository metadata in rpm-md format
http://rpm-software-management.github.io/librepo/
GNU Lesser General Public License v2.1
74 stars 91 forks source link

lr_checksum_fd_compare() does not return cached calculated value #233

Closed malmond77 closed 3 years ago

malmond77 commented 3 years ago

Context: I'm working on https://fedoraproject.org/wiki/Changes/RPMCoW

I've found that DNF still uses some old yum code to produce and validate checksums of downloaded rpms (https://github.com/rpm-software-management/dnf/blob/master/dnf/yum/misc.py#L153). In the normal case: librepo downloads and validates rpm files using librepo. In the less common path, the rpm is already there, and dnf uses the python code linked to verify the checksum.

This works today because the implementation is close enough to not matter.

Enter: #222 . This augments checksum handling with specific comprehension of "transcoded" rpms. This works in the normal path, but fails with existing packages in cache because the python code is used, and it doesn't know about the transcoded .recorded checksum.

Solving this is fairly involved. There's changes to librepo, libdnf, and dnf needed to fix this. As listed: librepo is first, and I'd like to fix a latent bug in the API, or get guidance on how to better do it.

My thought is to make dnf use lr_checksum_fd_compare() and lr_checksum_fd_cmp() instead ofyum.misc.checksum(). Why both? Because in one case, we need to validate checksums (https://github.com/rpm-software-management/dnf/blob/master/dnf/package.py#L333) and another we need to just calculate it (https://github.com/rpm-software-management/dnf/blob/master/dnf/package.py#L59)

The API for librepo doesn't explicitly support calculating checksums. The closest we get is is to call lr_checksum_fd_compare() and pass an empty string for expected, ignore matches, and pass a char * for calculated.

If you call lr_checksum_fd_compare() with caching = TRUE, and an xattr is present, then matches is set, but calculated is not set. This feels like a bug: if the caching mechanic is robust enough to use for the normal case, then it should be possible to set *calculated too. Using caching shouldn't come with this drawback.

For now, I'm going to call lr_checksum_fd_compare() in my libdnf SWIG wrapper with caching = FALSE to force a full checksum though librepo. I will reference this issue in my code, and one day, when the fix to this issue is available in downstream packages, I can enable caching for this less common path.

malmond77 commented 3 years ago

Fix is merged. I will update https://github.com/rpm-software-management/libdnf/pull/1164 to explicitly depend on v1.13.1 which has the fix so I can reliably pass TRUE for caching in checksum_value(). I will also bump versions and depend on them in dnf itself, completing the chain.