Open daviddavis opened 2 years ago
Here's a copy of the primary.xml:
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://linux.duke.edu/metadata/common" xmlns:rpm="http://linux.duke.edu/metadata/rpm" packages="2">
<package type="rpm">
<name>bear</name>
<arch>noarch</arch>
<version epoch="0" ver="4.1" rel="1"/>
<checksum type="sha256" pkgid="YES">ceb0f0bb58be244393cc565e8ee5ef0ad36884d8ba8eec74542ff47d299a34c1</checksum>
<summary>A dummy package of bear</summary>
<description>A dummy package of bear</description>
<packager></packager>
<url>http://tstrachota.fedorapeople.org</url>
<time file="1659114662" build="1331831374"/>
<size package="1846" installed="42" archive="296"/>
<location href="Packages/b/bear-4.1-1.noarch.rpm"/>
<format>
<rpm:license>GPLv2</rpm:license>
<rpm:vendor></rpm:vendor>
<rpm:group>Internet/Applications</rpm:group>
<rpm:buildhost>smqe-ws15</rpm:buildhost>
<rpm:sourcerpm>bear-4.1-1.src.rpm</rpm:sourcerpm>
<rpm:header-range start="280" end="1697"/>
<rpm:provides>
<rpm:entry name="bear" flags="EQ" epoch="0" ver="4.1" rel="1"/>
</rpm:provides>
</format>
</package>
<package type="rpm">
<name>camel</name>
<arch>noarch</arch>
<version epoch="0" ver="0.1" rel="1"/>
<checksum type="sha256" pkgid="YES">c5c34e1843847990d2c79b936309d6257279e26f907e20f9f58073a14525e1ef</checksum>
<summary>A dummy package of camel</summary>
<description>A dummy package of camel</description>
<packager></packager>
<url>http://tstrachota.fedorapeople.org</url>
<time file="1659114663" build="1331831360"/>
<size package="1847" installed="42" archive="296"/>
<location href="Packages/c/camel-0.1-1.noarch.rpm"/>
<format>
<rpm:license>GPLv2</rpm:license>
<rpm:vendor></rpm:vendor>
<rpm:group>Internet/Applications</rpm:group>
<rpm:buildhost>smqe-ws15</rpm:buildhost>
<rpm:sourcerpm>camel-0.1-1.src.rpm</rpm:sourcerpm>
<rpm:header-range start="280" end="1697"/>
<rpm:provides>
<rpm:entry name="camel" flags="EQ" epoch="0" ver="0.1" rel="1"/>
</rpm:provides>
</format>
</package>
</metadata>
FYI, it sounds like we'll not be allowing users to set relative_path on upload so the package should be named using nevra and thus, this isn't going to be a high priority for us.
K so I'm finally looking at this one. Here's the state of things.
Pulp has one repository-level uniqueness filter (not constraint in the database sense, it's enforced by logic). That filter as it currently exists is N + E + V + R + A + "location_href".
location_href is filled in with NEVRA if you don't manually override it using the "relative_path" parameter. But you can set that parameter to anything, which is not great, especially because it isn't part of the global package unique constraint (so whatever value you use during an upload will be used everywhere).
Scenarios where it behaves CORRECTLY:
If you add two packages with the same NEVRA and different checksums (one signed one unsigned) to one repository without setting the "relative_path", then the repo-unique-filter will match, and the repo will end up with only one, as the newer one will replace the older one.
If you add two packages with the same NEVRA and different checksums to one repository while using the same "relative_path" for each
Scenarios where it behaves KIND OF CORRECTLY:
bear-1.0-1.noarch.rpm
when uploading the package camel-1.0-1.noarch.rpm
, the package will be published under the name bear-1.0-1.noarch.rpm
Scenarios where it behaves INCORRECTLY:
bear-1.0-1.noarch.rpm
exists in the same repository, then the two will silently conflict and you might have issues w/ DNF. relative_path
with a value that didn't exactly match the one expected based on the NEVRAThe general TL;DR is that using the "relative_path" parameter on the upload endpoint generally leads to suboptimal outcomes.
How to Fix:
relative_path
param from our API entirely or make it no-op. Technically against SEMVER, but if we consider it broken, there's wiggle room.We could do one or both but either is painful and not a complete fix. Fixing data that already exists is a can of worms I'm going to discuss on the BZ not here. That's a separate conversation.
Neither of these feel like great options? But I would probably prefer the former in the short-term.
We don't allow regular users to set the relative_path for packages but I think we want to allow admins to do so--let me double check on this. Would having a config variable that defaults to False (e.g. ALLOW_UNSAFE_SETTING_OF_RPM_NAMES
) be an option?
Given Daniels comment and analyses, I agree that providing relative_path
on the upload creates more problems than benefits, where the benefits are still unclear.
I would be in favor of removing in the upload api the relative_path
or making it no-op. We'd also then attempt to fix already existing data by properly setting the package.location_href
and content_artifact.relative_path
stored in DB.
@daviddavis everything that has in it's name unsafe
implies no guarantee and ask for trouble for us to maintain somehow the corner case as for the user who will get non-deterministic results. Unless there is a justifiable strong usecase to keep relative_path
on the upload I'd remove it/make it no-op.
I checked and we don't the ability to set filenames when uploading rpms.
We'd also then attempt to fix already existing data by properly setting the package.location_href and content_artifact.relative_path stored in DB.
This makes me a bit nervous. Would it be possible to let users opt and/or have a dry run option that would list packages that have bad location_hrefs/relative_paths?
@daviddavis That probably won't be a problem for the django management command
Version pulpcore 3.21.0.dev pulp-rpm 3.18.0.dev
Describe the bug If you add two packages with the same filename to a repo, then only one gets published/distributed (good). However, both show up in primary.xml (not so good) and the checksum can potentially not match (bad).
To Reproduce This is a bit of a contrived example but it demonstrates the problem. Basically you add two packages with the same filename to a repo and publish it.
API Commands
wget https://fixtures.pulpproject.org/rpm-unsigned/bear-4.1-1.noarch.rpm wget https://fixtures.pulpproject.org/rpm-unsigned/camel-0.1-1.noarch.rpm http --form :24817/pulp/api/v3/content/rpm/packages/ file@bear-4.1-1.noarch.rpm relative_path=camel-0.1-1.noarch.rpm http --form :24817/pulp/api/v3/content/rpm/packages/ file@camel-0.1-1.noarch.rpm http :24817/pulp/api/v3/repositories/rpm/rpm/ name=test http :24817/pulp/api/v3/distributions/rpm/rpm/ name=test base_path=test repository=/pulp/api/v3/repositories/rpm/rpm/4d390079-60ff-4916-b80f-51b6756f2757/ http :24817/pulp/api/v3/repositories/rpm/rpm/4d390079-60ff-4916-b80f-51b6756f2757/modify/ add_content_units:='["/pulp/api/v3/content/rpm/packages/290113b4-0ca2-4bb4-8b5a-725d471d865a/", "/pulp/api/v3/content/rpm/packages/8df08649-2272-4006-8270-eee1d453edd2/"]' http :24817/pulp/api/v3/publications/rpm/rpm/ repository=/pulp/api/v3/repositories/rpm/rpm/4d390079-60ff-4916-b80f-51b6756f2757/Expected behavior When I view the packages I see only one (camel) which is expected:
When I get its checksum, I get the checksum for the package named formerly known as
bear-4.1-1.noarch.rpm
which is also acceptable:However, when I download the primary.xml, I see two entries: one for bear at
Packages/b/bear-4.1-1.noarch.rpm
(which doesn't exist) with a checksum ofceb0f0
and one for camel atPackages/c/camel-0.1-1.noarch.rpm
which does exist but it has a checksum ofc5c34
(which doesn't match the checksum of the actual package at/pulp/content/test/Packages/c/camel-0.1-1.noarch.rpm
).