Open quba42 opened 1 year ago
Note: I am not actually sure about rule 5 above. I suppose it could be legal to have a component, that does not actually include any packages in its package indices. Not very useful, but probably legal.
I don't agree with this one:
For each
ReleaseComponent
there should be at least onePackageReleaseComponent
referncing it.
It should be possible for the release and/or component to be defined but not yet have any content. As for validity checking I think it's a good idea but I'm not sure there's a perfect time to do it. You either have to check validity at the content modification time / repo version creation time (which is when I would prefer) which gives early feedback to the user but also forces them to make all their content modification changes at once in one request. Or you do it at Publication time, which makes publication an operation that can unexpectedly fail, which is not great.
But other then that I agree with everything here. I think ensuring validity is a good first step, but there's probably also some extra that could be done around automating validity. In particular I'm thinking about PackageReleaseComponent
. I'm of the opinion that PackageReleaseComponent
should never have been exposed over the API, and instead if it was necessary at all it should have just been an internal implementation detail.
Users never care about the existence / non-existence of a PackageReleaseComponent
, they care about whether or not their package has been successfully added to / removed from the ReleaseComponent
. The fact that another object has to be created to shoehorn this relationship into the pulpcore Content
model is not really relevant to them. As you say above, it makes no sense to have a PRC
if both the Package
and the ReleaseComponent
are not also in the RepoVersion
.
We have wrapped our interaction with the pulp_deb
API in a manager class who's only job is to ensure that our package adds / deletes also correctly create / clean up PRC
objects. You could obsolete this class (and I think provide a more natural experience to other API users) if you added logic/API to allow people to ignore PRC
objects entirely:
Package
to a ReleaseComponent
, the PRC
object is created automatically.Package
from a ReleaseComponent
, the PRC
object is automatically removed.Package
from a ReleaseComponent
and there are no other PRC
references to that package, the package itself is removed.It may also make sense to extend that logic further, for example if the user removes a Release
then also remove all ReleaseComponents
, PackageReleaseComponents
, and Packages
that are not referenced elsewhere. I'm not convinced that's a good idea, because at that point there's no great way to "undo" the action if the user changes their mind. PackageReleaseComponents
are safe to automate this way precisely because they are the bottom of the dependency graph, and adding a Package
to a ReleaseComponent
is the inverse of removing it.
One concrete next step I would like to see is that removing a Package from a repo, will automatically remove any associated PackageReleaseComponents. A dangling PackageReleaseComponents is simply meaningless, and can cause orphan cleanup to fail. See #865
Another thing we might consider: Provide a DB cleanup script to clean up dangling PackageReleaseComponents in existing repo versions. The idea is to provide a possibility to clean up if they are causing orphan cleanup to fail.
Unless all users do is sync, there are currently numerous ways to create repository versions that have inconsistent structure and metadata content.
Metadata content:
Release
Structure content:
ReleaseComponent
,ReleaseArchitecture
,PackageReleaseComponent
What do we mean by consistent?
The following rules are necessary and sufficient for a given repo version to be described as structurally consistent/sound:
Package
, there should be at least onePackageReleaseComponent
referencing that package.PackageReleaseComponent
the referencedReleaseComponent
should be present in the repo version.PackageReleaseComponent
the referencedPackage
should be present in the repo version.ReleaseComponent
there should be exactly oneRelease
with the samedistribution
(not strictly required sinceRelease
just stores optional release file fields).For eachReleaseComponent
there should be at least onePackageReleaseComponent
referncing it.architecture
associated with someRelease
via the above association chain, there should be aReleaseArchitecture
with that samedistribution
as theRelease
and the samearchitecture
as thePackage
.Release
, there should be at least oneReleaseComponent
with the samedistribution
.Example problems caused by "inconsistent" repo versions
https://github.com/pulp/pulp_deb/issues/777
Steps we are taking to ensure consistency
Release
have repo uniqueness constraint on thedistribution
field, so we guarantee at most oneRelease
perdistribution
.Ideas to improve consistency further