pulp / pulpcore

Pulp 3 pulpcore package https://pypi.org/project/pulpcore/
GNU General Public License v2.0

As a user I can manage labels on the content #3338

Open ipanova opened 1 year ago

ipanova commented 1 year ago

Is your feature request related to a problem? Please describe.

Scope (these are not all use cases):

  • As a user I can filter content by specific labels.
  • As a user I can set labels on the content I am uploading to a destination repository.
  • As a user I can set labels on content which is already present in a repo.
  • As a user I can update/remove labels on content which is already present in a repo.
  • As a user I can preserve labels by specifying a flag --preserve-labels=True when copying content between repos (by default, content is copied without labels).

Additional context: We already have labels on repositories, remotes, distributions, etc. Let's add labels to content too. Content labels will be k8s-style labels.

Questions:

  1. When labeling content, should labels be shared across all the repos/repo versions it is present in? Example: Alice labels rpm.foo as "release: rhel8", and this label is applied in every repo: Dev, Test, Prod. (Not sure how this will work with RBAC. What if Alice, who labels rpm.foo, has RBAC access to Dev and Test but not to Prod?)
  2. When labeling content, does it make sense to be able to label content differently based on the repo/repo version it belongs to, i.e. content labels are not shared across repos/repo versions? Example: Alice labels rpm.foo as {"release": "rhel8", "env": "dev"} in repo Dev. This label is applied only in repo Dev. Alice labels rpm.foo as {"env": "test"} in repo Test.
  3. Should content be searched by labels per repo or across repos? I guess this will depend on the answer to 1 or 2.
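
The difference between questions 1 and 2 can be sketched with two in-memory stores. This is only an illustration, not Pulp code: `GlobalLabels`, `ScopedLabels`, and their methods are hypothetical names, and the repository and package names are taken from the examples above.

```python
from collections import defaultdict


class GlobalLabels:
    """Option 1 (hypothetical): one label set per content unit, shared by all repos."""

    def __init__(self):
        self._labels = defaultdict(dict)  # content -> {key: value}

    def set(self, content, key, value):
        self._labels[content][key] = value

    def get(self, content, repo=None):
        # The repo argument is ignored: every repo sees the same labels.
        return dict(self._labels[content])


class ScopedLabels:
    """Option 2 (hypothetical): labels keyed by (repo, content)."""

    def __init__(self):
        self._labels = defaultdict(dict)  # (repo, content) -> {key: value}

    def set(self, content, key, value, repo):
        self._labels[(repo, content)][key] = value

    def get(self, content, repo):
        return dict(self._labels[(repo, content)])


shared = GlobalLabels()
shared.set("rpm.foo", "release", "rhel8")

scoped = ScopedLabels()
scoped.set("rpm.foo", "release", "rhel8", repo="Dev")
scoped.set("rpm.foo", "env", "dev", repo="Dev")
scoped.set("rpm.foo", "env", "test", repo="Test")

print(shared.get("rpm.foo", repo="Prod"))  # the label is visible in Prod too
print(scoped.get("rpm.foo", repo="Dev"))
print(scoped.get("rpm.foo", repo="Prod"))  # nothing was set for Prod
```

With the shared store, a search by label necessarily spans all repos; with the scoped store, a search has to name a repo (or iterate over all of them), which is why question 3 depends on the answer to 1 or 2.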

Additional info:

  • https://www.jfrog.com/confluence/display/JFROG/Property+Sets
  • https://www.jfrog.com/confluence/display/JFROG/Artifactory+REST+API#ArtifactoryRESTAPI-RetrieveArtifact
  • https://www.jfrog.com/confluence/display/JFROG/Artifactory+REST+API#ArtifactoryRESTAPI-PropertySearch

mdellweg commented 1 year ago

I do not think option 1 fits the Pulp philosophy. What if we make ContentLabel a new content-type?

ipanova commented 1 year ago

@mdellweg the problem with this approach is that it adds overhead when it comes to ContentLabel management. You will need to ensure that labels are removed if content is removed from a repo (labels cannot stay in the repo without their content). Then there is the question of whether or not labels should be automatically copied over between repos when content is copied.

ipanova commented 1 year ago

One of the ideas was to use the RepositoryContent model to make the labels content- and repo-aware. @dralley do you remember the details?

FrostyX commented 1 month ago

I thought that I needed labels for artifacts but labels for content might work for me as well. Let me elaborate on my use-case.

When a Copr build finishes and I want to upload the artifacts, I do:

  1. Make sure a correct repository and distribution exists
  2. For all produced RPM packages:
    1. Upload the RPM package as an artifact
    2. Create a content for the artifact (effectively adding it to the repository)
  3. Create a new publication for the repository
  4. Update my distribution to point to the new publication

Somewhere along the way, I would like to label the artifacts/content with some additional information such as Copr build ID.

For some other processes, I have a requirement to work with RPM packages that came from a specific Copr build. For example, remove them once they are no longer needed. In that case I do:

  1. Get the latest repository version (I'd prefer to get a repository but a repository version is needed for another step)
  2. List all packages for that repository version
  3. I need to filter only the packages labeled copr_build_id=1234 but this is not possible to do yet. I think this RFE should solve this. Also, for performance reasons it would be better if this wasn't a separate step but rather part of the previous step.
  4. Remove the packages
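
If content carried labels, steps 2 to 4 could collapse into a single label-filtered selection. The sketch below models that in memory only. `Package`, `matches`, and `select_by_labels` are hypothetical names, the package names and the second build ID are made-up sample data, and no real Pulp endpoint or filter parameter is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    labels: tuple = ()  # tuple of (key, value) pairs


def matches(package, selector):
    """True if the package carries every key=value pair in the selector."""
    labels = dict(package.labels)
    return all(labels.get(key) == value for key, value in selector.items())


def select_by_labels(packages, selector):
    """Equality-based selection, in the spirit of k8s label selectors."""
    return [pkg for pkg in packages if matches(pkg, selector)]


# Stand-in for "all packages in the latest repository version".
repo_version = [
    Package("foo-1.0-1.rpm", (("copr_build_id", "1234"),)),
    Package("foo-debuginfo-1.0-1.rpm", (("copr_build_id", "1234"),)),
    Package("bar-2.0-1.rpm", (("copr_build_id", "5678"),)),
]

# Select everything produced by build 1234, then "remove" it.
to_remove = select_by_labels(repo_version, {"copr_build_id": "1234"})
remaining = [pkg for pkg in repo_version if pkg not in to_remove]

print([pkg.name for pkg in to_remove])
print([pkg.name for pkg in remaining])
```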

It is very well possible that I do something that doesn't make sense or that I don't really know what I want. Please let me know if you need more information about something.

ggainey commented 1 month ago

Notes from the team discussion around this issue, 2024-08-13:

Labelling Content collides, in multi-user Pulp installs, with RBAC and Content-deduplication.

Labelling the RepositoryVersion says "everything added in this RV is labelled 'foo'". This collides with "deleting repo-versions collapses 'added' into the remaining repo-version"
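
This collision can be illustrated with a toy model of repository versions. It is a simplified sketch, not pulpcore's implementation: `Version` and `delete_version` are hypothetical, and the squash behavior is modeled only as far as it is described above (the deleted version's "added" set is folded into the next remaining version).

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    number: int
    added: set = field(default_factory=set)
    labels: dict = field(default_factory=dict)  # meant to cover everything in `added`


def delete_version(versions, number):
    """Delete a version, folding its added content into the next remaining one."""
    index = next(i for i, v in enumerate(versions) if v.number == number)
    deleted = versions.pop(index)
    if index < len(versions):
        versions[index].added |= deleted.added
    return deleted


versions = [
    Version(1, {"foo.rpm"}, {"copr_build_id": "1234"}),
    Version(2, {"bar.rpm"}, {"copr_build_id": "5678"}),
]

delete_version(versions, 1)

survivor = versions[0]
print(survivor.added)   # now holds content from both builds
print(survivor.labels)  # but carries only the label of build 5678
```

After the deletion, foo.rpm appears to belong to build 5678, so a label on the version no longer describes everything that version "added".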

Labelling at the RepositoryContent model says "This Content in This Repository is labelled 'foo'". This "feels like" the use case COPR (specifically) needs. It does require the user to provide both content- and repository-info.

Meanwhile, pulp_ansible has CollectionMarks to accomplish a similar goal - maybe this approach should be generalized into pulpcore?

Discussion continues - but the priority is "agree on an approach and implement Soon"

ggainey commented 1 month ago

@FrostyX just a quick note - your "For all produced RPM packages" step can be reduced to a single "upload Package to Repository" API call, specifying "file" and "repository". You don't have to "create Artifact / turn into Package / assign to Repo" as separate steps.

mdellweg commented 1 month ago

For completeness' sake: the marks mentioned are a type of content themselves, allowing you to add attributes to content depending on the repository version. E.g. you can mark a piece of content "deprecated" starting in some repository version. The big question is how sticky the marks are when it comes to copying content from one repository to another.

daviddavis commented 3 weeks ago

We would like to be able to tag content when it's uploaded/created. Our use case is that we need to keep track of which team in our org uploaded the content initially, and then filter that content later on. I think being able to tag content within the context of a repo would also be useful although we don't have an immediate need for it.

daviddavis commented 3 weeks ago

> Labelling Content collides, in multi-user Pulp installs, with RBAC and Content-deduplication.

  • example: Two users upload the same (by sha256) Content into their "own" Repositories. Both can label it - and both will "see" the other user's label (and could unlabel it!)
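
A minimal sketch of that example, assuming content is deduplicated by its sha256 digest as described. `ContentStore` and its methods are hypothetical and do not correspond to pulpcore classes; the uploaded bytes are placeholder data.

```python
import hashlib


class ContentStore:
    """Toy store that deduplicates content by sha256 and labels the shared record."""

    def __init__(self):
        self.labels = {}  # sha256 hex digest -> {key: value}

    def upload(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self.labels.setdefault(digest, {})  # identical bytes reuse one record
        return digest

    def label(self, digest, key, value):
        self.labels[digest][key] = value

    def unlabel(self, digest, key):
        self.labels[digest].pop(key, None)


store = ContentStore()

# Two users upload byte-identical packages into their "own" repositories.
alice_content = store.upload(b"same rpm bytes")
bob_content = store.upload(b"same rpm bytes")

store.label(alice_content, "team", "alice")
print(store.labels[bob_content])  # Bob sees Alice's label

store.unlabel(bob_content, "team")
print(store.labels[alice_content])  # and Bob was able to remove it
```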

We only have a single user installation of Pulp but I wonder if it'd be possible for a user to only see his/her own labels?

ggainey commented 3 weeks ago

> We only have a single user installation of Pulp but I wonder if it'd be possible for a user to only see his/her own labels?

Not if the label(s) are on the Content itself, because Content is deduplicated/shared. You'd see any labels applied by anybody who has access to a specific content-id, and you would not know who had applied them.

daviddavis commented 3 weeks ago

Right, I was imagining that Pulp would have to keep track of who created the label.
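
One way to picture this, purely as a sketch: store the creator alongside each label and filter on it when reading. `Label`, `labels_visible_to`, and the user names are hypothetical, and how this would interact with Pulp's RBAC is not addressed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    key: str
    value: str
    created_by: str


def labels_visible_to(labels, user):
    """Return only the labels that the given user created."""
    return {label.key: label.value for label in labels if label.created_by == user}


# Labels attached to one shared (deduplicated) content unit.
content_labels = [
    Label("team", "platform", created_by="alice"),
    Label("team", "infra", created_by="bob"),
    Label("copr_build_id", "1234", created_by="bob"),
]

print(labels_visible_to(content_labels, "alice"))
print(labels_visible_to(content_labels, "bob"))
```

Recording the creator also lets two users hold the same key with different values on the same content unit, at the cost of one stored row per (content, key, user) instead of per (content, key).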

daviddavis commented 3 weeks ago

I haven't used Pulp's RBAC so forgive my ignorance, but would another option be to only allow admins (or certain roles) to edit/read/etc content labels? Or perhaps have specific permissions for users to read/edit/etc content labels globally?