pulp / pulp_rpm

RPM support for Pulp Platform
https://docs.pulpproject.org/pulp_rpm/
GNU General Public License v2.0

"Exclusion" option for some packages during the repository synchronisation #3469

Open yunustatli opened 3 months ago

yunustatli commented 3 months ago

Is your feature request related to a problem? Please describe.

Our security team does not allow us to synchronize some packages, such as "aircrack, pnscan, masscan", from an upstream source. We also have packet filtering for these packages at the firewall level. Repository synchronization is not possible in this case and fails with an error.

Describe the solution you'd like

A feature to exclude one or more packages would help us and others who have the same problem (I have heard of this problem from at least two colleagues who work at other companies).

At the moment there is the option "Ignore SRPMs" for the rpm-based repositories. As I understand it, the mechanism already exists and should be extended for some specific packages (rpm and deb).

So a new parameter "exclude_packages" could solve the problem.

Describe alternatives you've considered

Additional context

dralley commented 3 months ago

The main issue with adding this feature is that it's suuuper easy to accidentally misuse - you would basically need to ensure that only "leaf" packages that nothing else depends on are blacklisted, but you'd need to do it manually since it would be too expensive to calculate on every sync. And there's no way to do that generically, it'd be a per-plugin feature.
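To illustrate the "leaf package" concern, here is a minimal sketch of the manual check described above. It assumes a toy dependency map keyed by package name; real RPM dependencies are resolved through Provides/Requires capabilities, so this is a simplification and not how Pulp works. The function name `non_leaf_exclusions` and the package `wifi-audit-tools` are hypothetical and used only for illustration.

```python
from typing import Dict, Set


def non_leaf_exclusions(
    requires: Dict[str, Set[str]], excluded: Set[str]
) -> Dict[str, Set[str]]:
    """Map each excluded package to the kept packages that still require it.

    An empty result means every excluded package is a "leaf" with respect
    to the packages that remain in the repository.
    """
    broken: Dict[str, Set[str]] = {}
    for pkg, deps in requires.items():
        if pkg in excluded:
            continue  # an excluded package cannot be broken by another exclusion
        for dep in deps & excluded:
            broken.setdefault(dep, set()).add(pkg)
    return broken


# Toy data: package name -> names of packages it requires (hypothetical).
requires = {
    "aircrack-ng": {"openssl-libs"},
    "wifi-audit-tools": {"aircrack-ng"},  # hypothetical dependent package
    "masscan": {"libpcap"},
    "libpcap": set(),
    "openssl-libs": set(),
}

problems = non_leaf_exclusions(requires, {"aircrack-ng", "masscan"})
```

In this toy data `masscan` is a leaf, while excluding `aircrack-ng` would leave `wifi-audit-tools` with an unsatisfied dependency.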

SRPMs don't have that particular issue.

With that said I do hear your problem. Perhaps one workaround would be to use on-demand syncs to avoid downloading the packages during the sync itself?
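As a rough sketch of the on-demand workaround, the request bodies below assume the Pulp 3 REST API, where an RPM remote has a `policy` field and a repository is synced by posting the remote's href to its `sync/` endpoint. The remote name, upstream URL, and href are placeholders, and the helper function names are hypothetical. The code only builds the JSON bodies and does not contact a server.

```python
import json


def on_demand_remote_payload(name: str, url: str) -> dict:
    """Body for creating an RPM remote that defers package downloads.

    Assumed endpoint: POST /pulp/api/v3/remotes/rpm/rpm/
    """
    return {"name": name, "url": url, "policy": "on_demand"}


def sync_payload(remote_href: str) -> dict:
    """Body for syncing a repository from the given remote.

    Assumed endpoint: POST <repository_href>sync/
    """
    return {"remote": remote_href}


# Placeholder values, for illustration only.
remote_body = on_demand_remote_payload("upstream-on-demand", "https://example.com/repo/")
sync_body = sync_payload("/pulp/api/v3/remotes/rpm/rpm/<remote-id>/")

remote_json = json.dumps(remote_body, sort_keys=True)
```

With this policy the sync itself only processes repository metadata, so the blocked packages would not be fetched until a client actually requests them.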

ggainey commented 3 months ago

If/when we decide to implement, this is prob the place.

Other thoughts:

yunustatli commented 3 months ago

Thank you very much for the quick responses, I really appreciate it.

I have an idea, even if it may seem a little naive to you. Please forgive me if so, as I am neither a Pulp developer nor a developer at all.

Background:

Currently, after synchronizing repositories, we have the option to remove all desired packages and publish the content views without these packages. As a result, these packages are no longer present in the repository and are also excluded from the repository metadata. Of course, these packages will be synchronized again with the next synchronization because they are missing in the repository.

Perhaps we could utilize this functionality of Pulp, even though I am not familiar with its inner workings, to remove the packages. If Pulp were to retrieve the exclude package list from the repository settings (when the parameter is not null) and remove the packages before initiating the repository synchronization, it might be a viable solution. However, I am unsure if this approach would be feasible or effective.

I understand there may be concerns regarding dependencies, but as @ggainey mentioned, users can be informed about potential dependency issues in the documentation.

Thank you once again for your time and consideration.

ggainey commented 3 months ago

The typical workflow in Pulp is exactly this - you sync the upstream to provide the set of content that's available to the Pulp Admin, and then curate that content by deciding what is/isn't Allowed for your users, only Distributing (making public) the curated version(s).

You can accomplish this in a couple of ways. One is to sync a repo, then copy the content-you-want to a second repo and always Distribute the most-recent Publication in that second repo. Another is to sync a repo, then remove content from the resulting RepositoryVersion to make a new RepositoryVersion, and Distribute that specific version.
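The second approach (removing content from the synced RepositoryVersion) could look roughly like the sketch below. It assumes the Pulp 3 REST API, where posting `remove_content_units` to a repository's `modify/` endpoint creates a new RepositoryVersion, and where package records expose `name` and `pulp_href`. The hrefs are placeholders and the helper name `removal_payload` is hypothetical. Nothing here talks to a server.

```python
from typing import Dict, Iterable, List, Set


def removal_payload(
    packages: Iterable[Dict[str, str]], excluded_names: Set[str]
) -> Dict[str, List[str]]:
    """Body for removing unwanted packages from a repository.

    Assumed endpoint: POST <repository_href>modify/
    `packages` stands in for records listed from the synced repository version.
    """
    hrefs = [p["pulp_href"] for p in packages if p["name"] in excluded_names]
    return {"remove_content_units": hrefs}


# Placeholder records, shaped like entries from a package listing.
synced_packages = [
    {"name": "masscan", "pulp_href": "/pulp/api/v3/content/rpm/packages/<id-1>/"},
    {"name": "bash", "pulp_href": "/pulp/api/v3/content/rpm/packages/<id-2>/"},
    {"name": "pnscan", "pulp_href": "/pulp/api/v3/content/rpm/packages/<id-3>/"},
]

body = removal_payload(synced_packages, {"aircrack", "pnscan", "masscan"})
```

The resulting RepositoryVersion would then be published and distributed, while the unmodified synced version stays internal.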

The problem you described initially, tho, causes that initial sync-from-remote to fail - which stops this cold.

dralley commented 3 months ago

It would not, however, stop cold if you did an on-demand sync first, because that skips the package download process.

ggainey commented 3 months ago

> It would not however stop cold if you did an on-demand sync first, because that skips the package download process

Ah! That's an outstanding observation, good point Daniel.

quba42 commented 3 months ago

Also seems related: https://github.com/pulp/pulp_rpm/issues/2713