Switch to building stage repository by combining production and diff …

ehelms commented 11 months ago

…of Copr

When the version is nightly, this will not pull from production but instead treat what is in Copr as the source of truth. At the end, the output for a versioned repository is a list of unsigned packages.

There is a lot of change here, and I can likely do some refactoring now or after merge. I wanted to get a working version of the concept available.

Here is how this generates the stage repository.

For nightly:

Copy all RPMs from Copr for the given repository to a local repository
Filter the downloaded RPMs through the comps file, removing anything not in the comps file
Run createrepo

This is done for RPMs and SRPMs. For this workflow, Jenkins will run this script and push the RPMs via rsync to stagingyum.theforeman.org.

For releases:

Copy all RPMs from production (yum.theforeman.org) to a local repository
Run repodiff to identify new packages in Copr for the release
Download the new packages from Copr
Filter the downloaded RPMs through the comps file, removing anything not in the comps file
Run createrepo
Print the list of unsigned RPMs and SRPMs to stdout

For this workflow, the release engineer will run this script, sign the unsigned packages, and push the RPMs via rsync to stagingyum.theforeman.org.
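
The comps filtering step shared by both workflows could look roughly like the sketch below. It is not the script from this PR: the helper names `comps_package_names`, `rpm_name` and `filter_rpms` are hypothetical, and it assumes a comps file that lists packages in `packagereq` elements and RPM filenames of the form `name-version-release.arch.rpm`.

```python
import xml.etree.ElementTree as ET


def comps_package_names(comps_xml):
    """Return the set of package names listed in a comps XML document."""
    root = ET.fromstring(comps_xml)
    return {req.text.strip() for req in root.iter("packagereq") if req.text}


def rpm_name(filename):
    """Extract the package name from name-version-release.arch.rpm."""
    stem = filename[:-len(".rpm")] if filename.endswith(".rpm") else filename
    nvr, _arch = stem.rsplit(".", 1)               # drop the arch (or "src")
    name, _version, _release = nvr.rsplit("-", 2)  # drop version and release
    return name


def filter_rpms(filenames, allowed_names):
    """Keep only the files whose package name appears in the comps file."""
    return [f for f in filenames if rpm_name(f) in allowed_names]
```

Matching on the exact name means subpackages such as `-debuginfo` are dropped unless the comps file lists them explicitly; whether that is desired depends on how the comps files are maintained.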

The idea is then that either one of these options will happen (via a follow-up PR):

1) New script will take as input the list of unsigned RPMs and sign them
2) New script will calculate the unsigned RPMs with the location of the repository as input and sign them

ehelms commented 11 months ago

I decided to split this into:

1) Build stage repository with script and new method described in original comment
2) New script that for a given directory will list unsigned packages
3) New script that calls list unsigned packages and then feeds them into the sign_rpms script

evgeni commented 11 months ago

Why do you need to "Copy all RPMs from production (yum.theforeman.org) to a local repository"? For repodiff, you can just point it at the existing live repo instead.

Is it so that you can generate a "composed" repo (old plus new) locally, including module metadata? (Today, we push the diff only and the remote regenerates the repodata)

ehelms commented 11 months ago

> Why do you need to "Copy all RPMs from production (yum.theforeman.org) to a local repository"? For repodiff, you can just point it at the existing live repo instead.
>
> Is it so that you can generate a "composed" repo (old plus new) locally, including module metadata? (Today, we push the diff only and the remote regenerates the repodata)

Yes. This was in service of:

the ability to generate a full stage repository locally
keep web01 dumb
generate module metadata outside of web01

Thinking about it more, I could optimize this and keep generating the whole thing locally as an option, but the optimized path would require web01 to be a tad smarter than it is today.

Optimized Workflow

For a release:

Calculate repodiff between production and Copr
Download identified packages from Copr
Filter with comps
Sign packages
Copy signed packages to web01
Generate module metadata on web01
Update repo metadata

For nightly:

Download all packages from Copr
Filter with comps
Copy all packages to stage
Generate module metadata on web01
Run createrepo

evgeni commented 11 months ago

I think I am cool with the current flow (and generally in favor of dumbing down web01). Just needed to have a clearer picture.

Does downloading the existing RPMs from prod keep their filesystem timestamps, or would these be updated during rsync back to web01? If possible, I'd try to retain them, so people have a clearer picture when things changed.

ehelms commented 11 months ago

> I think I am cool with the current flow (and generally in favor of dumbing down web01). Just needed to have a clearer picture.
>
> Does downloading the existing RPMs from prod keep their filesystem timestamps, or would these be updated during rsync back to web01? If possible, I'd try to retain them, so people have a clearer picture when things changed.

That will require some testing.

evgeni commented 11 months ago

> > I think I am cool with the current flow (and generally in favor of dumbing down web01). Just needed to have a clearer picture. Does downloading the existing RPMs from prod keep their filesystem timestamps, or would these be updated during rsync back to web01? If possible, I'd try to retain them, so people have a clearer picture when things changed.
>
> That will require some testing.

Tested, and the test failed. The following code works, though:

import os
from email.utils import parsedate_to_datetime
from urllib.request import urlretrieve
filename, headers = urlretrieve(…)
if 'Last-Modified' in headers:
    modification_ts = parsedate_to_datetime(headers['Last-Modified']).timestamp()
    os.utime(filename, (modification_ts, modification_ts))
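
Wrapped into reusable helpers, the same idea might look like the sketch below. The function names `apply_last_modified` and `download_preserving_mtime` are hypothetical and not part of the PR; only `urlretrieve`, `parsedate_to_datetime` and `os.utime` come from the snippet above.

```python
import os
from email.utils import parsedate_to_datetime
from urllib.request import urlretrieve


def apply_last_modified(path, headers):
    """Set atime and mtime of path from a Last-Modified header, if present.

    Returns the applied timestamp, or None when the header is missing.
    """
    value = headers.get("Last-Modified")
    if value is None:
        return None
    modification_ts = parsedate_to_datetime(value).timestamp()
    os.utime(path, (modification_ts, modification_ts))
    return modification_ts


def download_preserving_mtime(url, destination):
    """Download url to destination and keep the server's modification time."""
    filename, headers = urlretrieve(url, destination)
    apply_last_modified(filename, headers)
    return filename
```

Splitting the timestamp handling from the download keeps the former usable with any header mapping, which also makes it easy to check without network access.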

ehelms commented 10 months ago

> import os
> from email.utils import parsedate_to_datetime
> from urllib.request import urlretrieve
> filename, headers = urlretrieve(…)
> if 'Last-Modified' in headers:
>     modification_ts = parsedate_to_datetime(headers['Last-Modified']).timestamp()
>     os.utime(filename, (modification_ts, modification_ts))

Do I understand correctly:

Your suggestion is to switch from reposync to a method that downloads the RPMs individually, using this code chunk to preserve the modification date?

evgeni commented 10 months ago

Nah, just make me read the code better 🙈 The part that fetches the prod repo uses reposync, not urlretrieve, which is used only for "new" stuff coming from Copr. Back to square one, evgeni.

ehelms commented 10 months ago

> Nah, just make me read the code better 🙈

Does the new structure help at all?

ehelms commented 10 months ago

Please also modify this as needed:

This is handled over here -- https://github.com/theforeman/theforeman-rel-eng/pull/285. I do not want to update the procedures until all the infrastructure is in place.

ehelms commented 10 months ago

@evgeni I was able to add a partial optimization with filtering on download. The reason we cannot do a complete one is that repodiff returns the full NEVRA, e.g.

Added rubygem-pg-debuginfo-1.5.3-1.el8

And comps contains just the name rubygem-pg, so at best we can do a substring ("token in name") match on the name.

evgeni commented 10 months ago

You can split NEVRAs like this:

  package, _version, _release = nevra.rsplit('-', 2)
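
As a sketch of how that split could feed an exact-name filter against comps: the helper names `split_nvr` and `filter_added` are hypothetical, the input line format is taken from the "Added …" example above, and epochs or architectures in the string are not handled.

```python
def split_nvr(nvr):
    """Split 'name-version-release' into its three parts."""
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def filter_added(repodiff_lines, comps_names):
    """Return the added packages whose name is listed in comps.

    Assumes lines such as 'Added rubygem-pg-debuginfo-1.5.3-1.el8';
    any other line is ignored.
    """
    kept = []
    for line in repodiff_lines:
        action, _, nvr = line.strip().partition(" ")
        if action != "Added" or not nvr:
            continue
        name, _version, _release = split_nvr(nvr)
        if name in comps_names:
            kept.append(nvr)
    return kept
```

With an exact match, `rubygem-pg-debuginfo` is only kept if comps lists it by that name, which differs from the substring match described earlier.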

ehelms commented 10 months ago

I noticed that rpmsign does seem to help prevent us from re-signing an already signed package:

warning: tmp/foreman/3.8/el8/source/rubygem-unf_ext-0.0.8.2-1.el8.src.rpm already contains identical signature, skipping

ehelms commented 10 months ago

Updated to include a workflow layout in the README. This can be merged prior to https://github.com/theforeman/foreman-infra/pull/1948 for nightly, but https://github.com/theforeman/foreman-infra/pull/1948 must be merged prior to this if I am to test this for releases and rsyncing to stagingyum.

evgeni commented 10 months ago

One thing that I realized while reviewing https://github.com/theforeman/foreman-infra/pull/1948: Today, nothing cleans up ~yumrepostage/rsync_cache, and the script that copies the cache over to the final destination doesn't use --delete. This is fine™, as we're aiming at a "merged" workflow for the release repos anyway, but it also means that the nightly repo might grow and never get cleaned up (at least that's how I read the code). We could add a cleanup step once things get copied to the "final destination", but then the rsync from the release engineer's workstation will take longer (as it will have to transfer the already signed files again).

Shouldn't block this PR, but something to keep an eye on.

Edit (after reading man rsync(1) for the tenth time): this PR uses --delete-after, which implies --delete, so the cache will be cleaned up by the caller.

All good.

theforeman / theforeman-rel-eng

Switch to building stage repository by combining production and diff … #280