packit / packit-service

Packit provided as a service
https://packit.dev
MIT License

Workers occupied by larger repos #2411

Open mfocko opened 2 months ago

mfocko commented 2 months ago

Description

While investigating the problems with our queue on Monday, we discovered that the last command executed in both of our long-running workers was the following (possibly with a different repository):

[2024-04-29 10:26:46,485: DEBUG/ForkPoolWorker-1] task.run_copr_build_handler[d62bca3b-d550-4666-a2c2-13443fe8f130] Popen(['git', 'clone', '-v', '--tags', '--', 'https://github.com/systemd/systemd.git', '/tmp/sandcastle'], cwd=/src, stdin=None, shell=False, universal_newlines=True)

Given the size of the systemd repository in the example and its presence in both workers, we suspect that cloning large repositories in the workers “choked” the queue.

Since this was caught as part of run_copr_build_handler, we do not need the full history; the repository will be cloned for the build (and potential user-specified actions) in the Copr build environment anyway.

⚠️ WARNING ⚠️

We still need the full history for sync-release runs and upstream-koji-build, though those clones could be postponed to their respective sandboxes in Sandcastle.
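The split described above (shallow clone by default, full history only where it is needed) could be sketched roughly as follows. This is only an illustration: `clone_command` and the job names in `FULL_HISTORY_JOBS` are hypothetical and do not correspond to actual packit or packit-service identifiers.

```python
from typing import List

# Hypothetical job names mirroring the ones mentioned in the warning above;
# these are assumptions, not packit's real identifiers.
FULL_HISTORY_JOBS = {"sync-release", "upstream-koji-build"}


def clone_command(url: str, target: str, job: str) -> List[str]:
    """Build the `git clone` argument list for a given job.

    Jobs listed in FULL_HISTORY_JOBS keep the current full clone with tags;
    everything else fetches only the latest commit.
    """
    cmd = ["git", "clone", "-v"]
    if job in FULL_HISTORY_JOBS:
        cmd.append("--tags")
    else:
        cmd.append("--depth=1")
    # `--` separates options from the positional URL and target directory.
    cmd += ["--", url, target]
    return cmd


if __name__ == "__main__":
    print(clone_command("https://github.com/systemd/systemd.git", "/tmp/sandcastle", "copr-build"))
```

The returned list can be passed straight to `subprocess.run(..., check=True)`, matching the `shell=False` style seen in the worker log.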

TODO

Sizes of repository

Current command »218 MiB«

/tmp % git clone -v --tags -- https://github.com/systemd/systemd.git
Cloning into 'systemd'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (gzip 19402 to 9771 bytes)
remote: Enumerating objects: 519634, done.
remote: Counting objects: 100% (963/963), done.
remote: Compressing objects: 100% (570/570), done.
remote: Total 519634 (delta 512), reused 648 (delta 366), pack-reused 518671
Receiving objects: 100% (519634/519634), 218.39 MiB | 2.89 MiB/s, done.
Resolving deltas: 100% (407856/407856), done.

Only cloning the latest commit »16 MiB«

/tmp % git clone -v --tags --depth=1 -- https://github.com/systemd/systemd.git
Cloning into 'systemd'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (229 bytes)
remote: Enumerating objects: 6383, done.
remote: Counting objects: 100% (6383/6383), done.
remote: Compressing objects: 100% (5196/5196), done.
remote: Total 6383 (delta 1405), reused 2941 (delta 893), pack-reused 0
Receiving objects: 100% (6383/6383), 16.03 MiB | 11.69 MiB/s, done.
Resolving deltas: 100% (1405/1405), done.

nforro commented 2 months ago

Since this has been caught as part of the run_copr_build_handler, we do not need the full history

We are cloning the repo there only to get the config, right? Then making a shallow clone makes complete sense; I just don't think passing --tags is necessary (it would pull only the tags pointing to the cloned commit anyway).

There is also another option, a treeless clone:

$ git clone -v --tags --filter=tree:0 -- https://github.com/systemd/systemd.git
Cloning into 'systemd'...
POST git-upload-pack (175 bytes)
POST git-upload-pack (gzip 19369 to 9522 bytes)
remote: Enumerating objects: 77751, done.
remote: Counting objects: 100% (183/183), done.
remote: Compressing objects: 100% (183/183), done.
remote: Total 77751 (delta 1), reused 86 (delta 0), pack-reused 77568
Receiving objects: 100% (77751/77751), 23.53 MiB | 19.67 MiB/s, done.
Resolving deltas: 100% (492/492), done.
remote: Enumerating objects: 467, done.
remote: Counting objects: 100% (185/185), done.
remote: Compressing objects: 100% (167/167), done.
remote: Total 467 (delta 7), reused 70 (delta 4), pack-reused 282
Receiving objects: 100% (467/467), 197.63 KiB | 1.69 MiB/s, done.
Resolving deltas: 100% (8/8), done.
remote: Enumerating objects: 5915, done.
remote: Counting objects: 100% (4053/4053), done.
remote: Compressing objects: 100% (3445/3445), done.
remote: Total 5915 (delta 1099), reused 708 (delta 605), pack-reused 1862
Receiving objects: 100% (5915/5915), 15.88 MiB | 11.78 MiB/s, done.
Resolving deltas: 100% (1380/1380), done.
Updating files: 100% (6130/6130), done.

That's about 39.6 MiB in total (23.53 MiB + 197.63 KiB + 15.88 MiB), and it could in theory work in place of full clones.
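For a quick comparison of the three variants, the transfer sizes from the transcripts in this thread can be put side by side. The numbers are copied from the "Receiving objects" lines; the variable names are just for illustration.

```python
KIB_PER_MIB = 1024

# Sizes in MiB, taken from the "Receiving objects" lines in this thread.
full_clone = 218.39
shallow_clone = 16.03
# The treeless clone downloads in three rounds (commits, trees, blobs).
treeless_clone = 23.53 + 197.63 / KIB_PER_MIB + 15.88


def saving(size: float, baseline: float = full_clone) -> float:
    """Fraction of the baseline transfer that is avoided."""
    return 1 - size / baseline


if __name__ == "__main__":
    for name, size in [("shallow", shallow_clone), ("treeless", treeless_clone)]:
        print(f"{name}: {size:.2f} MiB, saves {saving(size):.1%} vs. full clone")
```

These figures cover only the initial transfer for systemd at the time of the measurement; a treeless clone may fetch more objects later if history is inspected.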

lachmanfrantisek commented 2 months ago

A config option that defaults to cloning just the last commit might be a good approach.
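One possible shape for such an option, as a rough sketch only: `CloneConfig` and `clone_depth` are made-up names, not existing packit configuration keys.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CloneConfig:
    """Hypothetical clone settings; not part of packit's real config schema."""

    # Number of commits to fetch; None requests the full history.
    clone_depth: Optional[int] = 1

    def git_args(self) -> List[str]:
        """Translate the option into extra arguments for `git clone`."""
        if self.clone_depth is None:
            return []
        if self.clone_depth < 1:
            raise ValueError("clone_depth must be a positive integer or None")
        return [f"--depth={self.clone_depth}"]


if __name__ == "__main__":
    print(CloneConfig().git_args())                  # default: shallow clone
    print(CloneConfig(clone_depth=None).git_args())  # opt in to full history
```

Defaulting to a depth of 1 keeps the cheap path as the norm, while jobs or users that need history can opt out explicitly.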

lbarcziova commented 1 month ago

We are cloning the repo there only to get the config, right?

That's actually not true; we get the config via the API earlier. The cloning happens any time LocalProject is initialised, so there is room for improvement there as well (probably related to #1955, EDIT: also https://github.com/packit/packit/issues/1581).
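One direction for that improvement would be to defer the clone until the working directory is actually needed. The sketch below only illustrates the idea: `LazyCheckout` is a hypothetical stand-in and does not reflect how LocalProject is implemented, and the clone step is injected as a callable so the example stays self-contained.

```python
from typing import Callable


class LazyCheckout:
    """Hypothetical project wrapper that clones on first use, not on init."""

    def __init__(self, url: str, target: str, clone: Callable[[str, str], None]):
        self.url = url
        self.target = target
        self._clone = clone  # e.g. a function that runs `git clone`
        self._cloned = False

    @property
    def working_dir(self) -> str:
        """Return the checkout path, cloning the repository at most once."""
        if not self._cloned:
            self._clone(self.url, self.target)
            self._cloned = True
        return self.target


if __name__ == "__main__":
    calls = []
    project = LazyCheckout(
        "https://github.com/systemd/systemd.git",
        "/tmp/sandcastle",
        clone=lambda url, target: calls.append((url, target)),
    )
    print(len(calls))    # nothing cloned yet
    project.working_dir
    project.working_dir
    print(len(calls))    # cloned once, on first access
```

With this shape, handlers that only need data available via the API would never trigger a clone at all.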