pulp / pulp-cli

https://docs.pulpproject.org/pulp_cli/
GNU General Public License v2.0

Speed up uploads by uploading chunks in parallel #263

Open daviddavis opened 3 years ago

daviddavis commented 3 years ago

Right now, chunk uploads happen sequentially, but we could speed them up by running them in parallel.

lubosmj commented 1 month ago

There was another request from the services team to make uploads happen in parallel. I am a bit sceptical about this idea because serial uploads usually saturate the uplink already. However, there might be data centers where the uplink cannot be saturated because the reverse-proxy configuration disallows a larger chunk size.

I am going to introduce an option to the upload/artifact command that enables parallelism (e.g., --parallel).

daviddavis commented 1 month ago

You may want to make the number of parallel threads or processes configurable somehow as I imagine different Pulp instances could support different levels of throughput.
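A configurable degree of parallelism could look roughly like the sketch below. It is only an illustration: `upload_chunks`, `send`, and `workers` are hypothetical names, not part of pulp-cli or pulp-glue, and threads are used here purely for brevity. Whether the real client code is safe to call from several threads is an open question.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple

Chunk = Tuple[int, int]  # (start offset, end offset) in bytes, end exclusive


def upload_chunks(
    chunks: Iterable[Chunk],
    send: Callable[[int, int], None],
    workers: int = 1,
) -> int:
    """Call `send(start, end)` for every chunk, at most `workers` at a time.

    `send` is a hypothetical callable standing in for whatever performs
    the actual HTTP request; it must be safe to call concurrently.
    Returns the number of chunks that were sent.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    chunk_list = list(chunks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so that an exception raised by any
        # send() call is re-raised here instead of being silently dropped.
        list(pool.map(lambda chunk: send(*chunk), chunk_list))
    return len(chunk_list)
```

With `workers=1` this behaves like the current serial upload, so a user-facing option could map directly onto that parameter.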

mdellweg commented 1 month ago

Doing uploads in parallel requires adding some notion of parallel execution to the cli codebase, which it has never had. I would like to see some clear statements with numbers before judging whether this kind of complexity is worth adding to the project. For example, I cannot say whether pulp-glue is thread-safe. Do we want to rewrite it in async Python, using aiohttp instead of requests?

lubosmj commented 1 month ago

I ran a couple of experiments locally (using oci_env); the results are below. Uploading chunks in parallel cut the wall-clock time by roughly 45% to 55%. I did not spend much time writing quality code or using any optimization techniques besides splitting the uploaded file into 4 parts and then uploading those parts in parallel, each one in chunk-sized sub-chunks.


Test 1: With creating an artifact (db reset between runs, uploading one commit in a tarball, 717.7MB, 10MB chunks)

SERIAL (current implementation):
(venv) [lmjachky@lmjachky-thinkpadt14gen4 services]$ time pulp ostree repository import-all --name fedora-iot --file a9598e5a-1f0c-48b8-abda-14915a4d051a-commit.tar --repository_name repo --chunk-size 10MB
........................................................................Upload complete.
Creating artifact.
Started background task /pulp/api/v3/tasks/0190a1b6-19eb-7fe1-9b36-c2faf44e516e/
.....Done.

real    0m55.106s
user    0m13.641s

PARALLEL (4 processes for chunked uploading):
(venv) [lmjachky@lmjachky-thinkpadt14gen4 services]$ time pulp ostree repository import-all-parallel --name fedora-iot --file a9598e5a-1f0c-48b8-abda-14915a4d051a-commit.tar --repository_name repo --chunk-size 10MB
.....................................................................Upload complete.
.Upload complete.
.Upload complete.
.Upload complete.
Creating artifact.
Started background task /pulp/api/v3/tasks/0190a1b7-caae-759b-b9cc-81da6ca042b8/
.....Done.

real    0m30.569s
user    0m18.488s

Test 2: Without creating an artifact (db reset between runs, uploading one commit in a tarball, 717.7MB, 10MB chunks)

SERIAL (current implementation):
(venv) [lmjachky@lmjachky-thinkpadt14gen4 services]$ time pulp ostree repository import-all --name fedora-iot --file a9598e5a-1f0c-48b8-abda-14915a4d051a-commit.tar --repository_name repo --chunk-size 10MB
........................................................................Upload complete.

real    0m49.346s
user    0m14.368s

PARALLEL (4 processes for chunked uploading):
(venv) [lmjachky@lmjachky-thinkpadt14gen4 services]$ time pulp ostree repository import-all-parallel --name fedora-iot --file a9598e5a-1f0c-48b8-abda-14915a4d051a-commit.tar --repository_name repo --chunk-size 10MB
.....................................................................Upload complete.
.Upload complete.
.Upload complete.
.Upload complete.

real    0m22.449s
user    0m16.966s

Changes made to pulp-glue: https://gist.github.com/lubosmj/1d736226c1816fb019430e7fb78cdd55. Changes made to pulp-cli-ostree: https://gist.github.com/lubosmj/3bc14338713ab9a55343359ff49829b1. I used processes (https://pypi.org/project/multiprocess/ for easier function pickling) to perform the action.
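The splitting described above (4 parts, each uploaded in chunk-sized pieces) can be sketched as follows. This is not taken from the linked gists; `split_ranges`, `sub_chunks`, and `content_range` are hypothetical helpers that only do the byte-range arithmetic, and the header format shown is the generic HTTP `Content-Range` syntax.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) byte offsets, end exclusive


def split_ranges(total_size: int, parts: int) -> List[Range]:
    """Split [0, total_size) into `parts` contiguous, near-equal ranges."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for index in range(parts):
        # The first `extra` parts get one additional byte each.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


def sub_chunks(start: int, end: int, chunk_size: int) -> List[Range]:
    """Cut one range into pieces of at most `chunk_size` bytes."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (offset, min(offset + chunk_size, end))
        for offset in range(start, end, chunk_size)
    ]


def content_range(start: int, end: int, total_size: int) -> str:
    """Format an HTTP Content-Range value (the last byte is inclusive)."""
    return f"bytes {start}-{end - 1}/{total_size}"
```

Each worker process would then take one entry from `split_ranges(...)` and walk through its `sub_chunks(...)` in order, so the pieces of a single part are still sent serially over one connection.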

lubosmj commented 1 month ago

TCP congestion control is designed to manage the flow of data to prevent network congestion and ensure fairness among multiple connections. However, this mechanism primarily operates on a per-connection basis. This is what we are trying to bypass by uploading in parallel, right? Multiple TCP connections from a single host can then easily saturate the uplink.

lubosmj commented 1 month ago

The following experiment supports that theory. When uploading commits to staging, I am getting amazing results: judging by the upload speed and the uplink utilization, performance is almost 4 times better.


Test 1: Serial uploading (1MB chunk, 1 TCP connection, 1.3GB in total)

(venv) [lmjachky@lmjachky-thinkpadt14gen4 services]$ time pulp ostree repository import-all --name rhivos-test-non-parallel --file "auto-osbuild-aws-autosd9-cki-ostree-x86_64-1368897263.017a82ff.repo.tar" --repository_name "auto-osbuild-aws-autosd9-cki-ostree-x86_64-1368897263.017a82ff.repo" --chunk-size 1MB
Uploading file auto-osbuild-aws-autosd9-cki-ostree-x86_64-1368897263.017a82ff.repo.tar
........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Upload complete.
Creating artifact.
Started background task /api/pulp/default/api/v3/tasks/0190a2cd-235c-7a9f-adf2-10aa8529519d/
..........................................................................................................................................................................................Done.
Started background task /api/pulp/default/api/v3/tasks/0190a2d1-1cd9-76aa-8c75-07f7ef5f418a/
...............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Done.

real    42m34.483s
user    0m41.393s

image

Test 2: Parallel uploading (1MB chunk, 4 parallel processes, 4 TCP connections, 1.4GB in total)

(venv) [lmjachky@lmjachky-thinkpadt14gen4 services]$ time pulp ostree repository import-all --name rhivos-test-parallel --file "auto-osbuild-qemu-autosd9-qa-ostree-x86_64-1368897263.017a82ff.repo.tar" --repository_name "auto-osbuild-qemu-autosd9-qa-ostree-x86_64-1368897263.017a82ff.repo" --chunk-size 1MB --parallel
Uploading file auto-osbuild-qemu-autosd9-qa-ostree-x86_64-1368897263.017a82ff.repo.tar
.....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Upload complete.
........Upload complete.
.......Upload complete.
....Upload complete.
Creating artifact.
Started background task /api/pulp/default/api/v3/tasks/0190a2f1-332f-7c86-8f4c-735a63265275/
..............................................................................................................................................................................Done.
Started background task /api/pulp/default/api/v3/tasks/0190a2f4-fb30-7bb8-81e9-456a7ebc16f2/
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
ERROR, SOMEONE RESTARTED GATEWAY!!! BUT WE DO NOT CARE!
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://XXXXXXXX.com/api/pulp/default/api/v3/tasks/0190a2f4-fb30-7bb8-81e9-456a7ebc16f2/

real    26m27.961s
user    0m43.025s

image


Tested with the following changes applied on the respective main branches: https://github.com/lubosmj/pulp-cli/commit/8d573811afdada48b4d64623820204f13871f62f, https://github.com/lubosmj/pulp-cli-ostree/commit/0c4f3aec91edf9e11e7e3670eae228cacf3e50d8. OSTree commits were taken from https://autosd.sig.centos.org/AutoSD-9/nightly/ostree-repos/.
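To put the `real` timings quoted in this thread on a common footing, a small helper like the one below can turn them into speedup ratios. It is only a convenience sketch; `parse_real` and `speedup` are hypothetical names, and the input strings are copied from the `time` output above.

```python
import re


def parse_real(value: str) -> float:
    """Convert a bash `time` value such as '42m34.483s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(serial: str, parallel: str) -> float:
    """Return how many times faster the parallel run was (wall clock)."""
    return parse_real(serial) / parse_real(parallel)


# (serial, parallel) wall-clock times as reported in the thread.
RUNS = {
    "local, with artifact": ("0m55.106s", "0m30.569s"),
    "local, upload only": ("0m49.346s", "0m22.449s"),
    "staging, whole command": ("42m34.483s", "26m27.961s"),
}

for label, (serial, parallel) in RUNS.items():
    ratio = speedup(serial, parallel)
    saved = 100 * (1 - 1 / ratio)
    print(f"{label}: {ratio:.2f}x faster, {saved:.1f}% less wall-clock time")
```

The staging figure covers the whole command, including artifact creation and the background tasks, for two different tarballs, and the parallel run ended with a 503 while polling a task. It is therefore a lower, less precise figure than the upload-only speedup.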