Closed tri-adam closed 10 months ago
Testing this against a large container with many layers - which is somewhat typical for vendor-optimized AI stacks:
docker://nvcr.io/nvidia/tensorflow:22.01-tf2-py3
6.85GiB Compressed Size
42 layers
The test machine is an 8-core (16 thread) AMD Ryzen 7 5700U with WD Black SN750 NVMe SSD.
singularity pull
is used to run the OCI -> (OCI-)SIF conversion. The OCI blobs have been pre-cached, so that download speeds aren't in play.
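There is very little difference in wall-clock elapsed run-time between the stereoscope method and mutate.Extract (14:20 vs. 14:36 in the runs below).
There is a noticeable difference in max resident memory usage: roughly 3.9 GiB (4072412 KiB) for stereoscope versus roughly 123 MiB (126136 KiB) for mutate.Extract.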
The low memory usage of the mutate.Extract approach may be beneficial when creating OCI-SIFs from large GPU images on, e.g., RAM-constrained ARM+GPU development boards.
Note that the memory figure for the native umoci->mksquashfs flow is high because mksquashfs aggressively uses free memory to speed up squashfs creation. It will still function in memory-constrained environments, but more slowly.
Wall-clock time for stereoscope and mutate.Extract is quite dependent on single-core CPU performance. The singularity process is pegged at ~100% CPU usage on a single core for the majority of the time. I/O is not a constraint here.
Before this PR (stereoscope):
$ /bin/time singularity pull --oci docker://nvcr.io/nvidia/tensorflow:22.01-tf2-py3
INFO: Converting OCI image to OCI-SIF format
INFO: Squashing image to single layer
INFO: Writing OCI-SIF image
INFO: Cleaning up.
2289.44user 77.45system 14:20.47elapsed 275%CPU (0avgtext+0avgdata 4072412maxresident)k
0inputs+12753248outputs (1major+3632630minor)pagefaults 0swaps
With this PR (mutate.Extract):
$ /bin/time singularity pull --oci docker://nvcr.io/nvidia/tensorflow:22.01-tf2-py3
INFO: Converting OCI image to OCI-SIF format
INFO: Squashing image to single layer
INFO: Writing OCI-SIF image
INFO: Cleaning up.
2236.75user 64.29system 14:35.97elapsed 262%CPU (0avgtext+0avgdata 126136maxresident)k
0inputs+12753048outputs (0major+97481minor)pagefaults 0swaps
For comparison, Singularity native mode (umoci extraction -> mksquashfs):
$ /bin/time singularity pull docker://nvcr.io/nvidia/tensorflow:22.01-tf2-py3
...
INFO: Creating SIF file...
1691.33user 42.11system 4:09.07elapsed 695%CPU (0avgtext+0avgdata 5799184maxresident)k
392inputs+12755512outputs (3major+1527464minor)pagefaults 0swaps
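To make the three runs easier to compare, here is a minimal Python sketch that parses the /bin/time summary lines quoted above and derives elapsed seconds, peak resident memory in MiB, and the average number of busy cores. It assumes GNU time's default output format, where maxresident is reported in KiB. The names parse_time_line and RUNS are hypothetical helpers for this illustration, not part of Singularity.

```python
import re

# Summary lines copied verbatim from the three /bin/time runs above.
RUNS = {
    "stereoscope": "2289.44user 77.45system 14:20.47elapsed 275%CPU "
                   "(0avgtext+0avgdata 4072412maxresident)k",
    "mutate.Extract": "2236.75user 64.29system 14:35.97elapsed 262%CPU "
                      "(0avgtext+0avgdata 126136maxresident)k",
    "native umoci->mksquashfs": "1691.33user 42.11system 4:09.07elapsed 695%CPU "
                                "(0avgtext+0avgdata 5799184maxresident)k",
}

# Assumption: GNU time default format, e.g. "1.0user 0.5system 0:01.50elapsed ... 1234maxresident".
PATTERN = re.compile(
    r"(?P<user>[\d.]+)user\s+(?P<system>[\d.]+)system\s+"
    r"(?P<elapsed>[\d:.]+)elapsed.*?(?P<rss>\d+)maxresident"
)


def parse_time_line(line):
    """Extract elapsed time, CPU time and peak RSS from one summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognised /bin/time output: {line!r}")

    # Elapsed is formatted as [h:]m:s.ss; fold the fields into seconds.
    elapsed_s = 0.0
    for field in match["elapsed"].split(":"):
        elapsed_s = elapsed_s * 60 + float(field)

    cpu_s = float(match["user"]) + float(match["system"])
    return {
        "elapsed_s": elapsed_s,
        "cpu_s": cpu_s,
        "avg_cores": cpu_s / elapsed_s,            # matches the %CPU column / 100
        "max_rss_mib": int(match["rss"]) / 1024,   # KiB -> MiB
    }


if __name__ == "__main__":
    for name, line in RUNS.items():
        r = parse_time_line(line)
        print(f"{name:26s} {r['elapsed_s']:8.2f} s  "
              f"{r['max_rss_mib']:9.1f} MiB  {r['avg_cores']:.2f} cores")
```

On these figures the stereoscope run peaks at roughly 32 times the resident memory of the mutate.Extract run, while the elapsed times differ by about 15 seconds.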
Note that this PR also fixes an issue with extracting some images with stereoscope:
With stereoscope:
$ singularity pull --force --oci docker://nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
...
FATAL: While making image from oci registry: error fetching image to cache: while creating OCI-SIF: while squashing image: cycle during symlink resolution
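For readers unfamiliar with this class of error: the message indicates that path resolution reported a symlink loop while squashing. The Python sketch below is a generic illustration of how a resolver can detect such a loop. It is not stereoscope's code and says nothing about why this particular image triggers the error. The in-memory symlink map, resolve, and SymlinkCycleError are hypothetical names, and only links at the final path component are followed.

```python
import posixpath


class SymlinkCycleError(Exception):
    """Raised when following symlinks revisits a path already seen."""


def resolve(path, symlinks):
    """Follow symlinks from a hypothetical {link_path: target} map.

    Simplification: only the full path is looked up, so symlinked
    parent directories are not expanded.
    """
    seen = set()
    while path in symlinks:
        if path in seen:
            raise SymlinkCycleError(f"cycle during symlink resolution at {path}")
        seen.add(path)
        target = symlinks[path]
        # Relative targets are interpreted relative to the link's directory;
        # absolute targets replace the path entirely.
        path = posixpath.normpath(posixpath.join(posixpath.dirname(path), target))
    return path


if __name__ == "__main__":
    ok = {"/bin": "usr/bin"}
    print(resolve("/bin", ok))

    looped = {"/usr/lib/a": "b", "/usr/lib/b": "a"}
    try:
        resolve("/usr/lib/a", looped)
    except SymlinkCycleError as err:
        print("error:", err)
```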
With mutate.Extract (this PR):
$ singularity pull --force --oci docker://nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
2023/08/11 11:22:54 Unsolicited response received on idle HTTP channel starting with "0\r\n\r\n"; err=<nil>
INFO: Converting OCI image to OCI-SIF format
INFO: Squashing image to single layer
INFO: Writing OCI-SIF image
INFO: Cleaning up.
I am :+1: on this PR because of this :-)
Closes #13