refactor: use `mutate.Extract` to implement `Squash`

tri-adam commented 11 months ago

Closes #13

dtrudg commented 10 months ago

Testing this against a large container image with many layers, which is fairly typical for vendor-optimized AI stacks:

docker://nvcr.io/nvidia/tensorflow:22.01-tf2-py3
6.85GiB Compressed Size
42 layers

The test machine is an 8-core (16 thread) AMD Ryzen 7 5700U with WD Black SN750 NVMe SSD.

`singularity pull --oci` is used to run the OCI -> (OCI-)SIF conversion. The OCI blobs have been pre-cached, so download speed isn't a factor.

There is very little difference in wall-clock elapsed run-time between the stereoscope method and mutate.Extract.

14min 20.47s for stereoscope -> sqfstar
14min 35.97s for mutate.Extract -> sqfstar
4min 9.07s for singularity's native umoci->mksquashfs flow

There is a noticeable difference between the max resident memory usage:

3977 MiB for stereoscope -> sqfstar
123 MiB for mutate.Extract -> sqfstar
5663 MiB for singularity's native umoci->mksquashfs flow
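
The MiB figures above can be reproduced from the `maxresident` values that /bin/time prints in the Run Timing section. A minimal sketch, assuming GNU time reports that field in KiB; the helper name `kibToMiB` is made up for illustration:

```go
package main

import "fmt"

// kibToMiB converts a size in KiB (assumed unit of GNU time's
// "maxresident" field) to whole MiB, rounding to the nearest MiB.
func kibToMiB(kib int64) int64 {
	return (kib + 512) / 1024
}

func main() {
	// maxresident values copied from the /bin/time output in this thread.
	runs := []struct {
		name string
		kib  int64
	}{
		{"stereoscope -> sqfstar", 4072412},
		{"mutate.Extract -> sqfstar", 126136},
		{"umoci -> mksquashfs", 5799184},
	}
	for _, r := range runs {
		fmt.Printf("%-26s %5d MiB\n", r.name, kibToMiB(r.kib))
	}
}
```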

The low memory usage of the mutate.Extract approach may be beneficial if creating OCI-SIFs from large GPU images on e.g. RAM constrained ARM+GPU development boards.
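
The thread does not say why the memory footprint is so low. One plausible factor, and this is an assumption, is that the layers are flattened as a stream, so that only a set of already-seen paths is kept in memory rather than file contents. The sketch below illustrates that general technique with the Go standard library only. It is not the go-containerregistry or oci-tools code; `flatten`, `hiddenByParent`, `layer` and `flattenedNames` are hypothetical names, and opaque whiteouts, hard links, and whiteouts that share a layer with new entries under the same path are deliberately ignored.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"path"
	"strings"
)

const whiteoutPrefix = ".wh."

// flatten streams layers (base layer first) into w as a single tar.
// Layers are visited top-down so the newest copy of each path wins.
func flatten(w io.Writer, layers []io.Reader) error {
	tw := tar.NewWriter(w)
	// seen[path] is true when the entry hides lower-layer children
	// (a whiteout or a non-directory) and false for a directory.
	seen := map[string]bool{}
	for i := len(layers) - 1; i >= 0; i-- {
		tr := tar.NewReader(layers[i])
		for {
			hdr, err := tr.Next()
			if err == io.EOF {
				break
			}
			if err != nil {
				return err
			}
			dir, base := path.Split(path.Clean(hdr.Name))
			whiteout := strings.HasPrefix(base, whiteoutPrefix)
			name := path.Join(dir, strings.TrimPrefix(base, whiteoutPrefix))
			if _, ok := seen[name]; ok || hiddenByParent(seen, name) {
				continue // shadowed or deleted by a higher layer
			}
			seen[name] = whiteout || hdr.Typeflag != tar.TypeDir
			if whiteout {
				continue // whiteout markers are not copied to the output
			}
			if err := tw.WriteHeader(hdr); err != nil {
				return err
			}
			if _, err := io.Copy(tw, tr); err != nil {
				return err
			}
		}
	}
	return tw.Close()
}

// hiddenByParent reports whether an ancestor of name was whited out
// or replaced by a non-directory in a higher layer.
func hiddenByParent(seen map[string]bool, name string) bool {
	for p := path.Dir(name); p != "." && p != "/"; p = path.Dir(p) {
		if seen[p] {
			return true
		}
	}
	return false
}

// layer builds an in-memory tar from name/content pairs (demo helper).
func layer(pairs ...string) io.Reader {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for i := 0; i+1 < len(pairs); i += 2 {
		body := []byte(pairs[i+1])
		hdr := &tar.Header{Name: pairs[i], Mode: 0o644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write(body); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return &buf
}

// flattenedNames flattens the layers and lists the resulting entry names.
func flattenedNames(layers ...io.Reader) ([]string, error) {
	var out bytes.Buffer
	if err := flatten(&out, layers); err != nil {
		return nil, err
	}
	var names []string
	tr := tar.NewReader(&out)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	base := layer("etc/motd", "old", "tmp/cache", "x")
	top := layer("etc/motd", "new", ".wh.tmp", "")
	names, err := flattenedNames(base, top)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

In the demo, the top layer overrides `etc/motd` and whites out `tmp`, so only `etc/motd` should survive in the flattened output.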

Note that the memory figure for the native umoci->mksquashfs flow is high because mksquashfs aggressively uses free memory to speed up squashfs creation. It will still work in memory-constrained environments, just more slowly.

Wall-clock time for stereoscope and mutate.Extract is quite dependent on single core CPU performance. The singularity process is pegged at ~100% CPU usage on a single core for the majority of the time. I/O is not a constraint here.

Run Timing

Before this PR (stereoscope):

$ /bin/time singularity pull --oci docker://nvcr.io/nvidia/tensorflow:22.01-tf2-py3
INFO:    Converting OCI image to OCI-SIF format
INFO:    Squashing image to single layer
INFO:    Writing OCI-SIF image
INFO:    Cleaning up.
2289.44user 77.45system 14:20.47elapsed 275%CPU (0avgtext+0avgdata 4072412maxresident)k
0inputs+12753248outputs (1major+3632630minor)pagefaults 0swaps

With this PR (mutate.Extract):

$ /bin/time singularity pull --oci docker://nvcr.io/nvidia/tensorflow:22.01-tf2-py3
INFO:    Converting OCI image to OCI-SIF format
INFO:    Squashing image to single layer
INFO:    Writing OCI-SIF image
INFO:    Cleaning up.
2236.75user 64.29system 14:35.97elapsed 262%CPU (0avgtext+0avgdata 126136maxresident)k
0inputs+12753048outputs (0major+97481minor)pagefaults 0swaps

For comparison, singularity's native umoci->mksquashfs flow:

$ /bin/time singularity pull docker://nvcr.io/nvidia/tensorflow:22.01-tf2-py3
...
INFO:    Creating SIF file...
1691.33user 42.11system 4:09.07elapsed 695%CPU (0avgtext+0avgdata 5799184maxresident)k
392inputs+12755512outputs (3major+1527464minor)pagefaults 0swaps
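
To compare the three runs numerically, the `elapsed` fields from the /bin/time output can be converted to seconds. A small sketch, assuming the `m:ss.ss` (or `h:mm:ss`) layout that GNU time prints; `elapsedSeconds` is a made-up helper name:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// elapsedSeconds parses a colon-separated elapsed field such as
// "14:20.47" (m:ss.ss) or "1:02:03" (h:mm:ss) into seconds.
func elapsedSeconds(s string) (float64, error) {
	var total float64
	for _, part := range strings.Split(s, ":") {
		v, err := strconv.ParseFloat(part, 64)
		if err != nil {
			return 0, fmt.Errorf("parsing %q: %w", s, err)
		}
		total = total*60 + v
	}
	return total, nil
}

func main() {
	// Elapsed fields copied from the /bin/time output in this thread.
	native, err := elapsedSeconds("4:09.07")
	if err != nil {
		panic(err)
	}
	for _, r := range []struct{ name, elapsed string }{
		{"stereoscope -> sqfstar", "14:20.47"},
		{"mutate.Extract -> sqfstar", "14:35.97"},
	} {
		secs, err := elapsedSeconds(r.elapsed)
		if err != nil {
			panic(err)
		}
		// Report each run's time and its ratio to the native flow.
		fmt.Printf("%-26s %7.2fs  %.2fx native\n", r.name, secs, secs/native)
	}
}
```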

dtrudg commented 10 months ago

Note that this does fix an issue with extracting some images with stereoscope:

With stereoscope:

$ singularity pull --force --oci docker://nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
...
FATAL:   While making image from oci registry: error fetching image to cache: while creating OCI-SIF: while squashing image: cycle during symlink resolution

With mutate.Extract (this PR):

$ singularity pull --force --oci docker://nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
2023/08/11 11:22:54 Unsolicited response received on idle HTTP channel starting with "0\r\n\r\n"; err=<nil>
INFO:    Converting OCI image to OCI-SIF format
INFO:    Squashing image to single layer
INFO:    Writing OCI-SIF image
INFO:    Cleaning up.

I am :+1: on this PR because of this :-)

sylabs / oci-tools

refactor: use `mutate.Extract` to implement `Squash` #15
