metal3-io / metal3-dev-env

Metal³ Development Environment
Apache License 2.0

Enhancement suggestion for image prepull #1368

Open NymanRobin opened 6 months ago

NymanRobin commented 6 months ago

When preparing the host for metal3-dev-env, the virtual machine base images are rather big, which makes the current process quite fragile to network or process interruptions. This can lead to configuration errors, since the integrity checks in image_prepull.sh are quite loose.

The first problem arises from how the script checks whether the image exists: if [[ ! -f "${IMAGE_NAME}" ]]; then. This only checks that the file exists; it does not look at the content at all.

The checksum check does not help either: if the checksum file does not exist, it is generated from the downloaded file, and that file might already be corrupt at that point.
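One way to make the existence check meaningful is to download to a temporary name and rename only after wget reports success, so an interrupted run never leaves a half-written file under the final name. A minimal sketch, assuming bash and GNU wget; the function name fetch_atomically and the .part suffix are hypothetical and not taken from image_prepull.sh:

```shell
#!/usr/bin/env bash
# Sketch only: fetch_atomically and the ".part" suffix are hypothetical names.

# Download URL to DEST, but expose DEST only after wget exited successfully.
fetch_atomically() {
    local url="$1" dest="$2"
    local tmp="${dest}.part"

    if wget -q -O "${tmp}" "${url}"; then
        # A rename within one filesystem is atomic, so DEST is never partial.
        mv -f "${tmp}" "${dest}"
    else
        echo "download of ${url} failed, removing ${tmp}" >&2
        rm -f "${tmp}"
        return 1
    fi
}
```

This only covers interrupted downloads. It does not detect a file that wget reported as complete but that is still wrong, which is what a checksum comparison is for.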

Finally, I think the user experience could be improved by adding a progress bar to these slow downloads so the user is not confused about what is happening, for example with these wget options: --show-progress --progress=bar:force:noscroll. Note: these are fairly new wget options and might break on older machines, so it is probably best to have a fallback in case wget does not recognize them.
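The fallback could be handled by probing the wget help text before choosing flags. A minimal sketch, assuming bash and GNU wget; the helper name wget_progress_flags and the variable IMAGE_URL are hypothetical, and -nv is just one possible fallback:

```shell
#!/usr/bin/env bash
# Sketch only: wget_progress_flags is a hypothetical helper.

# Print the progress-related flags that the installed wget understands.
wget_progress_flags() {
    local help_text
    help_text="$(wget --help 2>&1)" || true

    if [[ "${help_text}" == *"--show-progress"* ]]; then
        echo "-q --show-progress --progress=bar:force:noscroll"
    else
        # Older wget: fall back to terse, non-verbose output.
        echo "-nv"
    fi
}

# Usage (the substitution is left unquoted on purpose so the flags split):
#   wget $(wget_progress_flags) -O "${IMAGE_NAME}" "${IMAGE_URL}"
```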

One improvement would be to download the checksum directly from Artifactory each time and compare it with the checksum of the actual file. This is normally done by appending the desired checksum type to the end of the file path, something like this: https://artifactory.nordix.org/ui/native/metal3/images/k8s_v1.29.0/UBUNTU_22.04_NODE_IMAGE_K8S_v1.29.0.qcow2.sha256

This did not work, even though I can see in the Artifactory UI that the checksum is generated. It might be related to repository settings and could be investigated further.

Otherwise, this could most likely also be achieved with some wget options.
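If a published checksum file does become available, the comparison suggested above could look roughly like this. A minimal sketch, assuming bash 4+ and coreutils sha256sum; the helper name verify_sha256 and the variable IMAGE_URL are hypothetical, and the .sha256 download in the usage comment is exactly the step reported above as not working yet:

```shell
#!/usr/bin/env bash
# Sketch only: verify_sha256 is a hypothetical helper.

# Succeed only if FILE matches the SHA-256 digest stored in SUM_FILE.
# SUM_FILE may hold a bare digest or a "digest  filename" line.
verify_sha256() {
    local file="$1" sum_file="$2"
    local expected="" actual=""

    # The first whitespace-separated field of the first line is the digest.
    read -r expected _ < "${sum_file}" || true
    actual="$(sha256sum "${file}" | cut -d' ' -f1)"

    # An empty expected digest counts as a failure, never as a match.
    [[ -n "${expected}" && "${expected,,}" == "${actual}" ]]
}

# Usage:
#   wget -q -O "${IMAGE_NAME}.sha256" "${IMAGE_URL}.sha256"
#   verify_sha256 "${IMAGE_NAME}" "${IMAGE_NAME}.sha256" || rm -f "${IMAGE_NAME}"
```

The point of the design is that the expected digest always comes from the server and is never derived from the local file.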

tuminoid commented 6 months ago

Is there a case where the file download is corrupt and wget would still return success, or is the failure case that the pre-pulled image exists and is corrupt?

> Finally, I think the user experience could be improved by adding a progress bar to these slow downloads so the user is not confused about what is happening, for example with these wget options: --show-progress --progress=bar:force:noscroll. Note: these are fairly new wget options and might break on older machines, so it is probably best to have a fallback in case wget does not recognize them.

Yes, this option does not work on our CentOS variant. All the nice things are missing on the CentOS side :)

Also that option does not look great in logs:

--2024-03-14 07:02:26--  https://artifactory.nordix.org/artifactory/metal3/images/k8s_v1.29.0/CENTOS_9_NODE_IMAGE_K8S_v1.29.0.qcow2
Resolving artifactory.nordix.org (artifactory.nordix.org)... 91.106.198.25
Connecting to artifactory.nordix.org (artifactory.nordix.org)|91.106.198.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2270668288 (2.1G) [application/octet-stream]
Saving to: ‘CENTOS_9_NODE_IMAGE_K8S_v1.29.0.qcow2.3’

^MCENTOS_9_NODE_IMAGE   0%[                    ]       0  --.-KB/s               ^MCENTOS_9_NODE_IMAGE   2%[                    ]  54.73M   274MB/s               ^MCENTOS_9_NODE_IMAGE   6%[>                   ] 130.85M   327MB/s               ^MCENTOS_9_NODE_IMAGE   9%[>                   ] 204.76M   341MB/s               ^MCENTOS_9_NODE_IMAGE  12%[=>                  ] 281.48M   352MB/s               ^MCENTOS_9_NODE_IMAGE  16%[==>                 ] 359.76M   360MB/s               ^MCENTOS_9_NODE_IMAGE  20%[===>                ] 437.87M   365MB/s               ^MCENTOS_9_NODE_IMAGE  23%[===>                ] 517.19M   369MB/s               ^MCENTOS_9_NODE_IMAGE  27%[====>               ] 598.85M   374MB/s               ^MCENTOS_9_NODE_IMAGE  31%[=====>              ] 671.87M   373MB/s               ^MCENTOS_9_NODE_IMAGE  33%[=====>              ] 735.64M   368MB/s               ^MCENTOS_9_NODE_IMAGE  36%[======>             ] 799.74M   363MB/s               ^MCENTOS_9_NODE_IMAGE  39%[======>             ] 863.41M   360MB/s               ^MCENTOS_9_NODE_IMAGE  42%[=======>            ] 927.01M   356MB/s               ^MCENTOS_9_NODE_IMAGE  45%[========>           ] 991.03M   354MB/s               ^MCENTOS_9_NODE_IMAGE  49%[========>           ]   1.04G   354MB/s    eta 3s     ^MCENTOS_9_NODE_IMAGE  52%[=========>          ]   1.11G   362MB/s    eta 3s     ^MCENTOS_9_NODE_IMAGE  56%[==========>         ]   1.18G   361MB/s    eta 3s     ^MCENTOS_9_NODE_IMAGE  59%[==========>         ]   1.26G   362MB/s    eta 3s     ^MCENTOS_9_NODE_IMAGE  63%[===========>        ]   1.34G   363MB/s    eta 3s     ^MCENTOS_9_NODE_IMAGE  66%[============>       ]   1.41G   363MB/s    eta 2s     ^MCENTOS_9_NODE_IMAGE  70%[=============>      ]   1.49G   363MB/s    eta 2s     ^MCENTOS_9_NODE_IMAGE  73%[=============>      ]   1.56G   361MB/s    eta 2s     ^MCENTOS_9_NODE_IMAGE  77%[==============>     ]   1.63G   358MB/s    eta 2s     ^MCENTOS_9_NODE_IMAGE  80%[===============>    ]   
1.70G   355MB/s    eta 2s     ^MCENTOS_9_NODE_IMAGE  83%[===============>    ]   1.77G   359MB/s    eta 1s     ^MCENTOS_9_NODE_IMAGE  87%[================>   ]   1.85G   362MB/s    eta 1s     ^MCENTOS_9_NODE_IMAGE  90%[=================>  ]   1.92G   366MB/s    eta 1s     ^MCENTOS_9_NODE_IMAGE  94%[=================>  ]   2.00G   372MB/s    eta 1s     ^MCENTOS_9_NODE_IMAGE  97%[==================> ]   2.07G   374MB/s    eta 1s     ^MCENTOS_9_NODE_IMAGE 100%[===================>]   2.11G   377MB/s    in 5.9s   

That said, the current logging is really spammy, printing a literal thousand lines...
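One possible middle ground is to pick the progress style based on whether stderr is a terminal. A minimal sketch, assuming bash and GNU wget; the helper name wget_log_flags and the variable IMAGE_URL are hypothetical. The dot:giga style is a standard wget progress style that prints far fewer lines than the default dot output:

```shell
#!/usr/bin/env bash
# Sketch only: wget_log_flags is a hypothetical helper.

# Choose a wget progress style that suits where the output is going.
wget_log_flags() {
    if [ -t 2 ]; then
        # Interactive terminal: a redrawn bar is readable.
        echo "--progress=bar"
    else
        # Log file or CI: coarse dots avoid carriage-return noise.
        echo "--progress=dot:giga"
    fi
}

# Usage:
#   wget $(wget_log_flags) -O "${IMAGE_NAME}" "${IMAGE_URL}"
```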

> This did not work, even though I can see in the Artifactory UI that the checksum is generated. It might be related to repository settings and could be investigated further.

Checking the file listing at https://artifactory.nordix.org/ui/native/metal3/images/k8s_v1.29.0/ does not show any checksum files to be downloaded. We should probably be uploading them with the images themselves.
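Uploading checksums alongside the images could start with generating a sidecar file at publish time. A minimal sketch, assuming bash and coreutils sha256sum; the helper name write_sha256_sidecar is hypothetical and the upload step itself is left out:

```shell
#!/usr/bin/env bash
# Sketch only: write_sha256_sidecar is a hypothetical publish-side helper.

# Write IMAGE.sha256 next to IMAGE, recording only the bare file name so
# the sidecar stays valid wherever both files are downloaded to.
write_sha256_sidecar() {
    local image="$1"
    local dir name
    dir="$(dirname "${image}")"
    name="$(basename "${image}")"

    ( cd "${dir}" && sha256sum "${name}" > "${name}.sha256" )
}

# Consumer side, run in the directory holding both files:
#   sha256sum -c "${IMAGE_NAME}.sha256"
```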

NymanRobin commented 6 months ago

The corruption can only happen in a failure case, and indeed the progress bar does not look so great in log files.

Whether the checksum is visible in the file browser depends on this Artifactory setting: artifactory.ui.hideChecksums

But in the UI view I can at least see it: https://artifactory.nordix.org/ui/repos/tree/General/metal3/images/k8s_v1.29.0/UBUNTU_22.04_NODE_IMAGE_K8S_v1.29.0.qcow2
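Since the server evidently tracks the checksum, another avenue might be to read it from the HTTP response headers instead of a separate file. A minimal sketch, under the assumption (to be confirmed against this Artifactory instance) that downloads carry an X-Checksum-Sha256 response header; the helper name extract_sha256_header and the variable IMAGE_URL are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: extract_sha256_header is a hypothetical helper, and the
# X-Checksum-Sha256 header is an assumption about the server.

# Read HTTP response headers on stdin and print the SHA-256 digest from
# the last X-Checksum-Sha256 header seen, or nothing if there is none.
extract_sha256_header() {
    # Header names are case-insensitive; HTTP lines end in CR LF.
    tr -d '\r' | awk '
        tolower($1) == "x-checksum-sha256:" { digest = $2 }
        END { if (digest != "") print digest }
    '
}

# Usage (wget prints server headers on stderr):
#   wget --server-response --spider "${IMAGE_URL}" 2>&1 | extract_sha256_header
```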

Rozzii commented 6 months ago

/triage accepted

IMO the output of this should be:

metal3-io-bot commented 3 months ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Rozzii commented 3 months ago

/remove-lifecycle stale

metal3-io-bot commented 1 week ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

tuminoid commented 1 week ago

/remove-lifecycle stale
/lifecycle frozen