Open NymanRobin opened 6 months ago
Is there a case where the file download is corrupt and wget would still return success, or is the failure case that the pre-pulled image exists and is corrupt?
Finally, I think the user experience could be improved by adding a progress bar to these slow downloads, so the user is not confused about what is happening, for example with these wget options: --show-progress --progress=bar:force:noscroll. Note: these options are quite new in wget, so they might break on older machines; it might be best to consider a fallback in case wget does not recognize them.
Yes, this option does not work on our CentOS variant. All the nice things are missing on the CentOS side :)
Also that option does not look great in logs:
--2024-03-14 07:02:26-- https://artifactory.nordix.org/artifactory/metal3/images/k8s_v1.29.0/CENTOS_9_NODE_IMAGE_K8S_v1.29.0.qcow2
Resolving artifactory.nordix.org (artifactory.nordix.org)... 91.106.198.25
Connecting to artifactory.nordix.org (artifactory.nordix.org)|91.106.198.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2270668288 (2.1G) [application/octet-stream]
Saving to: ‘CENTOS_9_NODE_IMAGE_K8S_v1.29.0.qcow2.3’
CENTOS_9_NODE_IMAGE   0%[                    ]       0  --.-KB/s
CENTOS_9_NODE_IMAGE   2%[                    ]  54.73M   274MB/s
CENTOS_9_NODE_IMAGE   6%[>                   ] 130.85M   327MB/s
[... ~25 similar progress-bar lines, one per carriage return ...]
CENTOS_9_NODE_IMAGE  97%[==================> ]   2.07G   374MB/s  eta 1s
CENTOS_9_NODE_IMAGE 100%[===================>]   2.11G   377MB/s  in 5.9s
That said, the current logging is really spammy, printing literally a thousand lines...
This did not work, even though I can see in the Artifactory UI that the checksum is generated; it might be related to repository settings and could be investigated further.
Checking the file listing at https://artifactory.nordix.org/ui/native/metal3/images/k8s_v1.29.0/ does not show any checksum files to be downloaded. We should probably be uploading them with the images themselves.
The corruption can only happen in a failure case, and indeed the progress output does not look so great in log files.
Whether the checksum is visible in the file browser depends on this Artifactory setting: artifactory.ui.hideChecksums
But in the UI view I can at least see it: https://artifactory.nordix.org/ui/repos/tree/General/metal3/images/k8s_v1.29.0/UBUNTU_22.04_NODE_IMAGE_K8S_v1.29.0.qcow2
/triage accepted
Imo the output of this should be:
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
/lifecycle frozen
When preparing the host for the metal3-dev-env, the virtual machine base images are rather big, which makes the current process quite fragile to network or process interruptions. This can lead to errors in the configuration, since the integrity checks in image_prepull.sh are quite loose.
The first problem arises from checking whether the image exists:
if [[ ! -f "${IMAGE_NAME}" ]]; then
This only checks that the image exists, not its content at all. What does not help the situation is the checksum check: if the checksum does not exist, it is generated from the local file, and the file might already be corrupt at that point.
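A minimal sketch of a stricter check (the variable names and the sidecar checksum file are assumptions for illustration, not the script's actual layout):

```shell
#!/usr/bin/env bash
# Sketch of a stricter pre-pull check: trust the image only when it
# exists AND matches a checksum sidecar file obtained from a trusted
# source ("<image>.sha256sum" in "hash  filename" format -- an assumption).
image_is_valid() {
    local image="$1"
    [[ -f "${image}" ]] && sha256sum --check --status "${image}.sha256sum"
}

if ! image_is_valid "${IMAGE_NAME}"; then
    echo "image missing or corrupt, re-downloading ${IMAGE_NAME}"
    # the wget download of "${IMAGE_URL}" would go here
fi
```

With a check like this, an interrupted or corrupted earlier download would trigger a re-download instead of being silently reused.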
Finally, I think the user experience could be improved by adding a progress bar to these slow downloads, so the user is not confused about what is happening, for example with these wget options: --show-progress --progress=bar:force:noscroll. Note: these options are quite new in wget, so they might break on older machines; it might be best to consider a fallback in case wget does not recognize them.
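A hedged sketch of such a fallback, probing the installed wget's --help output before using the newer options (everything beyond the option names suggested above is an assumption):

```shell
#!/usr/bin/env bash
# Use the fancy progress bar only when this wget build advertises
# --show-progress in its help text; otherwise fall back to defaults.
WGET_PROGRESS_OPTS=""
if command -v wget >/dev/null 2>&1 &&
   wget --help 2>&1 | grep -q -- '--show-progress'; then
    WGET_PROGRESS_OPTS="-q --show-progress --progress=bar:force:noscroll"
fi
echo "using wget options: ${WGET_PROGRESS_OPTS:-<none>}"
# The download would then be (IMAGE_URL/IMAGE_NAME are placeholders):
# wget ${WGET_PROGRESS_OPTS} "${IMAGE_URL}" -O "${IMAGE_NAME}"
```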
An improvement suggestion would be to download the checksum directly from Artifactory each time and compare it with the checksum of the actual file. This is normally done by appending the type of checksum you want to the end of the file path, so something like this: https://artifactory.nordix.org/ui/native/metal3/images/k8s_v1.29.0/UBUNTU_22.04_NODE_IMAGE_K8S_v1.29.0.qcow2.sha256
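For the comparison step, a minimal sketch (the ".sha256" URL convention is the untested assumption from above, and the helper assumes the endpoint returns a bare hex digest):

```shell
#!/usr/bin/env bash
# Compare a file's sha256 against an expected digest, e.g. one fetched
# separately with: expected="$(wget -qO- "${IMAGE_URL}.sha256")"
checksum_matches() {
    local file="$1" expected="$2"
    local actual
    actual="$(sha256sum "${file}" | awk '{print $1}')"
    [[ "${actual}" == "${expected}" ]]
}
```

If the digests differ, the image would be re-downloaded rather than trusted.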
This did not work, even though I can see in the Artifactory UI that the checksum is generated; it might be related to repository settings and could be investigated further.
Otherwise, this could most likely also be achieved with some wget options.