leandron closed this issue 2 years ago.
cc @hogepodge @tqchen
@leandron good findings. It seems this should be an improvement to prune_and_sync; can you send a PR?
We should also add a timeout retry (with some backoff) to the upload script, so that it tries to re-upload the wheel if the first attempt fails.
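A minimal sketch of such a retry, assuming the upload step can be wrapped in a zero-argument callable. `retry_with_backoff` and `flaky_upload` are hypothetical names used only for illustration and are not part of the tlcpack scripts. The exception type to retry on (here `TimeoutError`) is also an assumption and depends on how the real upload call reports a timeout.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds, sleeping with exponential backoff between attempts.

    Exceptions not listed in retry_on propagate immediately; if every attempt
    fails with a retryable exception, the last one is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            # capped at max_delay, plus up to 10% random jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_upload():
        # Hypothetical stand-in for the real upload: times out twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("upload timed out")
        return "uploaded"

    print(retry_with_backoff(flaky_upload, base_delay=0.01, retry_on=(TimeoutError,)))
```

Passing `sleep` as a parameter keeps the backoff testable without real delays, and the jitter helps keep several jobs that time out together from retrying in lockstep.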
I will send a PR with this change.
@leandron, gentle ping on sending the PR to add the retry.
There was a problem with last night's Linux package sync. I'm trying to understand what happened, so that we can make the scripts more robust.
It seems the "Wheel-Manylinux-Nightly" workflow failed due to a GitHub upload timeout (https://github.com/tlc-pack/tlcpack/runs/2506719665?check_suite_focus=true#step:6:94). By then, some packages from the "Build (tlcpack-nightly, none, tlcpack/package-cpu:v0.3)" job had already been uploaded under a newer tag.
So when "Prune-Nightly" kicked in, it deleted the previously generated packages, replaced them with the new versions, and committed the result to the website:
https://github.com/tlc-pack/tlc-pack.github.io/commit/808752e25f1e9a531d64510475c592dbaf4f6ce3#diff-2c60e44ee4bb531bb2a6175f4719186539796232913a75def24c302f310fa569
At this point, some of the nightly packages are older than others.
The main issue is that, in parallel to all this, something went wrong while the Python 3.6 CPU package tlcpack_nightly-0.8.dev959%2Bg26a5e299b-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl was being uploaded, and the file got corrupted somewhere along the way. Clicking its link on https://tlcpack.ai/wheels shows an error message instead of the usual GitHub 404 page.
One thing we could do to avoid this specific issue in the future is to run a quick health check on the URLs we add to https://tlcpack.ai/wheels. This would go in `wheel/wheel_prune_and_sync.py`: https://github.com/tlc-pack/tlcpack/blob/f5d0a703d56c13216894a3d4e2d9adb071e60e09/wheel/wheel_prune_and_sync.py#L76

Using last night's case, it would be something as simple as a `requests.get()` call: