Closed fgelcer closed 10 months ago
This type of failure happens quite a lot with CentOS repos, especially CentOS 7, where I'm not sure how long it's going to be maintained.
I think we can add retries to all the install commands we have (a lot of them already do).
Usually we ignore such failures and just re-run, but @mykaul asked for this issue, so this sprint or the next I will go over the installation commands and try to add retries.
It seems we are not installing EPEL on centos7 (and oel76) before installing Scylla,
so we might be picking up an old reference to EPEL that no longer exists.
We should try installing EPEL on centos7 and see if it improves the situation.
In general, we should check whether EPEL is really needed, and maybe disable it by default and enable it only for the packages that need it.
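A minimal sketch of the "disable EPEL by default, enable it only where needed" idea, in Python. The helper name `yum_install_cmd` and its `needs_epel` parameter are hypothetical and not taken from scylla-cluster-tests. It assumes the repository id is `epel` (the id used by the standard `epel-release` package) and that the repo is already configured on the node; only the existing `yum` options `--enablerepo` and `--disablerepo` are used.

```python
def yum_install_cmd(packages, needs_epel=False):
    """Build a yum install command line (hypothetical helper).

    EPEL is disabled for this invocation unless the caller says the
    packages actually come from it. Assumes the repo id is "epel".
    """
    cmd = ["yum", "install", "-y"]
    # Per-invocation repo toggle; does not change the repo files on disk.
    cmd.append("--enablerepo=epel" if needs_epel else "--disablerepo=epel")
    cmd.extend(packages)
    return cmd


if __name__ == "__main__":
    # Placeholder package names, for illustration only.
    print(" ".join(yum_install_cmd(["some-base-package"])))
    print(" ".join(yum_install_cmd(["some-epel-package"], needs_epel=True)))
```

With this shape, a stale or unreachable EPEL mirror can only break the installs that really depend on it.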
Here is one example:
In the offline artifacts tests it happens once every few days.
Cluster size: 1 nodes (i3.large)
Scylla Nodes used in this run:
OS / Image: ami-0d6a24fe35fdf5dc4
(aws: undefined_region)
Test: artifacts-oel76-test
Test id: d5cbbc5a-d737-4803-8caa-87a999a6e5d3
Test name: scylla-master/artifacts-offline-install/artifacts-oel76-test
Test config file(s):
Hopefully these issues will no longer be relevant once CentOS 7 is EOL. I don't want to start tailoring a dedicated solution; if we can work around it with 10 retries (or even more), I would try that first.
Once we have a solution, try it for a whole day (i.e. 20-30 runs).
Running here: https://jenkins.scylladb.com/job/scylla-staging/job/yulia/job/artifacts-oel76-test/
Can you explain what's there, or point to the PR/code?
I added retries in two places, just to see whether the issue reproduces again. I will open a draft PR.
https://github.com/scylladb/scylla-cluster-tests/pull/6800/files
https://github.com/scylladb/scylla-cluster-tests/pull/6809 should be fixing this one
Prerequisites
Versions
c2b1b0f7b671a02b917781d1cd7fdd326def1f8a (origin/branch-perf-v14)
5.4.0~dev.20230618.b7627085cb13 with build-id a2d9adc050ce01f3543f876ea72d863b1ca6e615
Logs
7adc09e3-a131-4165-b65f-75fca5ffb050
Description
During node setup we run a few dependency installations, and one of them failed:
Steps to Reproduce
Not sure why it happened. It could be an external problem (with the repo we tried to access), a local network issue, or something else, and I have no idea how to reproduce it.
Expected behavior: the test should retry a few times when getting a 404 error (or other errors as well) instead of just failing, if that is possible at such an early stage.
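The expected behavior could look roughly like the sketch below: a generic retry wrapper around an install command, with exponential backoff between attempts. This is only an illustration under stated assumptions, not the code from the PRs linked above. The name `run_with_retries` and its parameters are hypothetical, and it retries on any non-zero exit code rather than on 404 specifically.

```python
import subprocess
import time


def run_with_retries(cmd, retries=10, delay=5.0, backoff=2.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run `cmd`, retrying on a non-zero exit code (hypothetical helper).

    `runner` and `sleep` are injectable so the logic can be exercised
    without touching the network or actually waiting.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            # check=True raises CalledProcessError on a non-zero exit code.
            return runner(cmd, check=True, capture_output=True, text=True)
        except subprocess.CalledProcessError as exc:
            last_error = exc
            if attempt == retries:
                break  # out of attempts, re-raise below
            sleep(delay)
            delay *= backoff  # wait longer before each further attempt
    raise last_error


if __name__ == "__main__":
    # Placeholder package name; a real run needs a host with yum.
    result = run_with_retries(["yum", "install", "-y", "some-package"], retries=3)
    print(result.stdout)
```

Retrying on every failure keeps the wrapper simple, at the cost of also retrying errors that will never succeed (for example a misspelled package name), so the retry count and delay cap how much time such a case can waste.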