scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

Artifact tests: Stop using a machine for waiting #3357

Closed: hagitsegev closed this issue 1 year ago

hagitsegev commented 3 years ago

Logs

See the log of any SCT artifact test, such as https://jenkins.scylladb.com/view/nexts/job/scylla-4.4/job/artifacts/job/artifacts-centos8-test/45/console

Description

When an artifact test runs:

  1. It starts on a gce2-qavpc machine, performs a checkout and sets a timeout of 1 hour.
  2. It then starts on another gce2-qavpc machine, sets another timeout of 30 minutes and performs another checkout.
  3. Finally, it starts running the tests in parallel on "Build Server..."

In other words, one gce2-qavpc machine is just busy waiting for another gce2-qavpc machine, which in turn is waiting for some "build servers". I assume the number of gce2-qavpc machines is limited, so you should not use two of them just to wait.

I suggest setting the label of the job to "master". This machine (the Jenkins server) has quite a few "runners" and does not cost anything. On pkg we use this trick for waiting in many pipelines. See, for example, https://github.com/scylladb/scylla-pkg/blob/master/scripts/jenkins-pipelines/centos-rpm.jenkinsfile:

  1. On line 37 we start the job with label "master".
  2. On line 101 we move to another machine with node(params.BUILD_NODE_PARAM). We have fewer machines of this type and don't want to waste them.
  3. On line 192 we release this machine and go back to running on "master", where we just wait for the tests to run.
  4. On line 232 we are back on the BUILD_NODE_PARAM machine to do something else.

So while we are waiting for the tests, the BUILD_NODE_PARAM machine is free for other tasks.
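A minimal declarative Jenkinsfile sketch of the suggested pattern, assuming a Pipeline-from-SCM job: the job's top-level agent uses the "master" label, a limited machine is held only inside node(params.BUILD_NODE_PARAM), and the parallel tests are waited on from "master". The 'build-server' label, the test names and the stage layout are hypothetical. They are not taken from the SCT or scylla-pkg pipelines.

```groovy
// Hypothetical sketch: the 'build-server' label, the test names and the
// stage layout are assumptions; 'master' and BUILD_NODE_PARAM come from
// the suggestion above.
pipeline {
    // Coordinating part of the job runs on the Jenkins server itself,
    // which only schedules work and waits.
    agent { label 'master' }

    parameters {
        string(name: 'BUILD_NODE_PARAM', defaultValue: 'gce2-qavpc',
               description: 'Label of the limited machines used for real work')
    }

    options {
        timeout(time: 1, unit: 'HOURS')
        // Do not check out the repository on 'master'; it only waits.
        skipDefaultCheckout()
    }

    stages {
        stage('Prepare') {
            steps {
                script {
                    // Hold a limited machine only while it does actual work;
                    // its executor is released when this block ends.
                    node(params.BUILD_NODE_PARAM) {
                        checkout scm
                        // ... preparation steps ...
                    }
                }
            }
        }
        stage('Run artifact tests') {
            steps {
                script {
                    // Back on 'master': fan the tests out and wait for them.
                    def branches = [:]
                    for (name in ['centos8', 'ubuntu20']) {  // hypothetical test names
                        def testName = name  // per-iteration copy captured by the closure
                        branches[testName] = {
                            node('build-server') {  // hypothetical label
                                checkout scm
                                echo "Running artifact test ${testName}"
                            }
                        }
                    }
                    parallel branches
                }
            }
        }
    }
}
```

While the parallel branches run, the job holds only an executor on "master"; the BUILD_NODE_PARAM machine is already free for other jobs.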

Steps to Reproduce

  1. Run any SCT test
  2. See logs for "Running on" to track machine usage.

Expected behavior: Use a cheap machine for waiting, and a strong machine only for doing the work.

Actual behavior: A limited machine is waiting for another limited machine, which in turn is waiting for other machines to do the work.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 2 years with no activity. Remove stale label or comment or this will be closed in 2 days.

fgelcer commented 1 year ago

we are now using SCT Runners