scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

artifact-tests: Don't fail test because of scylla-housekeeping #7970

Status: Closed (yaronkaikov closed this issue 2 weeks ago)

yaronkaikov commented 1 month ago

Recently, due to multiple changes in Scylla, our artifact tests keep failing because of a scylla-housekeeping version failure. Per @mykaul's request, we shouldn't fail our artifact tests over this.

fruch commented 1 month ago

I.e., we shouldn't test housekeeping anymore?

roydahan commented 1 month ago

@mykaul please approve that we stop testing scylla housekeeping.

mykaul commented 1 month ago

I personally do not see the value in failing the whole pipeline over it. But what was not clear to me was what was failing. I was under the impression it was more than housekeeping itself that failed. Specifically, from https://github.com/scylladb/scylladb/issues/19564 :

Command: 'sudo /usr/lib/scylla/scylla_setup --nic ens5 --disks /dev/nvme0n1 --setup-nic-and-disks  --swap-directory /  --no-verify-package '
Exit code: 1

So scylla_setup failed because of the housekeeping issue, which makes little sense to me.

yaronkaikov commented 1 month ago

@mykaul @roydahan please decide what we should do here. Removing the check or adding || true so that it doesn't fail amounts to the same thing: we will no longer test it. If we are OK with that, great. If not, let's close this issue.
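To illustrate why appending || true is equivalent to not testing at all, here is a minimal Python sketch. It assumes a POSIX shell is available; `false` is only a stand-in for a failing housekeeping step, not the real command, and `exit_code` is a hypothetical helper.

```python
import subprocess


def exit_code(command: str) -> int:
    """Run a shell command and return its exit status."""
    return subprocess.run(command, shell=True, check=False).returncode


# `false` is a placeholder for a failing housekeeping check (assumption).
failing = exit_code("false")
# With `|| true`, the shell reports success even though the check failed.
masked = exit_code("false || true")

print(f"without '|| true': {failing}")
print(f"with '|| true':    {masked}")
```

The masked variant always exits with status 0, so a caller that only looks at the exit status can no longer tell whether the check passed.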

fruch commented 1 month ago

Let's ask it differently: is anyone using the data from the housekeeping database?

mykaul commented 1 month ago

@tzach, @amnonh? I feel that with the central monitoring we have in Scylla Cloud, housekeeping has diminishing value. If we really want it, we should aim higher (for example, https://telemetry-public.ceph.com/).

roydahan commented 1 month ago

Why are we focusing on the side track of things? Did we check what is the actual issue and if we can fix it?

mykaul commented 1 month ago

> Why are we focusing on the side track of things? Did we check what is the actual issue and if we can fix it?

I'm surprised/annoyed we did not fix it yet. Should have taken an hour or so, takes 2 weeks.

fruch commented 1 month ago

> Why are we focusing on the side track of things? Did we check what is the actual issue and if we can fix it?

Nothing is currently broken. As written in the description of this issue, @mykaul was asking to remove the checks/tests.

> I'm surprised/annoyed we did not fix it yet. Should have taken an hour or so, takes 2 weeks.

Nothing in Scylla gets fixed in 1h.

roydahan commented 1 month ago

> Why are we focusing on the side track of things? Did we check what is the actual issue and if we can fix it?

> I'm surprised/annoyed we did not fix it yet. Should have taken an hour or so, takes 2 weeks.

You shouldn't be. This issue doesn't refer to anything specific, so we couldn't fix whatever you may be referring to. If you mean the docker artifact test, it looks like a flaky issue, and I asked @Annamikhlin to open an issue for that.

Let's stop opening issues like the one above. Instead of assuming and recommending, let's focus on the issue itself and let the people who fix it come up with the solution/recommendation.

mykaul commented 4 weeks ago

I don't think we should fail tests. I think the code in the setup should ignore housekeeping script errors.

roydahan commented 4 weeks ago

This is code that specifically tests housekeeping correctness.

roydahan commented 2 weeks ago

Bottom line, IIUC: the setup doesn't fail, and we do want the test to fail if housekeeping doesn't work.
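The split described above (setup tolerates a housekeeping failure, a dedicated test still fails on it) could be sketched roughly as follows. This is a hypothetical illustration only: `run_setup`, `broken_housekeeping`, and `check_housekeeping` are made-up names and do not reflect the actual scylla_setup or scylla-cluster-tests code.

```python
import logging
from typing import Callable, Dict, List

log = logging.getLogger("setup_sketch")


def run_setup(
    required_steps: List[Callable[[], None]],
    optional_steps: Dict[str, Callable[[], None]],
) -> Dict[str, Exception]:
    """Run required steps strictly; record (but tolerate) optional failures."""
    for step in required_steps:
        step()  # an exception here aborts the whole setup
    failures: Dict[str, Exception] = {}
    for name, step in optional_steps.items():
        try:
            step()
        except Exception as exc:  # optional step: log and keep going
            log.warning("optional step %s failed: %s", name, exc)
            failures[name] = exc
    return failures


def broken_housekeeping() -> None:
    """Hypothetical stand-in for a failing housekeeping version check."""
    raise RuntimeError("version check failed")


def check_housekeeping(failures: Dict[str, Exception]) -> None:
    """Dedicated test: fails if housekeeping did not work."""
    assert "housekeeping" not in failures, failures.get("housekeeping")


# Setup completes even though housekeeping failed...
failures = run_setup([lambda: None], {"housekeeping": broken_housekeeping})
# ...while check_housekeeping(failures) would raise AssertionError.
```

Keeping the failure recorded rather than swallowed is what lets the test still report it, unlike masking the exit status.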