scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

disrupt_network_block failed due to: tcset failure #8064

Status: Closed (bhalevy closed this issue 1 month ago)

bhalevy commented 1 month ago

Seen in https://argus.scylladb.com/test/701dfdf1-c052-4bcf-84a5-27833356d5fe/runs?additionalRuns[]=13dae2ef-9a9d-4606-8366-2aa794bb2c1c

Nemesis Information
Class: Sisyphus
Name: disrupt_network_block
Status: Failed
Failure reason
Traceback (most recent call last):
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 5213, in wrapper
    result = method(*args[1:], **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/nemesis.py", line 3473, in disrupt_network_block
    self.target_node.traffic_control(selected_option)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster_aws.py", line 840, in traffic_control
    tc_command = LOCAL_CMD_RUNNER.run("tcset eth1 {} --tc-command".format(tcconfig_params)).stdout
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/local_cmd_runner.py", line 87, in run
    result = _run()
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 65, in inner
    return func(*args, **kwargs)
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/local_cmd_runner.py", line 77, in _run
    result = self.connection.local(**command_kwargs)
  File "/usr/local/lib/python3.10/site-packages/fabric/connection.py", line 750, in local
    return super(Connection, self).run(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/invoke/context.py", line 95, in run
    return self._run(runner, command, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/invoke/context.py", line 102, in _run
    return runner.run(command, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/invoke/runners.py", line 380, in run
    return self._run_body(command, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/invoke/runners.py", line 442, in _run_body
    return self.make_promise() if self._asynchronous else self._finish()
  File "/usr/local/lib/python3.10/site-packages/invoke/runners.py", line 509, in _finish
    raise UnexpectedExit(result)
invoke.exceptions.UnexpectedExit: Encountered a bad command exit code!

Command: 'tcset eth1 --loss 100% --tc-command'

Exit code: 1

Stdout:

Stderr:

    self._set_netem()
  File "/usr/local/lib/python3.10/site-packages/tcconfig/shaper/_interface.py", line 69, in _set_netem
    f"{self._get_netem_qdisc_major_id(self._tc_obj.qdisc_major_id):x}:"
  File "/usr/local/lib/python3.10/site-packages/tcconfig/shaper/htb.py", line 48, in _get_netem_qdisc_major_id
    self.__netem_major_id = self.__get_unique_netem_major_id()
  File "/usr/local/lib/python3.10/site-packages/tcconfig/shaper/htb.py", line 288, in __get_unique_netem_major_id
    exist_netem_major_ids = self.__extract_exist_netem_major_ids()
  File "/usr/local/lib/python3.10/site-packages/tcconfig/shaper/htb.py", line 272, in __extract_exist_netem_major_ids
    assert tcshow_out
AssertionError
bhalevy commented 1 month ago

@soyacz can you please look into this?

Cc @roydahan

soyacz commented 1 month ago

This error is due to a recent upgrade of the tcconfig library in the SCT runner (done by dependabot when bumping the requests library).

bhalevy commented 1 month ago

> This error is due to a recent upgrade of the tcconfig library in the SCT runner (done by dependabot when bumping the requests library).

What's the planned fix?

soyacz commented 1 month ago

> > This error is due to a recent upgrade of the tcconfig library in the SCT runner (done by dependabot when bumping the requests library).
>
> What's the planned fix?

I don't know yet: either we downgrade it again, or we try to find the root cause of why it's not working. @fruch any clues before I dive in?

fruch commented 1 month ago

> > > This error is due to a recent upgrade of the tcconfig library in the SCT runner (done by dependabot when bumping the requests library).
> >
> > What's the planned fix?
>
> I don't know yet: either we downgrade it again, or we try to find the root cause of why it's not working. @fruch any clues before I dive in?

I would rather not revert; it's part of a whole cycle of upgrades.

It should be very easy to debug: just run the exact same commands inside hydra (or even just locally).

I thought that I tested those nemeses when introducing it, but it might not have been configured correctly, and I failed to notice.
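
Editor's note: the suggestion to run the exact same command locally could look roughly like the sketch below. It is only an illustration, not SCT code: the helper name `run_and_report` is hypothetical, and the command string is copied from the failure report above. Unlike the `invoke` runner in the traceback, it does not raise on a non-zero exit code, so the full stderr can be inspected.

```python
import shlex
import shutil
import subprocess
import sys
from typing import Tuple

# Command copied verbatim from the failure report in this issue.
FAILING_COMMAND = "tcset eth1 --loss 100% --tc-command"


def run_and_report(command: str) -> Tuple[int, str, str]:
    """Run a command without raising on failure; return (exit code, stdout, stderr)."""
    completed = subprocess.run(
        shlex.split(command),
        capture_output=True,
        text=True,
        check=False,  # keep going on a non-zero exit so stderr can be read
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # tcset is installed by the tcconfig package; skip if it is not available.
    if shutil.which("tcset") is None:
        print("tcset not found; install tcconfig first", file=sys.stderr)
    else:
        code, out, err = run_and_report(FAILING_COMMAND)
        print(f"exit code: {code}")
        print(out)
        print(err, file=sys.stderr)
```

Running this inside the same environment as the SCT runner (for example inside hydra) should show whether the failure depends on the installed tcconfig version rather than on the target node.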

soyacz commented 1 month ago

Issue created: https://github.com/thombashi/tcconfig/issues/187. I verified with version 0.28.0 (we use 0.28.1) and it worked. I propose pinning it to 0.28.0; I suppose we only use it to get the commands anyway.
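
Editor's note: a quick way to check whether an environment has the affected release is sketched below. It assumes only what this comment reports: 0.28.1 fails and 0.28.0 works. Other versions are untested here, and the names `KNOWN_BAD_VERSIONS`, `is_affected`, and `installed_tcconfig_version` are hypothetical.

```python
from importlib import metadata
from typing import Optional

# Assumption: only the version reported as failing in this thread is listed.
KNOWN_BAD_VERSIONS = {"0.28.1"}


def installed_tcconfig_version() -> Optional[str]:
    """Return the installed tcconfig version, or None if it is not installed."""
    try:
        return metadata.version("tcconfig")
    except metadata.PackageNotFoundError:
        return None


def is_affected(version: str) -> bool:
    """True if the given version is one reported as failing in this thread."""
    return version in KNOWN_BAD_VERSIONS


if __name__ == "__main__":
    version = installed_tcconfig_version()
    if version is None:
        print("tcconfig is not installed")
    elif is_affected(version):
        print(f"tcconfig {version} is reported to fail with --tc-command")
    else:
        print(f"tcconfig {version} is not a version reported as failing here")
```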

soyacz commented 1 month ago

Apparently downgrading this one is not trivial: the docker package requires it...

soyacz commented 1 month ago

Removing the line at https://github.com/thombashi/tcconfig/blob/f469c13730ff8b5e24186eba87dfe22ddd1975d6/tcconfig/shaper/htb.py#L272 fixes the issue, but I still don't know whether this is the right fix or whether we should go this way...

fruch commented 1 month ago

@soyacz is it a specific command that fails, i.e. do the other commands we use work?

fruch commented 1 month ago

> @soyacz is it a specific command that fails, i.e. do the other commands we use work?

We can skip the specific command until it is fixed.

soyacz commented 1 month ago

> @soyacz is it a specific command that fails, i.e. do the other commands we use work?

I think it doesn't matter; it always fails. My proposal is to fork the repo, remove the bogus line, and install tcconfig from that fork until it is properly fixed upstream.

fruch commented 1 month ago

> > @soyacz is it a specific command that fails, i.e. do the other commands we use work?
>
> I think it doesn't matter; it always fails. My proposal is to fork the repo, remove the bogus line, and install tcconfig from that fork until it is properly fixed upstream.

I think the bug is that it assumes the command has been run and looks for its output, but we don't execute the commands; we just print them so we can run them remotely.

Anyhow, a fork is an option.
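
Editor's note: the hypothesis in this last comment can be illustrated with a simplified model. This is not tcconfig's actual code: the function names, the regular expression, and the sample `tc` output are all assumptions made for illustration. The strict variant asserts that the query for existing qdiscs returned something, which fails when nothing was actually applied on the local machine; the tolerant variant treats empty output as "no existing netem qdiscs".

```python
import re
from typing import List, Optional

# Hypothetical pattern for lines such as "qdisc netem 2b2b: parent 1a1a:2 ..."
NETEM_RE = re.compile(r"qdisc netem ([0-9a-f]+):")


def existing_netem_ids_strict(tc_show_output: Optional[str]) -> List[int]:
    """Model of the failing pattern: assumes the show command produced output."""
    assert tc_show_output  # raises AssertionError on empty or missing output
    return [int(major, 16) for major in NETEM_RE.findall(tc_show_output)]


def existing_netem_ids_tolerant(tc_show_output: Optional[str]) -> List[int]:
    """Model of a tolerant variant: empty output means no existing netem qdiscs."""
    if not tc_show_output:
        return []
    return [int(major, 16) for major in NETEM_RE.findall(tc_show_output)]


if __name__ == "__main__":
    # When commands are only printed and never executed locally, there may be
    # nothing to show, so the output is modeled here as an empty string.
    print(existing_netem_ids_tolerant(""))
    try:
        existing_netem_ids_strict("")
    except AssertionError:
        print("strict variant fails on empty output")
```

If the hypothesis holds, this would be consistent with the observation above that removing the assertion at line 272 of `htb.py` makes the command work, though whether that is the right upstream fix remains open.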