scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

When encryption is enabled nemesis fails to start on Docker backend with `cp: cannot create regular file '/home/scylla-test/.cassandra/cqlshrc': Permission denied` error #7287

Open dimakr opened 8 months ago

dimakr commented 8 months ago

Issue description

If encryption is enabled in the test configuration, every nemesis fails at the client certificate installation step:

Traceback (most recent call last):
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/cluster.py", line 3717, in node_setup
    cl_inst.node_setup(_node, **setup_kwargs)
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/cluster_docker.py", line 311, in node_setup
    node.config_setup(append_scylla_args=self.get_scylla_args())
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/cluster.py", line 1591, in config_setup
    self.proposed_scylla_yaml
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/cluster.py", line 430, in proposed_scylla_yaml
    scylla_yml = ScyllaYaml(**self._proposed_scylla_yaml_properties)
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/cluster.py", line 422, in _proposed_scylla_yaml_properties
    return node_params_builder.dict(exclude_none=True) | certificate_params_builder.dict(exclude_none=True)
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/provision/common/builders.py", line 69, in dict
    prop_value = getattr(self, prop)
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/provision/scylla_yaml/certificate_builder.py", line 43, in client_encryption_options
    certificate=os.path.join(self._ssl_files_path, 'client', os.path.basename(CLIENT_CERTFILE)),
  File "/home/dmitriy/.pyenv/versions/3.10.0/lib/python3.10/functools.py", line 970, in __get__
    val = self.func(instance)
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/provision/scylla_yaml/certificate_builder.py", line 34, in _ssl_files_path
    install_client_certificate(self.node.remoter)
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/provision/helpers/certificate.py", line 35, in install_client_certificate
    remoter.run('bash -cxe "%s"' % setup_script)
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/remote/remote_base.py", line 614, in run
    result = _run()
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/utils/decorators.py", line 70, in inner
    return func(*args, **kwargs)
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/remote/remote_base.py", line 605, in _run
    return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/remote/remote_base.py", line 538, in _run_execute
    result = connection.run(**command_kwargs)
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 620, in run
    return self._complete_run(channel, exception, timeout_reached, timeout, result, warn, stdout, stderr)
  File "/home/dmitriy/Work/Scylla/scylla-cluster-tests/sdcm/remote/libssh2_client/__init__.py", line 655, in _complete_run
    raise UnexpectedExit(result)
sdcm.remote.libssh2_client.exceptions.UnexpectedExit: Encountered a bad command exit code!

Command: 'bash -cxe "\nmkdir -p ~/.cassandra/\ncp /tmp/ssl_conf/client/cqlshrc ~/.cassandra/\nsudo mkdir -p /etc/scylla/\nsudo rm -rf /etc/scylla/ssl_conf/\nsudo mv -f /tmp/ssl_conf/ /etc/scylla/\n"'

Exit code: 1

Stdout:

Stderr:

+ '[' -z '' ']'
+ return
+ case $- in
+ return
+ mkdir -p /home/scylla-test/.cassandra/
+ cp /tmp/ssl_conf/client/cqlshrc /home/scylla-test/.cassandra/
cp: cannot create regular file '/home/scylla-test/.cassandra/cqlshrc': Permission denied

Steps to Reproduce

  1. Enable encryption in a test config:
    server_encrypt: true
    client_encrypt: true
  2. Run the test with any Nemesis class, e.g. StopStartMonkey

Expected behavior: the client certificate is installed successfully during DB node initialization and the scenario starts.

Actual behavior: client certificate installation fails with the `cp: cannot create regular file '/home/scylla-test/.cassandra/cqlshrc': Permission denied` error.

Impact

The issue prevents nemeses from starting in scenarios with encryption enabled. The workaround is to run test scenarios with encryption disabled.

How frequently does it reproduce?

Always.

Installation details

SCT version: master
Scylla version: 5.4.4
Environment: local test execution on the Docker backend

Logs

SCT log: sct.log

fruch commented 8 months ago

I think this issue is specific to the Docker backend, since this file already exists and was created by root; see https://github.com/scylladb/scylla-cluster-tests/commit/565e996bd37f2dc288db4c71c0cef046a38a2664
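
To confirm this diagnosis inside the container, a check like the one below could be run before the copy. It is a minimal sketch: `owned_by_root` and `can_write_without_sudo` are hypothetical helpers, not part of SCT. It encodes the rule that overwriting an existing file needs write permission on the file itself, while creating a new file needs write permission on its parent directory.

```python
import os


def owned_by_root(path: str) -> bool:
    """Return True if `path` is owned by uid 0 (root)."""
    return os.stat(path).st_uid == 0


def can_write_without_sudo(path: str) -> bool:
    """Approximate whether the current user can `cp` onto `path` without sudo.

    Overwriting an existing file needs write permission on the file itself;
    creating a new one needs write permission on its parent directory.
    """
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    return os.access(os.path.dirname(os.path.abspath(path)), os.W_OK)
```

Run as `scylla-test` against `/home/scylla-test/.cassandra/cqlshrc`, a root-owned file would be expected to give `True` for the first check and `False` for the second.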

fruch commented 8 months ago

I think this copy command can be done with sudo.
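
A minimal sketch of that suggestion, rebuilding the setup script from the failing command in the report and prefixing only the `cp` step with `sudo`. The function name `build_client_cert_setup_script` is hypothetical; SCT's actual `install_client_certificate` may assemble the script differently.

```python
def build_client_cert_setup_script(sudo_cqlshrc: bool = True) -> str:
    """Rebuild the client certificate setup script, optionally copying cqlshrc with sudo."""
    cp = "sudo cp" if sudo_cqlshrc else "cp"
    steps = [
        "mkdir -p ~/.cassandra/",
        # `~` is expanded by the inner bash before sudo runs, so the destination
        # is still the test user's home directory, not /root.
        f"{cp} /tmp/ssl_conf/client/cqlshrc ~/.cassandra/",
        "sudo mkdir -p /etc/scylla/",
        "sudo rm -rf /etc/scylla/ssl_conf/",
        "sudo mv -f /tmp/ssl_conf/ /etc/scylla/",
    ]
    return "\n" + "\n".join(steps) + "\n"


# Wrapped the same way as in the failing command.
command = 'bash -cxe "%s"' % build_client_cert_setup_script()
```

This only addresses the permission error. The copied file stays root-owned, so cqlsh can read it only if its mode allows that (e.g. 0644), and the existing file is still overwritten rather than merged.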

dimakr commented 8 months ago


I tried to troubleshoot it further and found two options to get rid of the permission denied problem:

Both options help. However, it turned out that something changes `/home/$USER/.cassandra/cqlshrc` again after certificate installation finishes. With a fix for certificate installation applied, the behavior is:

  1. Content of /home/$USER/.cassandra/cqlshrc before certificate installation:
    [connection]
    hostname = 172.17.0.2
  2. Content of the file right after certificate installation:
    [connection]
    factory = cqlshlib.ssl.ssl_transport_factory

    [ssl]
    certfile = /etc/scylla/ssl_conf/client/test.crt
    validate = false
    userkey = /etc/scylla/ssl_conf/client/test.key
    usercert = /etc/scylla/ssl_conf/client/test.crt
  3. Some time later, as the test progresses, the file content is reverted to:
    [connection]
    hostname = 172.17.0.2

As a result, the test fails with the following error:

Validation is enabled; SSL transport factory requires a valid certfile to be specified. Please provide path to the certfile in [ssl] section as 'certfile' option in /home/scylla-test/.cassandra/cqlshrc (or use [certfiles] section) or set SSL_CERTFILE environment variable.
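
To catch the reverted state before cqlsh is invoked, the file could be checked for the settings this error message asks for. The sketch below is a hypothetical helper that approximates the condition described in the message. It assumes validation is on unless `[ssl] validate` turns it off, and that a certfile can come from `[ssl] certfile`, a `[certfiles]` section, or `SSL_CERTFILE`. It is not cqlshlib's actual code.

```python
import configparser
import os
from typing import Mapping, Optional


def missing_ssl_certfile(cqlshrc_text: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if validation is enabled but no certfile is configured anywhere."""
    env = os.environ if env is None else env
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(cqlshrc_text)
    # Assumption: validation defaults to enabled when [ssl] validate is absent.
    if not cfg.getboolean("ssl", "validate", fallback=True):
        return False  # validation disabled, so no certfile is required
    if cfg.get("ssl", "certfile", fallback=""):
        return False
    if cfg.has_section("certfiles") and cfg.options("certfiles"):
        return False
    return not env.get("SSL_CERTFILE")


reverted = "[connection]\nhostname = 172.17.0.2\n"
print(missing_ssl_certfile(reverted, env={}))  # True
```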



I will troubleshoot this further to find the root cause of what reverts `/home/$USER/.cassandra/cqlshrc` to its initial state.
But should we handle that in a separate issue and address only the `permission denied` problem in this one? @fruch

fruch commented 8 months ago

This logic, which overwrites the file, was created for VMs and assumes the file didn't exist before.

In this case, root created the file (as part of the Dockerfile), so we need root permissions to handle it, and we need to make sure the file is merged rather than overwritten, since overwriting it would break cqlsh.
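
A minimal sketch of the merge, assuming Python's `configparser` is acceptable for cqlshrc files. The hypothetical `merge_cqlshrc` keeps what the Dockerfile wrote (e.g. `[connection] hostname`) and overlays the SSL options. Where an option appears in both inputs, the SSL value wins.

```python
import configparser
import io


def merge_cqlshrc(existing_text: str, ssl_text: str) -> str:
    """Overlay every option from ssl_text onto existing_text and return the result."""
    # interpolation=None so values containing '%' are passed through untouched
    merged = configparser.ConfigParser(interpolation=None)
    merged.read_string(existing_text)
    overlay = configparser.ConfigParser(interpolation=None)
    overlay.read_string(ssl_text)
    for section in overlay.sections():
        if not merged.has_section(section):
            merged.add_section(section)
        for key, value in overlay.items(section):
            merged.set(section, key, value)
    out = io.StringIO()
    merged.write(out)
    return out.getvalue()
```

`configparser` drops comments and lowercases option names, which seems harmless for the keys shown above. Writing the merged text back still needs root, because the file is root-owned.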

It also needs to be persistent somehow, so that restarts don't make the SSL configuration, and the key files themselves, go away.

So the solution is a bit more complicated than just fixing permissions.

I think it would need something like creating the file and keys and mounting them into place (into the user's home directory, and maybe into /root/ as well).
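
The mounting idea could look roughly like this: the cqlshrc is pre-merged on the host and bind-mounted next to the key directory. The helper name and host paths are hypothetical. The returned `"host:container:mode"` strings are the list form accepted by the `volumes` argument of the Docker SDK for Python, but SCT's Docker backend may create containers differently.

```python
from typing import List


def ssl_volume_binds(host_ssl_dir: str, host_cqlshrc: str,
                     user_home: str = "/home/scylla-test",
                     also_root: bool = True) -> List[str]:
    """Return 'host:container:mode' bind strings for the SSL files and cqlshrc."""
    binds = [
        # key and certificate files, where the setup script would otherwise move them
        f"{host_ssl_dir}:/etc/scylla/ssl_conf:ro",
        # pre-merged cqlshrc for the test user
        f"{host_cqlshrc}:{user_home}/.cassandra/cqlshrc:ro",
    ]
    if also_root:
        binds.append(f"{host_cqlshrc}:/root/.cassandra/cqlshrc:ro")
    return binds


# Hypothetical usage with the Docker SDK for Python:
#   import docker
#   docker.from_env().containers.run(image, detach=True,
#                                    volumes=ssl_volume_binds("/tmp/ssl_conf", "/tmp/cqlshrc"))
```

With read-only mounts, the existing `cp`, `rm -rf` and `mv` steps would have to be skipped on this backend. A side effect is that whatever currently reverts the file would get an error instead of silently succeeding.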

Another direction would be to do this configuration before each run_cqlsh call rather than as part of setup, but that might be a bit excessive.
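
If that direction were taken, an idempotent step could make the per-call cost small: it rewrites the file only when something is missing. The sketch below works on a local path with the hypothetical helper `ensure_ssl_cqlshrc`. On a real node it would have to go through the node's remoter with root permissions, and hooking it into `run_cqlsh` is an assumption. The values are copied from the cqlshrc content observed right after certificate installation.

```python
import configparser
from pathlib import Path

# Values observed in cqlshrc right after certificate installation.
SSL_SETTINGS = {
    "connection": {"factory": "cqlshlib.ssl.ssl_transport_factory"},
    "ssl": {
        "certfile": "/etc/scylla/ssl_conf/client/test.crt",
        "validate": "false",
        "userkey": "/etc/scylla/ssl_conf/client/test.key",
        "usercert": "/etc/scylla/ssl_conf/client/test.crt",
    },
}


def ensure_ssl_cqlshrc(path: Path, settings=SSL_SETTINGS) -> bool:
    """Add any missing SSL settings to cqlshrc, keeping everything else.

    Returns True if the file was rewritten, False if it was already up to date.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(path)  # a missing file is treated as empty
    changed = False
    for section, options in settings.items():
        if not cfg.has_section(section):
            cfg.add_section(section)
        for key, value in options.items():
            if cfg.get(section, key, fallback=None) != value:
                cfg.set(section, key, value)
                changed = True
    if changed:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w") as fh:
            cfg.write(fh)
    return changed
```

A second call on an already configured file returns `False` without touching it. That is what would keep a call before every cqlsh invocation relatively cheap.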