scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

hydra does not work on fedora 31 #1897

Closed asias closed 2 years ago

asias commented 4 years ago

I hacked hydra.sh a bit to make it run on podman on fedora 31

--- a/docker/env/hydra.sh
+++ b/docker/env/hydra.sh
@@ -76,13 +76,17 @@ for gid in $(id -G); do
     group_args+=(--group-add "$gid")
 done

+tmpdir=$(mktemp -d)
+vartmpdir=$(mktemp -d)
+rundir=$(mktemp -d)
+
 docker run --rm ${TTY_STDIN} --privileged \
     -h ${HOST_NAME} \
-    -v /var/run:/run \
+    -v "$rundir:/run:z" \
     -v ${SCT_DIR}:${WORK_DIR} \
     -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-    -v /tmp:/tmp \
-    -v /var/tmp:/var/tmp \
+    -v "$tmpdir:/tmp:z" \
+    -v "$vartmpdir:/var/tmp:z" \
     -v ${HOME}:${HOME} \
     -v /etc/passwd:/etc/passwd:ro \
     -v /etc/group:/etc/group:ro \

But it still does not work.

$ hydra
docker version 1.8.0
Cleaning unused Docker resources...
Deleted Pods
Deleted Containers
Deleted Volumes
Image up-to-date
Obtaining QA SSH keys...
QA private key '/home/asias/.ssh/scylla-qa-ec2' exists. Nothing to update.
QA private key '/home/asias/.ssh/scylla-test' exists. Nothing to update.
QA private key '/home/asias/.ssh/support' exists. Nothing to update.
QA SSH keys obtained.
Making sure the ownerships of results directories are of the user
Traceback (most recent call last):
  File "./sct.py", line 12, in <module>
    from sdcm.results_analyze import PerformanceResultsAnalyzer
  File "/sct/sdcm/results_analyze.py", line 10, in <module>
    from sdcm.db_stats import TestStatsMixin
  File "/sct/sdcm/db_stats.py", line 17, in <module>
    from sdcm.utils.common import get_job_name, retrying, remove_comments, normalize_ipv6_url
  File "/sct/sdcm/utils/common.py", line 417, in <module>
    AWS_REGIONS = all_aws_regions()
  File "/sct/sdcm/utils/common.py", line 414, in all_aws_regions
    return [region['RegionName'] for region in client.describe_regions()['Regions']]
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 648, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 667, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.6/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/local/lib/python3.6/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/usr/local/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/usr/local/lib/python3.6/site-packages/botocore/signers.py", line 157, in sign
    auth.add_auth(request)
  File "/usr/local/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials

It looks like we have an issue accessing the host .ssh directory.

[asias@hjpc2 scylla-cluster-tests]$ hydra bash
docker version 1.8.0
Cleaning unused Docker resources...
Deleted Pods
Deleted Containers
Deleted Volumes
Image up-to-date
Obtaining QA SSH keys...
QA private key '/home/asias/.ssh/scylla-qa-ec2' exists. Nothing to update.
QA private key '/home/asias/.ssh/scylla-test' exists. Nothing to update.
QA private key '/home/asias/.ssh/support' exists. Nothing to update.
QA SSH keys obtained.
Making sure the ownerships of results directories are of the user
running  bash
bash: /home/asias/.bashrc: Permission denied
bash-4.2$ ls /home/asias/.ssh
ls: cannot access /home/asias/.ssh: Permission denied

After making a temp home with .ssh keys, I still get the error.

[asias@hjpc2 scylla-cluster-tests]$ find myhome
myhome
myhome/.ssh
myhome/.ssh/scylla-qa-ec2
myhome/.ssh/scylla-test
myhome/.ssh/support

--- a/docker/env/hydra.sh
+++ b/docker/env/hydra.sh
@@ -76,14 +76,19 @@ for gid in $(id -G); do
     group_args+=(--group-add "$gid")
 done

+tmpdir=$(mktemp -d)
+vartmpdir=$(mktemp -d)
+rundir=$(mktemp -d)
+myhomedir="`pwd`/myhome"
+
 docker run --rm ${TTY_STDIN} --privileged \
     -h ${HOST_NAME} \
-    -v /var/run:/run \
+    -v "$rundir:/run:z" \
     -v ${SCT_DIR}:${WORK_DIR} \
     -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-    -v /tmp:/tmp \
-    -v /var/tmp:/var/tmp \
-    -v ${HOME}:${HOME} \
+    -v "$tmpdir:/tmp:z" \
+    -v "$vartmpdir:/var/tmp:z" \
+    -v "$myhomedir:${HOME}:z" \
     -v /etc/passwd:/etc/passwd:ro \
     -v /etc/group:/etc/group:ro \
     -v /etc/sudoers:/etc/sudoers:ro \

$ /usr/local/bin/hydra
docker version 1.8.0
Cleaning unused Docker resources...
Deleted Pods
Deleted Containers
Deleted Volumes
Image up-to-date
Obtaining QA SSH keys...
QA private key '/home/asias/.ssh/scylla-qa-ec2' exists. Nothing to update.
QA private key '/home/asias/.ssh/scylla-test' exists. Nothing to update.
QA private key '/home/asias/.ssh/support' exists. Nothing to update.
QA SSH keys obtained.
Making sure the ownerships of results directories are of the user
Traceback (most recent call last):
  File "./sct.py", line 12, in <module>
    from sdcm.results_analyze import PerformanceResultsAnalyzer
  File "/sct/sdcm/results_analyze.py", line 10, in <module>
    from sdcm.db_stats import TestStatsMixin
  File "/sct/sdcm/db_stats.py", line 17, in <module>
    from sdcm.utils.common import get_job_name, retrying, remove_comments, normalize_ipv6_url
  File "/sct/sdcm/utils/common.py", line 417, in <module>
    AWS_REGIONS = all_aws_regions()
  File "/sct/sdcm/utils/common.py", line 414, in all_aws_regions
    return [region['RegionName'] for region in client.describe_regions()['Regions']]
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 648, in _make_api_call
    operation_model, request_dict, request_context)
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 667, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.6/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.6/site-packages/botocore/endpoint.py", line 132, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/local/lib/python3.6/site-packages/botocore/endpoint.py", line 116, in create_request
    operation_name=operation_model.name)
  File "/usr/local/lib/python3.6/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python3.6/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.6/site-packages/botocore/signers.py", line 90, in handler
    return self.sign(operation_name, request)
  File "/usr/local/lib/python3.6/site-packages/botocore/signers.py", line 157, in sign
    auth.add_auth(request)
  File "/usr/local/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials

fruch commented 4 years ago

The last error you are getting is missing AWS credentials...

You should copy AWS credentials into the temp home you created.

Doesn't Fedora 31 define the $HOME variable anymore?

The /var/run mount is needed for running docker: we need the docker .sock file to run docker from within hydra.

Is docker installed, or are you running podman?

asias commented 4 years ago

The last error you are getting is missing AWS credentials...

You should copy AWS credentials into the temp home you created.

Is it .aws/credentials?

Doesn't Fedora 31 define the $HOME variable anymore?

It does. But with podman, the container cannot access it because of permission issues.

The /var/run mount is needed for running docker: we need the docker .sock file to run docker from within hydra.

I see.

Is docker installed, or are you running podman?

Yes, podman. F31 only has podman. I am wondering if you guys run hydra on Fedora. We had some patches in scylla/dbuild to make it work with podman. I think we need some changes for hydra too.

asias commented 4 years ago

It is the same after I copy .aws to the temp home dir.

bash-4.2$ find ~/
/home/asias/
/home/asias/.ssh
/home/asias/.ssh/scylla-qa-ec2
/home/asias/.ssh/scylla-test
/home/asias/.ssh/support
/home/asias/.aws
/home/asias/.aws/credentials
/home/asias/.aws/config

fruch commented 4 years ago

The last error you are getting is missing AWS credentials... You should copy AWS credentials into the temp home you created.

Is it .aws/credentials?

Doesn't Fedora 31 define the $HOME variable anymore?

It does. But with podman, the container cannot access it because of permission issues.

The /var/run mount is needed for running docker: we need the docker .sock file to run docker from within hydra.

I see.

Is docker installed, or are you running podman?

Yes, podman. F31 only has podman. I am wondering if you guys run hydra on Fedora. We had some patches in scylla/dbuild to make it work with podman. I think we need some changes for hydra too.

Most of us are running Ubuntu (some an older Fedora), and our CI runs on CentOS. So I guess some work would be needed to get podman working correctly on Fedora.

Meanwhile, you might just want to create a Python virtualenv and skip hydra...
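
A minimal sketch of the virtualenv route suggested above, using only the standard library `venv` module. The function name `create_env` and the target directory are illustrative, and the thread does not say which requirements file SCT expects, so installing dependencies is left to the reader.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target, with_pip=True):
    """Create an isolated Python environment at `target`.

    Returns the path of the pyvenv.cfg marker file that venv writes,
    which is a cheap way to confirm the environment exists.
    """
    target = Path(target)
    # with_pip=True bootstraps pip so that project requirements
    # can be installed into the environment afterwards.
    venv.create(target, with_pip=with_pip)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    # Illustrative location only; any writable directory works.
    env_dir = Path(tempfile.mkdtemp()) / "sct-env"
    print(create_env(env_dir, with_pip=False))
```

After activating the environment, sct.py would be run directly instead of through hydra; as noted elsewhere in this thread, commands that need to talk to docker may still not work this way.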

nyh commented 3 years ago

I hacked hydra.sh a bit to make it run on podman on fedora 31

If I understand correctly, hydra still doesn't work with podman, so this issue is still important. I realize that there are workarounds (such as installing docker and changing the kernel configuration so it will work) but it will be a better experience for new SCT users if they can just use it without changing kernel parameters.

It looks like we have an issue accessing the host .ssh directory.

While this can probably be fixed, I just want to point out that any attempt to access the host .ssh directory is already a mistake. For example in issue #741 I noted that hydra runs ssh which mistakenly uses the user's $HOME/.ssh/config and can break because of unexpected settings (and actually broke for me), and I sent a patch adding the "-F /dev/null" option to every ssh command line - which causes it to take /dev/null instead of the default ~/.ssh/config. If there is anything else taken from ~/.ssh, it should be changed on the command line as well.
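
As a rough illustration of the approach described above, the sketch below builds an ssh command line that does not read anything from ~/.ssh by default. The helper name `build_ssh_command` and its parameters are hypothetical and are not taken from the SCT code; `-F`, `-i` and the `-o` options are standard OpenSSH client options.

```python
def build_ssh_command(host, user, key_path, remote_cmd):
    """Return an ssh argv list that ignores the user's ~/.ssh defaults."""
    return [
        "ssh",
        "-F", "/dev/null",                     # read no per-user config file
        "-i", key_path,                        # name the private key explicitly
        "-o", "IdentitiesOnly=yes",            # offer only the key given with -i
        "-o", "UserKnownHostsFile=/dev/null",  # do not touch ~/.ssh/known_hosts
        "-o", "StrictHostKeyChecking=no",      # acceptable only for throwaway test nodes
        f"{user}@{host}",
        remote_cmd,
    ]


if __name__ == "__main__":
    # Example values are placeholders.
    print(" ".join(build_ssh_command("10.0.0.1", "centos", "/tmp/test-key", "uptime")))
```

Passing the list to `subprocess.run` without a shell avoids quoting problems in `remote_cmd`. Disabling host key checking is a trade-off that only makes sense for short-lived test instances.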

  File "/usr/local/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials

The AWS credentials are usually taken from $HOME/.aws/credentials. Again, using the user's actual home directory and whichever credentials are stored there in SCT Alternator (?) tests is a mistake - beyond being something which podman has a difficult time with. SCT may need to override the "HOME" variable when running these tests so it will pick up the credential files from some other place.
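
To make the lookup concrete, here is a small sketch of where the shared credentials file is expected, assuming only the two mechanisms relevant to this thread: the `AWS_SHARED_CREDENTIALS_FILE` environment variable (honoured by botocore) and the default `$HOME/.aws/credentials`. It deliberately ignores botocore's other credential sources, such as `AWS_ACCESS_KEY_ID` or instance metadata, and the function names are illustrative.

```python
import os
from pathlib import Path


def shared_credentials_path(env=None):
    """Return the path where the shared AWS credentials file is expected."""
    env = os.environ if env is None else env
    explicit = env.get("AWS_SHARED_CREDENTIALS_FILE")
    if explicit:
        # An explicit override wins over the home directory default.
        return Path(explicit)
    home = env.get("HOME") or str(Path.home())
    return Path(home) / ".aws" / "credentials"


def credentials_readable(env=None):
    """True if the expected credentials file exists and can be read."""
    path = shared_credentials_path(env)
    return path.is_file() and os.access(path, os.R_OK)


if __name__ == "__main__":
    print(shared_credentials_path(), credentials_readable())
```

Running a check like this inside the container would separate a missing file from a file that is mounted but unreadable, which is the situation the "Permission denied" output earlier in the thread suggests.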

fruch commented 3 years ago

  File "/usr/local/lib/python3.6/site-packages/botocore/auth.py", line 357, in add_auth
    raise NoCredentialsError
botocore.exceptions.NoCredentialsError: Unable to locate credentials

The AWS credentials are usually taken from $HOME/.aws/credentials. Again, using the user's actual home directory and whichever credentials are stored there in SCT Alternator (?) tests is a mistake - beyond being something which podman has a difficult time with. SCT may need to override the "HOME" variable when running these tests so it will pick up the credential files from some other place.

Where exactly do you want the credentials to come from?

nyh commented 3 years ago

Where exactly do you want the credentials to come from?

Sorry, you're right. I misunderstood the purpose of these credentials - I thought they were fake (e.g., for Alternator tests), I understand now this is really supposed to be talking to the real AWS and use your real credentials.

fruch commented 3 years ago

Where exactly do you want the credentials to come from?

Sorry, you're right. I misunderstood the purpose of these credentials - I thought they were fake (e.g., for Alternator tests), I understand now this is really supposed to be talking to the real AWS and use your real credentials.

yeah boto3 is used for other things :)

asias commented 3 years ago

@fruch @roydahan @bentsi Please consider making hydra work with podman. I checked the latest master and it does not support podman.

From time to time, you guys ask developers to look at the logs or monitors using the hydra command. How can we look if it does not even work on our development machines?

[asias@hjpc2 scylla-cluster-tests]$ sudo ./install-hydra.sh
[sudo] password for asias: 
docker version 2.1.1
Docker is installed.
Hydra installed.
aws-cli/1.17.17 Python/3.7.7 Linux/5.8.17-100.fc31.x86_64 botocore/1.14.17
AWS CLI installed.
==================================     NOTES      =================================================
To check that Hydra is installed, run 'hydra ls' anywhere in bash.
It will run 'ls' command in the SCT Docker container.
When running Hydra for the first time it will build the SCT Docker image. Please be patient!
===================================================================================================
Please run 'aws configure' to configure AWS CLI
[asias@hjpc2 scylla-cluster-tests]$ hydra ls
docker version 2.1.1
Pull version v0.83-update from Docker Hub...
Trying to pull registry.fedoraproject.org/scylladb/hydra:v0.83-update...
  manifest unknown: manifest unknown
Trying to pull registry.access.redhat.com/scylladb/hydra:v0.83-update...
  name unknown: Repo not found
Trying to pull registry.centos.org/scylladb/hydra:v0.83-update...
  manifest unknown: manifest unknown
Trying to pull docker.io/scylladb/hydra:v0.83-update...
Getting image source signatures
Copying blob 776716b0aa87 done  
Copying blob d121f8d1c412 done  
Copying blob bdb458dd27ea done  
Copying blob c37461631ba3 done  
Copying blob ca572574cc82 done  
Copying blob fc4735adbfb5 done  
Copying blob 657a735182f1 done  
Copying blob d4651f2b6a50 done  
Copying blob 3564dc46f836 done  
Copying config ead8b1a8c2 done  
Writing manifest to image destination
Storing signatures
ead8b1a8c2d1b95d6dff429530199671c4b1221abb7dae416645c5bada7d8d44
Obtaining QA SSH keys...
QA private key '/home/asias/.ssh/scylla-qa-ec2' exists. Nothing to update.
QA private key '/home/asias/.ssh/scylla-test' exists. Nothing to update.
QA private key '/home/asias/.ssh/support' exists. Nothing to update.
QA SSH keys obtained.
Making sure the ownerships of results directories are of the user
Going to run './sct.py ls'...
Error: open `/home/asias/.local/share/containers/storage/overlay/d253e4a27b84d092ded00f079dcad64837552fa5718543c4a3d93984f6a08958/merged/.containerenv`: No such file or directory: OCI runtime command not found error

asias commented 3 years ago

$ hydra ls
docker version 2.1.1
There is scylladb/hydra:v0.83-update in local cache, use it.
Obtaining QA SSH keys...
QA private key '/home/asias/.ssh/scylla-qa-ec2' exists. Nothing to update.
QA private key '/home/asias/.ssh/scylla-test' exists. Nothing to update.
QA private key '/home/asias/.ssh/support' exists. Nothing to update.
QA SSH keys obtained.
Making sure the ownerships of results directories are of the user
Going to run './sct.py ls'...
Error: mkdir `secrets`: Permission denied: OCI runtime permission denied error

fruch commented 3 years ago

@asias we don't yet support podman.

For it to work, docker (a.k.a. Moby) needs to be installed: https://fedoramagazine.org/docker-and-fedora-32/

asias commented 3 years ago

@asias we don't yet support podman.

For it to work, docker (a.k.a. Moby) needs to be installed: https://fedoramagazine.org/docker-and-fedora-32/

Are there any technical issues preventing hydra from working with podman? I doubt it is a lot of work to make it work with podman.

I do not want to install docker just for hydra. Our build system dbuild and scylla-monitor all work with podman.

fruch commented 3 years ago

@asias we don't yet support podman. For it to work, docker (a.k.a. Moby) needs to be installed: https://fedoramagazine.org/docker-and-fedora-32/

Are there any technical issues preventing hydra from working with podman? I doubt it is a lot of work to make it work with podman.

Yes, there are: we are using docker-in-docker, and we rely on the docker Unix domain socket to use it from the code.

There is a new experimental feature in podman that can supply an API compatible with docker that might help.

I tried that, and failed to get it working, I know @eliransin also tried.

If you think you know how to solve those two things, we'll be more than happy to review a PR for it.

I do not want to install docker just for hydra. Our build system dbuild and scylla-monitor all work with podman.

Right now that's the only option we know is working.

asias commented 3 years ago

@asias we don't yet support podman. For it to work, docker (a.k.a. Moby) needs to be installed: https://fedoramagazine.org/docker-and-fedora-32/

Are there any technical issues preventing hydra from working with podman? I doubt it is a lot of work to make it work with podman.

Yes, there are: we are using docker-in-docker, and we rely on the docker Unix domain socket to use it from the code.

Are you using nested docker (host -> container 1 -> container 2)? Or just starting another docker instance on the host from container 1?

There is a new experimental feature in podman that can supply an API compatible with docker that might help.

I tried that, and failed to get it working, I know @eliransin also tried.

If you think you know how to solve those two things, we'll be more than happy to review a PR for it.

I do not want to install docker just for hydra. Our build system dbuild and scylla-monitor all work with podman.

Right now that's the only option we know is working.

eliransin commented 3 years ago

I tried that, and failed to get it working, I know @eliransin also tried.

I did try it, it doesn't currently work because of some inconsistencies in the API, those are bugs that are likely to be fixed in one of the versions. I think I can solve most or all of them, however, I don't have time to deal with this. @fruch those problems are solvable and I expressed my opinion about the importance of supporting podman but I guess it is in a different priority on your queue :slightly_smiling_face: If and when I have time I will look into this.

fruch commented 3 years ago

I tried that, and failed to get it working, I know @eliransin also tried.

I did try it, it doesn't currently work because of some inconsistencies in the API, those are bugs that are likely to be fixed in one of the versions. I think I can solve most or all of them, however, I don't have time to deal with this. @fruch those problems are solvable and I expressed my opinion about the importance of supporting podman but I guess it is in a different priority on your queue If and when I have time I will look into this.

I know everything is solvable, it's just code, isn't it? I've played with it a bit and got the unix domain socket to work locally on my machine, but failed to mount it into the podman instance. I can put up a PR with what I have so far, but it's not yet a complete solution (i.e. I've got hydra running on podman, but it couldn't do things like hydra investigate show-monitor, since I couldn't find a way to get the socket mounted into the instance; it kept failing on permission issues. That sounds like a podman bug to me, but I couldn't get it to work).

eliransin commented 3 years ago

@fruch

I know everything is solvable, it's just code, isn't it ?

I meant to say easily solvable within 2-3 man-days...

Actually there is a solution for that: you can use an IP address instead of a unix socket, although I had no problem at all mounting the unix socket - you just need to create a rootless one. The reason I used an IP was so I could look at the REST traffic using wireshark and compare behavior between docker and podman. There are other problems that need to be worked around - one example is the exposure of random ports on the host; this functionality doesn't work for podman + docker REST API, and I verified in the podman code that it doesn't work. The solution will need to be a workaround, a push to podman upstream or a combination of both.
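
The choice between a unix socket and an IP address described above can be sketched as a small endpoint picker. This is only an illustration under stated assumptions: `DOCKER_HOST` is the standard variable read by docker clients, `/var/run/docker.sock` is docker's usual socket, and `$XDG_RUNTIME_DIR/podman/podman.sock` is assumed to be where a rootless podman API socket lives. The function names are hypothetical and nothing here comes from the hydra code.

```python
import os
from pathlib import Path


def candidate_sockets(env=None):
    """List unix sockets that may expose a docker-compatible API."""
    env = os.environ if env is None else env
    candidates = [Path("/var/run/docker.sock")]  # classic docker daemon
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if runtime_dir:
        # Assumed location of the rootless podman API socket.
        candidates.append(Path(runtime_dir) / "podman" / "podman.sock")
    return candidates


def pick_docker_host(env=None, exists=Path.exists):
    """Return a DOCKER_HOST-style URL, or None if no endpoint is found."""
    env = os.environ if env is None else env
    if env.get("DOCKER_HOST"):
        # An explicit setting, e.g. tcp://127.0.0.1:<port>, always wins.
        return env["DOCKER_HOST"]
    for sock in candidate_sockets(env):
        if exists(sock):
            return f"unix://{sock}"
    return None


if __name__ == "__main__":
    print(pick_docker_host())
```

A TCP endpoint makes the REST traffic easy to inspect, as mentioned above, while a unix socket has to be mounted into the container with permissions the container user can actually use.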

nyh commented 3 years ago

Even if you do rely on docker, I don't see how "docker on docker" is an excuse for anything. If I understand correctly, you need "docker on docker" to run tests locally, but most users will never want to do that - they want to use SCT as a tool for starting tests on an AWS cluster. For that you do not need docker on docker. Or any fancy docker feature that doesn't exist in Python.

Please don't hold your breath until podman is 100% compatible with docker, or until every one of the 30 developers that uses Fedora (because that's what Avi decided) caves in and replaces the default podman with moby+kernel-parameters.

Finally, I know I'm repeating myself, but I think this is worth repeating. Hydra does not really need to rely on docker. This was your choice to ease installation - so that users do not need to install a bunch of tools and libraries that Hydra uses. However, it could and should have remained a choice. An option. Your documentation even claims it is possible to run sct.py not in docker, and @roydahan recommended this option to me - and yet I was warned it probably doesn't work. It should work. How difficult would it be to make it work?

NOTE: My rant here is not specifically about QA. The insistence on an extremely specific development environment is a general problem in our company, that I think is hurting developers including would-be open-source contributors that don't exist partially because of this reason. I'm just as frustrated with Avi's acceptance of the fact that Scylla can only build on extremely specific Linux distributions (different from QA's choice), and I try to air this frustration as well. Evidently, papering over this problem with docker has its limits too. There has to be a better way.

fruch commented 3 years ago

Even if you do rely on docker, I don't see how "docker on docker" is an excuse for anything. If I understand correctly, you need "docker on docker" to run tests locally, but most users will never want to do that - they want to use SCT as a tool for starting tests on an AWS cluster. For that you do not need docker on docker. Or any fancy docker feature that doesn't exist in Python.

Please don't hold your breath until podman is 100% compatible with docker, or until every one of the 30 developers that uses Fedora (because that's what Avi decided) caves in and replaces the default podman with moby+kernel-parameters.

Finally, I know I'm repeating myself, but I think this is worth repeating. Hydra does not really need to rely on docker. This was your choice to ease installation - so that users do not need to install a bunch of tools and libraries that Hydra uses. However, it could and should have remained a choice. An option. Your documentation even claims it is possible to run sct.py not in docker, and @roydahan recommended this option to me - and yet I was warned it probably doesn't work. It should work. How difficult would it be to make it work?

It might work for very specific commands, i.e. those that don't need to interact with docker at all, but not for all of them. Starting the local monitoring stack, for example, might be problematic regardless (and I'm not really sure it works with podman out of the box; maybe it does now).

We are using docker through its Python client in some cases, for specific reasons (also in dtest). You might not like that, and there might be some way for it to co-exist with podman, but we are trying to help the people using those tools by specifying the exact things they need.

When we recommend virtualenvs, you are the first to say this recommendation is too much, and that SCT should be able to work with whatever the Fedora packaging system decides upon. Again, those ecosystems work differently from your assumptions, whether it's a golang or a JavaScript npm-based setup.

Hydra was a step towards taking some of those things out of the way. You don't have to use it, just as you are not forced to use dbuild for building Scylla (but it's the suggested option, for very, very good reasons).

We are suggesting an option for how to get SCT running; heck, we are suggesting a few of them. It seems like none of them is good enough, and maybe they can be improved.

But decisions were made, and regardless of whether you are happy with them or not, we are not gonna replace all of those tomorrow morning.

NOTE: My rant here is not specifically about QA. The insistence on an extremely specific development environment is a general problem in our company, that I think is hurting developers including would-be open-source contributors that don't exist partially because of this reason. I'm just as frustrated with Avi's acceptance of the fact that Scylla can only build on extremely specific Linux distributions (different from QA's choice), and I try to air this frustration as well. Evidently, papering over this problem with docker has its limits too. There has to be a better way.

eliransin commented 3 years ago

@nyh this looks like more of a theological than a technical discussion to me. I agree that you have a point: choosing one operating system over another can be problematic. My angle on it, though, is quite different from yours, and it actually represents the exact opposite approach. When we choose to develop (or build) on a very specific environment, it makes it very hard to claim that you can build and develop Scylla on every modern Linux system. But with the help of docker, which is yet another package available on most modern systems, you can make the following claim today: "you can build Scylla on every modern Linux system that has a docker package installed", which is much better IMO than "you can try and install all of Scylla's dependencies from your distro repo and then you should be able to build Scylla :slightly_smiling_face:".

Moreover, how much do you think we should invest in this? The matrix is huge, and Roy's claim just makes it more obvious that we should use containerization in our environments. What I believe Roy meant by his claim that it probably won't work is that they don't test it regularly on Fedora. If you think about it, you can run it without the container and it is guaranteed to work. Here's how:

  1. Install a vanilla distro identical to that of the hydra container.
  2. Install all of the packages that the Dockerfile in SCT says should be installed on that distro (we can easily translate it into an install-deps.sh if needed).
  3. Run the hydra install script on your machine once - this will force you to install and configure AWS and friends.
  4. Run the docker entry point script on your machine once - this will download the keys used to ssh into the instances.
  5. You are good to go - now you can run the SCT script and it is guaranteed to work.

Of course, if you don't want to install the same distro and version as the hydra docker image, it is not guaranteed to work, since it wasn't tested on that environment. The guarantee is even weaker if you also want other things to work alongside it, like building Scylla, because there might be a collision in the requirements between the two. So I don't think that docker is a big no-no. However, here is what I do agree with: if we do use docker (or any kind of container), we need to make sure it indeed allows us to support as many distributions and systems as possible. I more or less (but it is only me) draw the line at the point that tells you to change your kernel startup parameters.

But decisions were made, and regardless of whether you are happy with them or not, we are not gonna replace all of those tomorrow morning.

@fruch this decision has consequences; one of them is that everyone who uses Fedora can't use your tools (unless they are willing to jump through the Docker hoops). I think you should say it specifically and call it by name - "QA decided not to support Fedora and possibly every future RHEL-like system, since it looks like all of them are going towards podman as a container solution".

nyh commented 3 years ago

It might work for very specific commands, i.e. those that don't need to interact with docker at all.

I believe that all the users who complained about this - on this issue and outside this issue - want to use SCT on a cloud cluster. In fact that is why we developers want to use SCT (if we wanted to run Scylla locally, we don't really need SCT for that). It is less important for me whether, if I ever ask SCT to run nodes locally, that would require docker. I'm not planning to do this anyway.

but not for all of them. Starting the local monitoring stack, for example, might be problematic regardless (and I'm not really sure it works with podman out of the box; maybe it does now).

Yes. @amnonh, with others' help, fixed his scripts to be compatible with both docker and podman. It wasn't that difficult. It also didn't happen all at once - Amnon fixed some things, other people helped him to fix other things. This can never happen before the developer expresses interest in supporting podman and takes the first steps.

When we recommend virtualenvs, you are the first to say this recommendation is too much, and that SCT should be able to work with whatever the Fedora packaging system decides upon.

Working on any recent distro without any docker/virtualenv is indeed the best, and isn't rocket science - if you currently require boto3==1.15.3 and I have 1.11.17 (on Fedora 32), there's a 99% chance everything will work just fine, and I am willing to take the chance - and even to send a patch if it doesn't. It doesn't need to work perfectly. This is not for production use, it's for development.
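
The tolerance argued for above could look like a startup check that warns on a version mismatch instead of refusing to run. The sketch below is one possible policy and is purely an assumption for illustration: an exact match is fine, a different minor or patch version only warns, and a different major version is treated as an error. It handles plain dotted numeric versions only.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.15.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def classify_pin(pinned, installed):
    """Compare an installed version against a pinned one.

    Returns 'ok' for an exact match, 'warn' when only the minor or
    patch level differs, and 'error' when the major version differs.
    """
    want, have = parse_version(pinned), parse_version(installed)
    if have == want:
        return "ok"
    if have[0] != want[0]:
        return "error"
    return "warn"


if __name__ == "__main__":
    # The boto3 versions mentioned in the comment above.
    print(classify_pin("1.15.3", "1.11.17"))
```

With this policy the boto3 example from the comment (pinned 1.15.3, installed 1.11.17) falls into the warning category, so the tool would still start.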

But virtualenv is an option too, I never said it wasn't. Just that it's not my favorite option. But docker is even less my favorite :-)

We are suggesting an option for how to get SCT running; heck, we are suggesting a few of them. It seems like none of them is good enough, and maybe they can be improved.

I'll need to experiment with the other options besides docker. When I asked, I was warned they do not work... So I just didn't try.

fruch commented 3 years ago

@eliransin https://github.com/scylladb/scylla-cluster-tests/pull/2965

This is the best I could do, and docker inside hydra isn't working like that, i.e. starting the monitoring stack doesn't work.

eliransin commented 3 years ago

This is the best I could do, and docker inside hydra isn't working like that, i.e. starting the monitoring stack doesn't work.

@fruch - are you going to fix this in the future? If yes, how far into the future? Should @asias or I hold our breath waiting for this support? If not - please close this issue with the "won't fix" label. Arguing that it can't be done is not helping, nor is it true. It can be done and it is a matter of priority, and I will understand that it is not part of the priority right now. I also have priorities of my own, and if podman support was the most important feature in the world I would have done it myself. What I don't want is to sit around waiting for a fix that is not going to come or waiting for docker support for cgroups v2. If the barrier is technical we can set up sessions and move those barriers out of the way :slightly_smiling_face:

fruch commented 3 years ago

This is the best I could do, and docker inside hydra isn't working like that, i.e. starting the monitoring stack doesn't work.

@fruch - are you going to fix this in the future? If yes, how far into the future? Should @asias or I hold our breath waiting for this support? If not - please close this issue with the "won't fix" label. Arguing that it can't be done is not helping, nor is it true.

I didn't argue it can't be done, I'm just saying I can't invest more into finding those solutions. I think installing moby is a good enough option for now; we don't have the bandwidth to support both options right now.

It can be done and it is a matter of priority, and I will understand that it is not part of the priority right now. I also have priorities of my own, and if podman support was the most important feature in the world I would have done it myself.

Totally agree, which is exactly why I'm not gonna invest more time into it, at least not at this stage.

What I don't want is to sit around waiting for a fix that is not going to come or waiting for docker support for cgroups v2. If the barrier is technical we can set up sessions and move those barriers out of the way

I don't mind waiting, since there are working alternatives right now (i.e. installing moby) for getting this working.

I've uploaded that PR just to demonstrate the core technical issue we are having with it. If someone thinks it's that important, they can take a look and write the instructions on how to get it working (similar to compiling scylla-core on Ubuntu).

eliransin commented 3 years ago

OK, so I believe you can close this issue with the "wont fix" label.

roydahan commented 3 years ago

Eliran / Asias & Nadav, if you think that you still have gaps here, let's hold a meeting to discuss it.

On Mon, Dec 7, 2020 at 11:18 AM Eliran Sinvani notifications@github.com wrote:

OK, so I believe you can close this issue with the "wont fix" label.


nyh commented 2 years ago

Should we close this issue? Fedora 31 is no longer relevant, and on Fedora 34 which I just tested, both docker (moby) and podman come from Fedora, so even if Hydra still works only on docker (is that even true?), it's no longer a problem?

fruch commented 2 years ago

@nyh since #4648 both should work, at least for simple commands. For running a whole test, I would recommend using docker (i.e. we haven't tested all of the commands with podman).