scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

`scylla-server-dbg` binary is not installed on the K8S backends #6214

Open vponomaryov opened 1 year ago

vponomaryov commented 1 year ago

Description

If a core dump happens while running on a K8S backend, we get the following error:

04:17:02  < t:2023-05-28 01:17:02,579 f:db_log_reader.py l:201  c:sdcm.db_log_reader   p:ERROR > failed to read db log < t:2023-05-28 01:17:02,579 f:db_log_reader.py l:201  c:sdcm.db_log_reader   p:ERROR > failed to read db log
04:17:02  < t:2023-05-28 01:17:02,579 f:db_log_reader.py l:201  c:sdcm.db_log_reader   p:ERROR > Traceback (most recent call last):
04:17:02  < t:2023-05-28 01:17:02,579 f:db_log_reader.py l:201  c:sdcm.db_log_reader   p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/db_log_reader.py", line 197, in run
04:17:02  < t:2023-05-28 01:17:02,579 f:db_log_reader.py l:201  c:sdcm.db_log_reader   p:ERROR >     self._read_and_publish_events()
04:17:02  < t:2023-05-28 01:17:02,579 f:db_log_reader.py l:201  c:sdcm.db_log_reader   p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/db_log_reader.py", line 176, in _read_and_publish_events
04:17:02  < t:2023-05-28 01:17:02,579 f:db_log_reader.py l:201  c:sdcm.db_log_reader   p:ERROR >     scylla_debug_info = self.get_scylla_debuginfo_file()
04:17:02  < t:2023-05-28 01:17:02,579 f:db_log_reader.py l:201  c:sdcm.db_log_reader   p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/db_log_reader.py", line 251, in get_scylla_debuginfo_file
04:17:02  < t:2023-05-28 01:17:02,579 f:db_log_reader.py l:201  c:sdcm.db_log_reader   p:ERROR >     raise Exception("Couldn't find scylla debug information")
04:17:02  < t:2023-05-28 01:17:02,579 f:db_log_reader.py l:201  c:sdcm.db_log_reader   p:ERROR > Exception: Couldn't find scylla debug information
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > [SCT internal warning] (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0d65f3b9-9938-4c48-a0d5-672090ace9eb during_nemesis=GrowShrinkNewRack: type=ABORTING_ON_SHARD regex=Aborting on shard line_number=11093 node=sct-cluster-us-east1-b-us-east1-1-0
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > Aborting on shard 0.
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x51b4da8
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x51e71e2
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > /libreloc/libc.so.6+0x3cb1f
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > /libreloc/libc.so.6+0x8ce5b
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > /libreloc/libc.so.6+0x3ca75
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > /libreloc/libc.so.6+0x267fb
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > /libreloc/libc.so.6+0x2671a
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > /libreloc/libc.so.6+0x35655
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x3872536
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x3870ca1
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x3838cfc
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x383f36c
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x3a40b8c
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x1267d3a
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x51c5624
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x51c68a7
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x51c5be9
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x516c635
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x516b7a8
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x11ab320
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x11ace80
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x11a9a2a
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > /libreloc/libc.so.6+0x2750f
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > /libreloc/libc.so.6+0x275c8
04:17:02  < t:2023-05-28 01:17:02,580 f:base.py         l:325  c:sdcm.sct_events.base p:WARNING > 0x11a7be4 has not been published or dumped, maybe you missed .publish()

Installation of the package fails:

2023-05-27 23:45:31,329 f:cluster.py      l:2070 c:sdcm.cluster         p:DEBUG > Node sct-cluster-us-east1-b-us-east1-0 [10.0.0.130 | 10.0.0.168] (seed: False): Installing Scylla debug info...
2023-05-27 23:45:31,329 f:remote_base.py  l:520  c:KubernetesCmdRunner  p:DEBUG > Running command "apt-get install -y scylla-enterprise-server-dbg=2023.1.0~rc5\*"...
2023-05-27 23:45:32,359 f:base.py         l:222  c:KubernetesCmdRunner  p:DEBUG > Reading package lists...
2023-05-27 23:45:32,527 f:base.py         l:222  c:KubernetesCmdRunner  p:DEBUG > Building dependency tree...
2023-05-27 23:45:32,527 f:base.py         l:222  c:KubernetesCmdRunner  p:DEBUG > Reading state information...
2023-05-27 23:45:32,535 f:base.py         l:222  c:KubernetesCmdRunner  p:DEBUG > E: Unable to locate package scylla-enterprise-server-dbg

So we need to make it work.

Steps to Reproduce

  1. Run the job with the configuration from this bug report
  2. Wait until a core dump happens
  3. See the error

Expected behavior: scylla-server-dbg binary must be installed on all the K8S scylla pods.

Actual behavior: scylla-server-dbg binary fails to be installed.

vponomaryov commented 1 year ago

It is an open question how we should get it on K8S. AMIs already have it pre-installed, but Docker images do not.

vponomaryov commented 1 year ago

I played with it manually: the scylla-server-dbg package cannot be installed without pulling in (upgrading) all the other Scylla packages:

# curl --retry 5 --retry-max-time 300 -o /etc/apt/sources.list.d/scylla.list -L https://s3.amazonaws.com/downloads.scylladb.com/deb/ubuntu/scylla-5.2.list
# mkdir -p /etc/apt/keyrings
# apt-key adv --keyserver keyserver.ubuntu.com --recv-keys $key1
# apt-key adv --keyserver keyserver.ubuntu.com --recv-keys $key2
# apt-key adv --keyserver keyserver.ubuntu.com --recv-keys $key3

# apt install gnupg2 # or gnupg?

# gpg --homedir /tmp --no-default-keyring --keyring /etc/apt/keyrings/scylladb.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys $key1
# gpg --homedir /tmp --no-default-keyring --keyring /etc/apt/keyrings/scylladb.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys $key2
# gpg --homedir /tmp --no-default-keyring --keyring /etc/apt/keyrings/scylladb.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys $key3

# apt-get install scylla-server-dbg
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  scylla scylla-conf scylla-jmx scylla-kernel-conf scylla-node-exporter scylla-python3 scylla-server scylla-tools scylla-tools-core
The following NEW packages will be installed:
  scylla-server-dbg
The following packages will be upgraded:
  scylla scylla-conf scylla-jmx scylla-kernel-conf scylla-node-exporter scylla-python3 scylla-server scylla-tools scylla-tools-core
9 upgraded, 1 newly installed, 0 to remove and 23 not upgraded.
Need to get 333 MB of archives.
After this operation, 1770 MB of additional disk space will be used.
Do you want to continue? [Y/n] n
Abort.

# scylla --version
5.2.1-0.20230508.f1c45553bc29

# apt-get install scylla-server-dbg=5.2.1-0.20230508.f1c45553bc29
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package scylla-server-dbg is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Version '5.2.1-0.20230508.f1c45553bc29' for 'scylla-server-dbg' was not found

So it looks like the Docker image build config must be updated.
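Before pinning a version, it may help to check which scylla-server-dbg versions the configured repo actually offers. The following is only a sketch: it parses text in the `package | version | source` layout printed by `apt-cache madison`, and the helper names `available_versions` and `pick_debug_version` are hypothetical, not part of SCT. The prefix match is an assumption, in the spirit of the `=2023.1.0~rc5\*` wildcard SCT already uses; whether the repo really carries a version with a matching prefix is not confirmed by the output above.

```python
from typing import List, Optional


def available_versions(madison_output: str, package: str) -> List[str]:
    """Collect the versions listed for `package` in `apt-cache madison` output."""
    versions: List[str] = []
    for line in madison_output.splitlines():
        # Each madison line looks like: "<package> | <version> | <source>"
        fields = [field.strip() for field in line.split("|")]
        if len(fields) == 3 and fields[0] == package and fields[1] not in versions:
            versions.append(fields[1])
    return versions


def pick_debug_version(installed_version: str, versions: List[str]) -> Optional[str]:
    """Return the first repo version that starts with the installed version.

    Assumption: the repo version may carry an extra suffix compared to what
    `scylla --version` prints, so a prefix match is used instead of equality.
    """
    for version in versions:
        if version.startswith(installed_version):
            return version
    return None
```

If `pick_debug_version` returns `None`, the debug package for the running build is not available from that repo, which matches the `Version ... was not found` failure shown above.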

fruch commented 1 year ago

I think we first need to make sure the event is still sent out even if there are no debug symbols.

Then we can think about how to get the symbols, or whether we want them to be part of the image. They weren't always part of the AMIs (SCT still has code to install them).
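A minimal sketch of the "send the event anyway" idea, under stated assumptions: `publish_backtrace` and its callable parameters are hypothetical stand-ins, not the actual `sdcm.db_log_reader` API. The point is only that a failure to find debug info is recorded on the event instead of propagating and preventing publication.

```python
from typing import Callable, Dict, Optional


def publish_backtrace(
    raw_backtrace: str,
    find_debuginfo: Callable[[], str],
    decode: Callable[[str, str], str],
    publish: Callable[[Dict[str, Optional[str]]], None],
) -> Dict[str, Optional[str]]:
    """Publish a backtrace event whether or not debug symbols are available."""
    event: Dict[str, Optional[str]] = {
        "raw_backtrace": raw_backtrace,
        "decoded_backtrace": None,
        "note": None,
    }
    try:
        # Hypothetical lookup; in the traceback above the real lookup raises
        # Exception("Couldn't find scylla debug information").
        debuginfo_path = find_debuginfo()
        event["decoded_backtrace"] = decode(debuginfo_path, raw_backtrace)
    except Exception as exc:
        # Missing symbols must not block the event: keep the raw addresses
        # and record why decoding was skipped.
        event["note"] = f"backtrace not decoded: {exc}"
    publish(event)
    return event
```

With this shape, the event always reaches the publisher with at least the raw addresses, and decoding becomes best effort.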

vponomaryov commented 1 year ago

Core issue: https://github.com/scylladb/scylladb/issues/14184