scylladb / scylla-cluster-tests

Tests for Scylla Clusters
GNU Affero General Public License v3.0

offline-installer: Failed to send email in pipeline #5409

Open yaronkaikov opened 1 year ago

yaronkaikov commented 1 year ago

Seen in both 5.0 and 2022.1 (maybe a missing backport?)

https://jenkins.scylladb.com/job/scylla-5.0/job/artifacts-offline-install/job/artifacts-ubuntu2204-nonroot-test/ has been failing to send emails for almost 2 months. This actually fails the entire Jenkins pipeline job.

07:40:25  + ./docker/env/hydra.sh send-email --test-status SUCCESS --start-time 1668316818 --logdir /home/jenkins/slave/workspace/scylla-5.0/artifacts-offline-install/artifacts-ubuntu2204-nonroot-test/scylla-cluster-tests --email-recipients qa@scylladb.com
07:40:25  Running on Build Server...
07:40:25  Pull version v1.21 from Docker Hub...
07:40:25  v1.21: Pulling from scylladb/hydra
07:40:25  Digest: sha256:2267ab58cf1daf8980f1804893b9e5f293206e3e291bb64717c67f5f697238f5
07:40:25  Status: Image is up to date for scylladb/hydra:v1.21
07:40:25  docker.io/scylladb/hydra:v1.21
07:40:25  Obtaining QA SSH keys...
07:40:25  QA SSH keys obtained.
07:40:25  Going to run './sct.py  send-email --test-status SUCCESS --start-time 1668316818 --logdir /home/jenkins/slave/workspace/scylla-5.0/artifacts-offline-install/artifacts-ubuntu2204-nonroot-test/scylla-cluster-tests --email-recipients qa@scylladb.com'...
07:40:25  Obtaining QA SSH keys...
07:40:27  QA SSH keys obtained.
07:40:27  /usr/local/lib/python3.10/site-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
07:40:27    "class": algorithms.Blowfish,
07:40:31  install-bash-completion current path: /home/jenkins/slave/workspace/scylla-5.0/artifacts-offline-install/artifacts-ubuntu2204-nonroot-test/scylla-cluster-tests
07:40:31  New directory created: /home/jenkins/sct-results/20221113-054030-699996-send-email
07:40:31  Email will be sent to next recipients: qa@scylladb.com
07:40:31  Found latest test_id: 4e2e3969-de5c-46eb-af2a-2ef13dde7fd4
07:40:31  Collect logs for test-run with test-id: 4e2e3969-de5c-46eb-af2a-2ef13dde7fd4
07:40:31  Search dir with logs locally for test id: 4e2e3969-de5c-46eb-af2a-2ef13dde7fd4
07:40:31  Search result Command exited with status 0.
07:40:31  === stdout ===
07:40:31  /home/jenkins/slave/workspace/scylla-5.0/artifacts-offline-install/artifacts-ubuntu2204-nonroot-test/scylla-cluster-tests/20221113-052051-714737/test_id
07:40:31  
07:40:31  (no stderr)
07:40:31  ['/home/jenkins/slave/workspace/scylla-5.0/artifacts-offline-install/artifacts-ubuntu2204-nonroot-test/scylla-cluster-tests/20221113-052051-714737/test_id']
07:40:31  Results file not found

yaronkaikov commented 1 year ago

/cc @Annamikhlin

fgelcer commented 1 year ago

dup of https://github.com/scylladb/scylladb/issues/11117#issuecomment-1313552848

fruch commented 1 year ago

> dup of scylladb/scylladb#11117 (comment)

How are those connected to each other? Can you explain? How does a Scylla bug make SCT fail to create the results file?

fruch commented 1 year ago

SCT code shouldn't fail when reporting results. We should fix the fallback to use something like N/A, but still send out the email and report to Argus.

< t:2022-11-14 11:45:19,120 f:events_processes.py l:147  c:sdcm.sct_events.events_processes p:DEBUG > Get process `EVENTS_FILE_LOGGER' from EventsProcessesRegistry[lod_dir=/home/jenkins/slave/workspace/scylla-5.0/artifacts-offline-install/artifacts-ubuntu2204-nonroot-test/scylla-cluster-tests/20221114-113226-342996,id=0x7f712738ab00,default=True]
< t:2022-11-14 11:45:19,120 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/home/scylla-test/scylladb/bin/scylla --version"...
< t:2022-11-14 11:45:19,623 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > FATAL: Exception during startup, aborting: std::runtime_error (Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application)
< t:2022-11-14 11:45:19,623 f:cluster.py      l:2092 c:sdcm.cluster_gce     p:DEBUG > Node artifacts-ubuntu2204-jenkins-db-node-25f46237-0-1 [34.73.109.90 | 10.142.0.123] (seed: True): Unable to get ScyllaDB version using `/home/scylla-test/scylladb/bin/scylla --version':
< t:2022-11-14 11:45:19,623 f:cluster.py      l:2092 c:sdcm.cluster_gce     p:DEBUG > 
< t:2022-11-14 11:45:19,623 f:cluster.py      l:2092 c:sdcm.cluster_gce     p:DEBUG > FATAL: Exception during startup, aborting: std::runtime_error (Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application)
< t:2022-11-14 11:45:19,623 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "dpkg-query --show --showformat '${Version}' scylla"...
< t:2022-11-14 11:45:20,166 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > dpkg-query: no packages found matching scylla
< t:2022-11-14 11:45:20,167 f:cluster.py      l:2104 c:sdcm.cluster_gce     p:DEBUG > Node artifacts-ubuntu2204-jenkins-db-node-25f46237-0-1 [34.73.109.90 | 10.142.0.123] (seed: True): Unable to get ScyllaDB version using `dpkg-query --show --showformat '${Version}' scylla':
< t:2022-11-14 11:45:20,167 f:cluster.py      l:2104 c:sdcm.cluster_gce     p:DEBUG > 
< t:2022-11-14 11:45:20,167 f:cluster.py      l:2104 c:sdcm.cluster_gce     p:DEBUG > dpkg-query: no packages found matching scylla
< t:2022-11-14 11:45:20,167 f:cluster.py      l:2114 c:sdcm.cluster_gce     p:WARNING > Node artifacts-ubuntu2204-jenkins-db-node-25f46237-0-1 [34.73.109.90 | 10.142.0.123] (seed: True): All attempts to get ScyllaDB version failed. Looks like there is no ScyllaDB installed.
< t:2022-11-14 11:45:20,168 f:events_processes.py l:147  c:sdcm.sct_events.events_processes p:DEBUG > Get process `EVENTS_FILE_LOGGER' from EventsProcessesRegistry[lod_dir=/home/jenkins/slave/workspace/scylla-5.0/artifacts-offline-install/artifacts-ubuntu2204-nonroot-test/scylla-cluster-tests/20221114-113226-342996,id=0x7f712738ab00,default=True]
< t:2022-11-14 11:45:20,168 f:events_processes.py l:147  c:sdcm.sct_events.events_processes p:DEBUG > Get process `EVENTS_FILE_LOGGER' from EventsProcessesRegistry[lod_dir=/home/jenkins/slave/workspace/scylla-5.0/artifacts-offline-install/artifacts-ubuntu2204-nonroot-test/scylla-cluster-tests/20221114-113226-342996,id=0x7f712738ab00,default=True]
< t:2022-11-14 11:45:20,169 f:events_processes.py l:147  c:sdcm.sct_events.events_processes p:DEBUG > Get process `EVENTS_FILE_LOGGER' from EventsProcessesRegistry[lod_dir=/home/jenkins/slave/workspace/scylla-5.0/artifacts-offline-install/artifacts-ubuntu2204-nonroot-test/scylla-cluster-tests/20221114-113226-342996,id=0x7f712738ab00,default=True]
< t:2022-11-14 11:45:20,169 f:tester.py       l:3263 c:ArtifactsTest        p:ERROR > Error while saving email data. Error: argument of type 'NoneType' is not iterable
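
The final ERROR line is the message Python produces when a membership test (`in`) is applied to `None`. A minimal sketch of how a missing version could trigger it, and how an N/A fallback would avoid it. The function names and the `is_release_candidate` field are hypothetical and are not taken from the SCT code:

```python
from typing import Any, Dict, Optional


def build_email_data(scylla_version: Optional[str]) -> Dict[str, Any]:
    """Hypothetical helper: build email fields from a possibly missing version."""
    # Normalize a missing version to a placeholder so later checks never see None.
    version = scylla_version if scylla_version else "N/A"
    return {
        "scylla_version": version,
        # Safe: `version` is always a string at this point.
        "is_release_candidate": "rc" in version,
    }


def unsafe_is_release_candidate(scylla_version: Optional[str]) -> bool:
    """Same membership test without the guard, kept only to show the failure mode."""
    # Raises TypeError when scylla_version is None.
    return "rc" in scylla_version
```

With the guard in place, a failed version lookup degrades to a report that says N/A instead of aborting the email step.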
fgelcer commented 1 year ago

@fruch, should #5417 fix this issue now?

fruch commented 1 year ago

> @fruch, should #5417 fix this issue now?

I don't think so...

roydahan commented 1 year ago

Why do we have an empty results file?

fruch commented 1 year ago

In this specific case it was because we failed to get the Scylla version.

We try to get it via scylla --version and fall back to using yum/apt.

In this case Scylla is installed from a relocatable package, and both methods failed, which in turn breaks the creation of the email results file (and also broke the Argus reporting, before Alex fixed it).
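
The lookup order described above (binary first, then the package manager) could be sketched as a chain of probes that ends in a placeholder instead of `None`. This is only an illustration: `first_successful_version` and the probe callables are hypothetical names, not SCT functions.

```python
from typing import Callable, Iterable, Optional


def first_successful_version(
    probes: Iterable[Callable[[], Optional[str]]],
    default: str = "N/A",
) -> str:
    """Return the first non-empty version reported by a probe, else `default`."""
    for probe in probes:
        try:
            version = probe()
        except Exception:
            # A failing probe must not abort the chain; try the next one.
            continue
        if version and version.strip():
            return version.strip()
    return default


def probe_binary() -> Optional[str]:
    """Stand-in for running `scylla --version`; fails as in the log above."""
    raise RuntimeError("Could not setup Async I/O")


def probe_package_manager() -> Optional[str]:
    """Stand-in for the package query; finds nothing for a relocatable install."""
    return None


version = first_successful_version([probe_binary, probe_package_manager])
```

Here both probes fail, so `version` ends up as the placeholder string and the caller never has to handle `None`.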

All our reporting code is a bit fragile. A failure to get one piece of information shouldn't fail the whole report: we should publish an event with the error so it gets noticed, but still carry on and send the results we do have.
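
A rough sketch of that idea, assuming each report field is gathered by its own callable. `collect_report` is a hypothetical name, and the returned error list only stands in for whatever event mechanism SCT would actually use:

```python
from typing import Any, Callable, Dict, List, Tuple


def collect_report(
    collectors: Dict[str, Callable[[], Any]],
    placeholder: str = "N/A",
) -> Tuple[Dict[str, Any], List[str]]:
    """Gather every field independently; one failure never drops the others."""
    report: Dict[str, Any] = {}
    errors: List[str] = []
    for name, collect in collectors.items():
        try:
            report[name] = collect()
        except Exception as exc:
            # Keep the report usable and remember the failure so it can be surfaced.
            report[name] = placeholder
            errors.append(f"{name}: {type(exc).__name__}: {exc}")
    return report, errors
```

The caller would then send the email with `report` as-is and publish one error event per entry in `errors`, so the failure is visible without blocking the results that were collected.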