samba-in-kubernetes / sit-environment

SIT (Samba Integration Testing) Framework
GNU General Public License v3.0

CephFS job/s times out waiting for OSDs to come up #103

Closed (anoopcs9 closed this issue 3 months ago)

anoopcs9 commented 3 months ago
FAILED - RETRYING: [storage0]: Wait for Ceph OSDs to come up (5 retries left).
FAILED - RETRYING: [storage0]: Wait for Ceph OSDs to come up (4 retries left).
FAILED - RETRYING: [storage0]: Wait for Ceph OSDs to come up (3 retries left).
FAILED - RETRYING: [storage0]: Wait for Ceph OSDs to come up (2 retries left).
FAILED - RETRYING: [storage0]: Wait for Ceph OSDs to come up (1 retries left).
fatal: [storage0]: FAILED! => {"attempts": 100, "changed": true, "cmd": ["/root/cephadm", "shell", "--", "ceph", "orch", "ls", "osd", "--format=json"], "delta": "0:00:01.223162", "end": "2024-06-03 10:00:09.123666", "msg": "", "rc": 0, "start": "2024-06-03 10:00:07.900504", "stderr": "Inferring fsid c34a0c5e-218e-11ef-93af-525400babb62\nInferring config /var/lib/ceph/c34a0c5e-218e-11ef-93af-525400babb62/mon.storage0/config\nUsing ceph image with id 'ba97bb442d6e' and tag 'main' created on 2024-06-02 21:52:50 +0000 UTC\nquay.ceph.io/ceph-ci/ceph@sha256:1b15b7a25500815a8f1461030e3e44f7d8c33c82369e3bab6ef02b778d1bd8c9", "stderr_lines": ["Inferring fsid c34a0c5e-218e-11ef-93af-525400babb62", "Inferring config /var/lib/ceph/c34a0c5e-218e-11ef-93af-525400babb62/mon.storage0/config", "Using ceph image with id 'ba97bb442d6e' and tag 'main' created on 2024-06-02 21:52:50 +0000 UTC", "quay.ceph.io/ceph-ci/ceph@sha256:1b15b7a25500815a8f1461030e3e44f7d8c33c82369e3bab6ef02b778d1bd8c9"], "stdout": "\n[{\"events\": [\"2024-06-03T09:52:25.620085Z service:osd.all-available-devices [INFO] \\\"service was created\\\"\"], \"placement\": {\"host_pattern\": \"*\"}, \"service_id\": \"all-available-devices\", \"service_name\": \"osd.all-available-devices\", \"service_type\": \"osd\", \"spec\": {\"data_devices\": {\"all\": true}, \"filter_logic\": \"AND\", \"objectstore\": \"bluestore\"}, \"status\": {\"created\": \"2024-06-03T09:52:25.609558Z\", \"running\": 0, \"size\": 0}}]", "stdout_lines": ["", "[{\"events\": [\"2024-06-03T09:52:25.620085Z service:osd.all-available-devices [INFO] \\\"service was created\\\"\"], \"placement\": {\"host_pattern\": \"*\"}, \"service_id\": \"all-available-devices\", \"service_name\": \"osd.all-available-devices\", \"service_type\": \"osd\", \"spec\": {\"data_devices\": {\"all\": true}, \"filter_logic\": \"AND\", \"objectstore\": \"bluestore\"}, \"status\": {\"created\": \"2024-06-03T09:52:25.609558Z\", \"running\": 0, \"size\": 0}}]"]}
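The Ansible task above retries `cephadm shell -- ceph orch ls osd --format=json` until OSDs report as running, and fails once the retries are exhausted. The readiness check it performs can be sketched as follows (a minimal illustration, not the playbook's actual code; `osds_up` is a hypothetical helper name):

```python
import json

def osds_up(orch_ls_json: str) -> bool:
    """Parse `ceph orch ls osd --format=json` output and require
    at least one running OSD daemon in some OSD service."""
    services = json.loads(orch_ls_json)
    return any(
        svc.get("status", {}).get("running", 0) > 0
        for svc in services
    )

# The failing run above reported "running": 0, "size": 0, so a
# check like this never succeeds and the task times out:
sample = '[{"service_type": "osd", "status": {"running": 0, "size": 0}}]'
```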

[cephfs] https://jenkins-samba.apps.ocp.cloud.ci.centos.org/job/samba_cephfs-integration-environment/535/console

[cephfs.vfs] https://jenkins-samba.apps.ocp.cloud.ci.centos.org/job/samba_cephfs.vfs-integration-environment/384/console

anoopcs9 commented 3 months ago

We had the following traceback in the cephadm logs from recent jobs:

2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr exception caught by decorator
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr Traceback (most recent call last):
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr     return f(*a, **kw)
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 110, in main
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr     self.enable_plugins()
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 75, in enable_plugins
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr     plugins = _load_library_extensions()
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 180, in _load_library_extensions
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr     for ep in entry_points(group=group):
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr TypeError: entry_points() got an unexpected keyword argument 'group'
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr Traceback (most recent call last):
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr   File "/usr/sbin/ceph-volume", line 33, in <module>
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr     sys.exit(load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')())
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 46, in __init__
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr     self.main(self.argv)
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr     return f(*a, **kw)
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 110, in main
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr     self.enable_plugins()
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 75, in enable_plugins
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr     plugins = _load_library_extensions()
2024-06-03 09:59:47,064 7f1900dc1740 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 180, in _load_library_extensions
2024-06-03 09:59:47,065 7f1900dc1740 INFO /usr/bin/podman: stderr     for ep in entry_points(group=group):
2024-06-03 09:59:47,065 7f1900dc1740 INFO /usr/bin/podman: stderr TypeError: entry_points() got an unexpected keyword argument 'group'

It turned out to be a problem with ceph-volume; see issue #66328, which was fixed via https://github.com/ceph/ceph/pull/57830.
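For context, the `TypeError` in the traceback arises because `importlib.metadata.entry_points()` only accepts the `group` keyword on Python 3.10 and later; on Python 3.9 (which the container image ships, per the `/usr/lib/python3.9/` paths above) it takes no arguments and returns a dict-like mapping of group name to entry points. A version-compatible lookup can be sketched like this (an illustrative helper, not the actual ceph-volume fix):

```python
import sys
from importlib.metadata import entry_points

def entry_points_for(group):
    """Return the entry points registered under `group`, working
    on both Python 3.9 and 3.10+."""
    if sys.version_info >= (3, 10):
        # 3.10+: entry_points() supports keyword-based selection.
        return list(entry_points(group=group))
    # 3.9: entry_points() takes no arguments; index the mapping.
    return list(entry_points().get(group, []))
```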

anoopcs9 commented 3 months ago

New Ceph container images with the fix are out, and CI is back to green.