piraeusdatastore / piraeus-operator

The Piraeus Operator manages LINSTOR clusters in Kubernetes.
https://piraeus.io/
Apache License 2.0

After rebooting a node, Pods did not return to normal #518

Closed: willzhang closed this issue 1 year ago

willzhang commented 1 year ago

Three nodes:

root@node1:~# kubectl get nodes -o wide
NAME    STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
node1   Ready    control-plane   8d    v1.26.7   192.168.72.30   <none>        Ubuntu 22.04.2 LTS   5.15.0-78-generic   containerd://1.6.22
node2   Ready    control-plane   8d    v1.26.7   192.168.72.31   <none>        Ubuntu 22.04.2 LTS   5.15.0-78-generic   containerd://1.6.22
node3   Ready    control-plane   8d    v1.26.7   192.168.72.32   <none>        Ubuntu 22.04.2 LTS   5.15.0-78-generic   containerd://1.6.22

Reboot node1:

root@node1:~# reboot

Two pods went into CrashLoopBackOff and never returned to normal:

root@node1:~# kubectl -n piraeus-datastore  get pods  -o wide
NAME                                                    READY   STATUS             RESTARTS        AGE   IP             NODE    NOMINATED NODE   READINESS GATES
ha-controller-hkws4                                     1/1     Running            1 (7m51s ago)   64m   100.64.1.238   node2   <none>           <none>
ha-controller-nd5p2                                     0/1     CrashLoopBackOff   11 (66s ago)    15m   100.64.0.123   node1   <none>           <none>
ha-controller-trbvh                                     1/1     Running            1 (7m45s ago)   64m   100.64.2.234   node3   <none>           <none>
linstor-controller-97cd7495c-k6kzb                      1/1     Running            1 (7m51s ago)   64m   100.64.1.242   node2   <none>           <none>
linstor-csi-controller-7f85967cd9-z7c56                 7/7     Running            8 (7m51s ago)   55m   100.64.1.244   node2   <none>           <none>
linstor-csi-node-78hz4                                  3/3     Running            3 (7m51s ago)   64m   100.64.1.245   node2   <none>           <none>
linstor-csi-node-9dx8d                                  3/3     Running            3 (7m45s ago)   64m   100.64.2.231   node3   <none>           <none>
linstor-csi-node-wcdgp                                  3/3     Running            6 (9m14s ago)   64m   100.64.0.126   node1   <none>           <none>
node1                                                   1/2     CrashLoopBackOff   12 (15s ago)    15m   100.64.0.124   node1   <none>           <none>
node2                                                   2/2     Running            2 (7m51s ago)   64m   100.64.1.230   node2   <none>           <none>
node3                                                   2/2     Running            2 (7m45s ago)   64m   100.64.2.230   node3   <none>           <none>
piraeus-datastore-controller-manager-6f6b8f48c4-lnzpp   2/2     Running            2 (7m45s ago)   64m   100.64.2.229   node3   <none>           <none>
root@node1:~# 

Logs of pod node1:

root@node1:~# kubectl -n piraeus-datastore logs -f node1
time="2023-08-10T06:04:36Z" level=info msg="running k8s-await-election" version=refs/tags/v0.3.1
time="2023-08-10T06:04:36Z" level=info msg="not running with leader election"
time="2023-08-10T06:04:36Z" level=info msg="starting command '/usr/bin/piraeus-entry.sh' with arguments: '[startSatellite]'"
LINSTOR, Module Satellite
Version:            1.23.0 (28dbd33ced60d75a2a0562bf5e9bc6b800ae8361)
Build time:         2023-05-23T06:27:14+00:00
Java Version:       11
Java VM:            Debian, Version 11.0.18+10-post-Debian-1deb11u1
Operating system:   Linux, Version 5.15.0-78-generic
Environment:        amd64, 8 processors, 3998 MiB memory reserved for allocations

System components initialization in progress

Loading configuration file "/etc/linstor/linstor_satellite.toml"
06:04:38.725 [main] INFO  LINSTOR/Satellite - SYSTEM - ErrorReporter DB version 1 found.
06:04:38.728 [main] INFO  LINSTOR/Satellite - SYSTEM - Log directory set to: '/var/log/linstor-satellite'
06:04:38.774 [Main] INFO  LINSTOR/Satellite - SYSTEM - Loading API classes started.
06:04:39.098 [Main] INFO  LINSTOR/Satellite - SYSTEM - API classes loading finished: 323ms
06:04:39.098 [Main] INFO  LINSTOR/Satellite - SYSTEM - Dependency injection started.
06:04:39.112 [Main] INFO  LINSTOR/Satellite - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule"
06:04:39.112 [Main] INFO  LINSTOR/Satellite - SYSTEM - Extension module "com.linbit.linstor.modularcrypto.FipsCryptoModule" is not installed
06:04:39.112 [Main] INFO  LINSTOR/Satellite - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule"
06:04:39.128 [Main] INFO  LINSTOR/Satellite - SYSTEM - Dynamic load of extension module "com.linbit.linstor.modularcrypto.JclCryptoModule" was successful
06:04:39.814 [Main] INFO  LINSTOR/Satellite - SYSTEM - Dependency injection finished: 715ms
06:04:39.814 [Main] INFO  LINSTOR/Satellite - SYSTEM - Cryptography provider: Using default cryptography module
06:04:41.449 [Main] INFO  LINSTOR/Satellite - SYSTEM - Initializing main network communications service
06:04:41.449 [Main] INFO  LINSTOR/Satellite - SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
06:04:41.449 [Main] INFO  LINSTOR/Satellite - SYSTEM - Starting service instance 'FileEventService' of type FileEventService
06:04:41.450 [Main] INFO  LINSTOR/Satellite - SYSTEM - Starting service instance 'DrbdEventService-1' of type DrbdEventService
06:04:41.452 [Main] INFO  LINSTOR/Satellite - SYSTEM - Starting service instance 'DrbdEventPublisher-1' of type DrbdEventPublisher
06:04:41.452 [Main] INFO  LINSTOR/Satellite - SYSTEM - Starting service instance 'SnapshotShippingService' of type SnapshotShippingService
06:04:41.453 [Main] INFO  LINSTOR/Satellite - SYSTEM - Starting service instance 'BackupShippingS3Service' of type BackupShippingS3Service
06:04:41.453 [Main] INFO  LINSTOR/Satellite - SYSTEM - Starting service instance 'BackupShippingL2LService' of type BackupShippingL2LService
06:04:41.453 [Main] INFO  LINSTOR/Satellite - SYSTEM - Starting service instance 'DeviceManager' of type DeviceManager
06:04:41.455 [Main] INFO  LINSTOR/Satellite - SYSTEM - Starting service instance 'CloneService' of type CloneService
06:04:41.458 [DrbdEventService] WARN  LINSTOR/Satellite - SYSTEM - DRBD 'events2' stream ended unexpectedly
06:04:41.464 [Main] INFO  LINSTOR/Satellite - SYSTEM - NetComService started on port /0:0:0:0:0:0:0:0:3366

06:04:43.945 [MainWorkerPool-1] INFO  LINSTOR/Satellite - SYSTEM - Controller connected and authenticated (100.64.1.220:47764)
06:04:44.092 [MainWorkerPool-2] INFO  LINSTOR/Satellite - SYSTEM - Node 'node1' created.
06:04:44.093 [MainWorkerPool-2] INFO  LINSTOR/Satellite - SYSTEM - Node 'node2' created.
06:04:44.094 [MainWorkerPool-2] INFO  LINSTOR/Satellite - SYSTEM - Node 'node3' created.
06:04:44.097 [MainWorkerPool-2] INFO  LINSTOR/Satellite - SYSTEM - Storage pool 'DfltDisklessStorPool' created.
06:04:44.098 [MainWorkerPool-2] INFO  LINSTOR/Satellite - SYSTEM - Storage pool 'pool1' created.
06:04:44.143 [MainWorkerPool-2] INFO  LINSTOR/Satellite - SYSTEM - Resource 'pvc-439f3984-626e-4433-820e-70e15fae9ab5' created for node 'node1'.
06:04:44.144 [MainWorkerPool-2] INFO  LINSTOR/Satellite - SYSTEM - Resource 'pvc-439f3984-626e-4433-820e-70e15fae9ab5' created for node 'node2'.
06:04:44.144 [MainWorkerPool-2] INFO  LINSTOR/Satellite - SYSTEM - Resource 'pvc-439f3984-626e-4433-820e-70e15fae9ab5' created for node 'node3'.
06:04:44.151 [DeviceManager] INFO  LINSTOR/Satellite - SYSTEM - Removing all res files from /var/lib/linstor.d
06:04:44.229 [DeviceManager] ERROR LINSTOR/Satellite - SYSTEM - Need initial DRBD state [Report number 64D47DF6-D7B7B-000000]

06:04:44.269 [DeviceManager] WARN  LINSTOR/Satellite - SYSTEM - Not calling 'systemd-notify' as NOTIFY_SOCKET is null

Logs of pod ha-controller-nd5p2:

root@node1:~# kubectl -n piraeus-datastore logs -f ha-controller-nd5p2 
I0810 06:08:12.253633       1 agent.go:179] version: v1.1.4
I0810 06:08:12.253726       1 agent.go:180] node: node1
I0810 06:08:12.253872       1 agent.go:203] waiting for caches to sync
I0810 06:08:12.354540       1 agent.go:205] caches synced
I0810 06:08:12.354577       1 agent.go:228] starting reconciliation
E0810 06:08:12.354866       1 run.go:74] "command failed" err="failed to execute drbdsetup status --json: exit status 20"
root@node1:~# 
WanzenBug commented 1 year ago

It seems DRBD did not load after the reboot. Can you show the logs of the drbd-module-loader container?

kubectl -n piraeus-datastore logs -f node1 -c drbd-module-loader
willzhang commented 1 year ago
root@node1:~# kubectl -n piraeus-datastore logs -f node1 -c drbd-module-loader
DRBD module is already loaded

DRBD version loaded:
version: 8.4.11 (api:1/proto:86-101)
srcversion: 98E710E58B3041F3046305B 
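The version line above has the same format as the first line of /proc/drbd on the host. As a minimal sketch, assuming /proc/drbd exists whenever the drbd module is loaded and begins with a line like "version: 8.4.11 (api:1/proto:86-101)", a host-side check for whether the DRBD 9 series is loaded could look like this (the function names are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: report the major version of the loaded DRBD kernel module.
# Assumption: the first line of /proc/drbd looks like
#   "version: 8.4.11 (api:1/proto:86-101)"
# The optional argument lets you point at another file (e.g. for testing).

# Print the major version ("8", "9", ...) parsed from a /proc/drbd-style file.
drbd_major_version() {
  local proc_file="${1:-/proc/drbd}"
  # No readable file usually means the module is not loaded at all.
  [ -r "$proc_file" ] || return 1
  # Keep only the digits before the first dot on line 1.
  sed -n '1s/^version: \([0-9][0-9]*\)\..*/\1/p' "$proc_file"
}

# Succeed only if the loaded module is DRBD 9 or newer.
drbd9_loaded() {
  local major
  major="$(drbd_major_version "${1:-/proc/drbd}")" || return 1
  [ -n "$major" ] && [ "$major" -ge 9 ]
}
```

On node1 before the fix, drbd_major_version would be expected to print 8 and drbd9_loaded would fail; after drbd-module-loader loads 9.2.3 it should succeed.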

On node2 and node3, the same container produces a lot of log output:

root@node1:~# kubectl -n piraeus-datastore logs -f node2 -c drbd-module-loader
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory '/tmp/pkg/drbd-9.2.3/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/5.15.0-78-generic/build

make -C /lib/modules/5.15.0-78-generic/build   M=/tmp/pkg/drbd-9.2.3/drbd  modules
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
  You are using:           gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
  COMPAT  __vmalloc_has_2_params
  COMPAT  add_disk_returns_int
  COMPAT  before_4_13_kernel_read
  COMPAT  bio_alloc_has_4_params
  COMPAT  blkdev_issue_discard_takes_flags
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  can_include_vermagic_h
  COMPAT  dax_direct_access_takes_mode
  COMPAT  fs_dax_get_by_bdev_takes_start_off
  COMPAT  fs_dax_get_by_bdev_takes_start_off_and_holder
  COMPAT  genl_policy_in_ops
  COMPAT  have_BIO_MAX_VECS
  COMPAT  have_CRYPTO_TFM_NEED_KEY
  COMPAT  have_GENHD_FL_NO_PART
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_WB_congested_enum
  COMPAT  have_allow_kernel_signal
  COMPAT  have_bdev_discard_granularity
  COMPAT  have_bdev_max_discard_sectors
  COMPAT  have_bdev_nr_sectors
  COMPAT  have_bdevname
  COMPAT  have_bdgrab
  COMPAT  have_bdi_congested
  COMPAT  have_bdi_congested_fn
  COMPAT  have_bio_alloc_clone
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_set_dev
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bio_split_to_limits
  COMPAT  have_bio_start_io_acct
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_alloc_disk
  COMPAT  have_blk_alloc_queue_rh
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_cleanup_disk
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_qc_t_submit_bio
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_max_write_same_sectors
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_split_bio
  COMPAT  have_blk_queue_split_q_bio
  COMPAT  have_blk_queue_split_q_bio_bioset
  COMPAT  have_blk_queue_update_readahead
  COMPAT  have_blk_queue_write_cache
  COMPAT  have_bvec_kmap_local
  COMPAT  have_d_inode
  COMPAT  have_disk_update_readahead
  COMPAT  have_fallthrough
  COMPAT  have_fs_dax_get_by_bdev
  COMPAT  have_generic_start_io_acct_q_rw_sect_part
  COMPAT  have_generic_start_io_acct_rw_sect_part
  COMPAT  have_get_random_u32
  COMPAT  have_get_random_u32_below
  COMPAT  have_hd_struct
  COMPAT  have_ib_cq_init_attr
  COMPAT  have_ib_get_dma_mr
  COMPAT  have_idr_is_empty
  COMPAT  have_inode_lock
  COMPAT  have_ktime_to_timespec64
  COMPAT  have_kvfree
  COMPAT  have_kvfree_rcu
  COMPAT  have_list_is_first
  COMPAT  have_list_next_entry
  COMPAT  have_max_send_recv_sge
  COMPAT  have_nla_nest_start_noflag
  COMPAT  have_nla_parse_deprecated
  COMPAT  have_nla_put_64bit
  COMPAT  have_nla_strscpy
  COMPAT  have_part_stat_h
  COMPAT  have_part_stat_read_accum
  COMPAT  have_pointer_backing_dev_info
  COMPAT  have_proc_create_single
  COMPAT  have_queue_flag_discard
  COMPAT  have_queue_flag_stable_writes
  COMPAT  have_rb_declare_callbacks_max
  COMPAT  have_refcount_inc
  COMPAT  have_req_hardbarrier
  COMPAT  have_req_noidle
  COMPAT  have_req_nounmap
  COMPAT  have_req_op_write
  COMPAT  have_req_op_write_zeroes
  COMPAT  have_req_write
  COMPAT  have_revalidate_disk_size
  COMPAT  have_sched_set_fifo
  COMPAT  have_sched_signal_h
  COMPAT  have_security_netlink_recv
  COMPAT  have_sendpage_ok
  COMPAT  have_set_capacity_and_notify
  COMPAT  have_shash_desc_zero
  COMPAT  have_simple_positive
  COMPAT  have_sock_set_keepalive
  COMPAT  have_strscpy
  COMPAT  have_struct_bvec_iter
  COMPAT  have_struct_size
  COMPAT  have_submit_bio_noacct
  COMPAT  have_tcp_sock_set_cork
  COMPAT  have_tcp_sock_set_keepcnt
  COMPAT  have_tcp_sock_set_keepidle
  COMPAT  have_tcp_sock_set_nodelay
  COMPAT  have_tcp_sock_set_quickack
  COMPAT  have_time64_to_tm
  COMPAT  have_timer_setup
  COMPAT  have_void_make_request
  COMPAT  have_void_submit_bio
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  need_drbd_wrappers
  COMPAT  need_make_request_recursion
  COMPAT  need_skb_abort_seq_read
  COMPAT  part_stat_read_takes_block_device
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  rdma_reject_has_reason_arg
  COMPAT  sk_data_ready_has_1_param
  COMPAT  sock_create_kern_has_netns_parameter
  COMPAT  sock_ops_returns_addr_len
  COMPAT  struct_gendisk_has_backing_dev_info
  UPD     /tmp/pkg/drbd-9.2.3/drbd/compat.5.15.99.h
  UPD     /tmp/pkg/drbd-9.2.3/drbd/compat.h
make[4]: 'drbd-kernel-compat/cocci_cache/618a16740b5c8d49f5fd464218ac2850/compat.patch' is up to date.
  PATCH
patching file ./drbd_int.h
patching file drbd_transport_tcp.c
patching file drbd_state.c
patching file drbd_req.c
patching file drbd_receiver.c
patching file drbd_nl.c
patching file drbd_main.c
patching file drbd_debugfs.c
patching file drbd_dax_pmem.c
patching file drbd_bitmap.c
patching file drbd_actlog.c
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_dax_pmem.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_debugfs.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_bitmap.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_proc.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_sender.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_receiver.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_req.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_actlog.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_main.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_strings.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_nl.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_interval.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_state.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd-kernel-compat/drbd_wrappers.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_nla.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport.o
  GEN     /tmp/pkg/drbd-9.2.3/drbd/drbd_buildtag.c 
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_buildtag.o
  LD [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_tcp.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_rdma.o
  MODPOST /tmp/pkg/drbd-9.2.3/drbd/Module.symvers
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd.mod.o
  LD [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd.ko
  BTF [M] /tmp/pkg/drbd-9.2.3/drbd/drbd.ko
Skipping BTF generation for /tmp/pkg/drbd-9.2.3/drbd/drbd.ko due to unavailability of vmlinux
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_rdma.mod.o
  LD [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_rdma.ko
  BTF [M] /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_rdma.ko
Skipping BTF generation for /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_rdma.ko due to unavailability of vmlinux
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_tcp.mod.o
  LD [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_tcp.ko
Skipping BTF generation for /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_tcp.ko due to unavailability of vmlinux
  BTF [M] /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_tcp.ko
mv .drbd_kernelrelease.new .drbd_kernelrelease
Memorizing module configuration ... done.
make[1]: Leaving directory '/tmp/pkg/drbd-9.2.3/drbd'

        Module build was successful.
=======================================================================
  With DRBD module version 8.4.5, we split out the management tools
  into their own repository at https://github.com/LINBIT/drbd-utils
  (tarball at http://links.linbit.com/drbd-download)

  That started out as "drbd-utils version 8.9.0",
  has a different release cycle,
  and provides compatible drbdadm, drbdsetup and drbdmeta tools
  for DRBD module versions 8.3, 8.4 and 9.

  Again: to manage DRBD 9 kernel modules and above,
  you want drbd-utils >= 9.3 from above url.
=======================================================================

DRBD version loaded:
version: 9.2.3 (api:2/proto:86-122)
GIT-hash: c142ca1280c41aee1330b980544ef276330ff6ef build by @node2, 2023-08-10 06:14:21
Transports (api:18): tcp (9.2.3) rdma (9.2.3)
WanzenBug commented 1 year ago

DRBD 8.4.11 is an unsupported version and should not be loaded at boot time. You might need to run systemctl disable drbd.service on the host.

Then run rmmod drbd on node1 and delete the node1 Pod so that it restarts.
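The three recovery steps can be sketched as one shell function. This is only an illustration under stated assumptions: it runs as root on the affected host with kubectl access to the cluster, the satellite Pod is named after the node (as in the pod listing in this issue), and DRY_RUN and recover_drbd_node are hypothetical names added so the commands can be previewed.

```shell
#!/usr/bin/env bash
# Sketch of the suggested recovery steps for a single node.
# Assumptions: run as root on the affected host; kubectl is configured;
# the satellite Pod has the same name as the node.
# DRY_RUN=1 is a hypothetical switch that only prints the commands.

NAMESPACE="${NAMESPACE:-piraeus-datastore}"

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

recover_drbd_node() {
  local node="$1"
  # Keep the distribution's drbd.service from loading DRBD 8.4 at the next boot.
  run systemctl disable drbd.service || return 1
  # Unload the old module; this fails while any DRBD 8.4 device is in use.
  run rmmod drbd || return 1
  # Recreate the satellite Pod so drbd-module-loader builds and loads DRBD 9.
  run kubectl -n "$NAMESPACE" delete pod "$node"
}
```

For example, DRY_RUN=1 recover_drbd_node node1 prints the three commands without running them. Stopping after a failed rmmod avoids restarting the Pod while the old module is still loaded, since the loader would then just report that DRBD is already loaded.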

willzhang commented 1 year ago

Yes, I did that, and all pods started and are running. But why did it use version 8.4.11? Is this a bug?

root@node1:~# kubectl -n piraeus-datastore logs -f node1 -c drbd-module-loader
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory '/tmp/pkg/drbd-9.2.3/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/5.15.0-78-generic/build

make -C /lib/modules/5.15.0-78-generic/build   M=/tmp/pkg/drbd-9.2.3/drbd  modules
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
  You are using:           gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
  COMPAT  __vmalloc_has_2_params
  COMPAT  add_disk_returns_int
  COMPAT  before_4_13_kernel_read
  COMPAT  bio_alloc_has_4_params
  COMPAT  blkdev_issue_discard_takes_flags
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  can_include_vermagic_h
  COMPAT  dax_direct_access_takes_mode
  COMPAT  fs_dax_get_by_bdev_takes_start_off
  COMPAT  fs_dax_get_by_bdev_takes_start_off_and_holder
  COMPAT  genl_policy_in_ops
  COMPAT  have_BIO_MAX_VECS
  COMPAT  have_CRYPTO_TFM_NEED_KEY
  COMPAT  have_GENHD_FL_NO_PART
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_WB_congested_enum
  COMPAT  have_allow_kernel_signal
  COMPAT  have_bdev_discard_granularity
  COMPAT  have_bdev_max_discard_sectors
  COMPAT  have_bdev_nr_sectors
  COMPAT  have_bdevname
  COMPAT  have_bdgrab
  COMPAT  have_bdi_congested
  COMPAT  have_bdi_congested_fn
  COMPAT  have_bio_alloc_clone
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_set_dev
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bio_split_to_limits
  COMPAT  have_bio_start_io_acct
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_alloc_disk
  COMPAT  have_blk_alloc_queue_rh
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_cleanup_disk
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_qc_t_submit_bio
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_max_write_same_sectors
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_split_bio
  COMPAT  have_blk_queue_split_q_bio
  COMPAT  have_blk_queue_split_q_bio_bioset
  COMPAT  have_blk_queue_update_readahead
  COMPAT  have_blk_queue_write_cache
  COMPAT  have_bvec_kmap_local
  COMPAT  have_d_inode
  COMPAT  have_disk_update_readahead
  COMPAT  have_fallthrough
  COMPAT  have_fs_dax_get_by_bdev
  COMPAT  have_generic_start_io_acct_q_rw_sect_part
  COMPAT  have_generic_start_io_acct_rw_sect_part
  COMPAT  have_get_random_u32
  COMPAT  have_get_random_u32_below
  COMPAT  have_hd_struct
  COMPAT  have_ib_cq_init_attr
  COMPAT  have_ib_get_dma_mr
  COMPAT  have_idr_is_empty
  COMPAT  have_inode_lock
  COMPAT  have_ktime_to_timespec64
  COMPAT  have_kvfree
  COMPAT  have_kvfree_rcu
  COMPAT  have_list_is_first
  COMPAT  have_list_next_entry
  COMPAT  have_max_send_recv_sge
  COMPAT  have_nla_nest_start_noflag
  COMPAT  have_nla_parse_deprecated
  COMPAT  have_nla_put_64bit
  COMPAT  have_nla_strscpy
  COMPAT  have_part_stat_h
  COMPAT  have_part_stat_read_accum
  COMPAT  have_pointer_backing_dev_info
  COMPAT  have_proc_create_single
  COMPAT  have_queue_flag_discard
  COMPAT  have_queue_flag_stable_writes
  COMPAT  have_rb_declare_callbacks_max
  COMPAT  have_refcount_inc
  COMPAT  have_req_hardbarrier
  COMPAT  have_req_noidle
  COMPAT  have_req_nounmap
  COMPAT  have_req_op_write
  COMPAT  have_req_op_write_zeroes
  COMPAT  have_req_write
  COMPAT  have_revalidate_disk_size
  COMPAT  have_sched_set_fifo
  COMPAT  have_sched_signal_h
  COMPAT  have_security_netlink_recv
  COMPAT  have_sendpage_ok
  COMPAT  have_set_capacity_and_notify
  COMPAT  have_shash_desc_zero
  COMPAT  have_simple_positive
  COMPAT  have_sock_set_keepalive
  COMPAT  have_strscpy
  COMPAT  have_struct_bvec_iter
  COMPAT  have_struct_size
  COMPAT  have_submit_bio_noacct
  COMPAT  have_tcp_sock_set_cork
  COMPAT  have_tcp_sock_set_keepcnt
  COMPAT  have_tcp_sock_set_keepidle
  COMPAT  have_tcp_sock_set_nodelay
  COMPAT  have_tcp_sock_set_quickack
  COMPAT  have_time64_to_tm
  COMPAT  have_timer_setup
  COMPAT  have_void_make_request
  COMPAT  have_void_submit_bio
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  need_drbd_wrappers
  COMPAT  need_make_request_recursion
  COMPAT  need_skb_abort_seq_read
  COMPAT  part_stat_read_takes_block_device
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  rdma_reject_has_reason_arg
  COMPAT  sk_data_ready_has_1_param
  COMPAT  sock_create_kern_has_netns_parameter
  COMPAT  sock_ops_returns_addr_len
  COMPAT  struct_gendisk_has_backing_dev_info
  UPD     /tmp/pkg/drbd-9.2.3/drbd/compat.5.15.99.h
  UPD     /tmp/pkg/drbd-9.2.3/drbd/compat.h
make[4]: 'drbd-kernel-compat/cocci_cache/618a16740b5c8d49f5fd464218ac2850/compat.patch' is up to date.
  PATCH
patching file ./drbd_int.h
patching file drbd_transport_tcp.c
patching file drbd_state.c
patching file drbd_req.c
patching file drbd_receiver.c
patching file drbd_nl.c
patching file drbd_main.c
patching file drbd_debugfs.c
patching file drbd_dax_pmem.c
patching file drbd_bitmap.c
patching file drbd_actlog.c
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_dax_pmem.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_debugfs.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_bitmap.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_proc.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_sender.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_receiver.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_req.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_actlog.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_main.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_strings.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_nl.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_interval.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_state.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd-kernel-compat/drbd_wrappers.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_nla.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport.o
  GEN     /tmp/pkg/drbd-9.2.3/drbd/drbd_buildtag.c 
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_buildtag.o
  LD [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_tcp.o
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_rdma.o
  MODPOST /tmp/pkg/drbd-9.2.3/drbd/Module.symvers
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd.mod.o
  LD [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd.ko
  BTF [M] /tmp/pkg/drbd-9.2.3/drbd/drbd.ko
Skipping BTF generation for /tmp/pkg/drbd-9.2.3/drbd/drbd.ko due to unavailability of vmlinux
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_rdma.mod.o
  LD [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_rdma.ko
Skipping BTF generation for /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_rdma.ko due to unavailability of vmlinux
  BTF [M] /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_rdma.ko
  CC [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_tcp.mod.o
  LD [M]  /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_tcp.ko
  BTF [M] /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_tcp.ko
Skipping BTF generation for /tmp/pkg/drbd-9.2.3/drbd/drbd_transport_tcp.ko due to unavailability of vmlinux
mv .drbd_kernelrelease.new .drbd_kernelrelease
Memorizing module configuration ... done.
make[1]: Leaving directory '/tmp/pkg/drbd-9.2.3/drbd'

        Module build was successful.
=======================================================================
  With DRBD module version 8.4.5, we split out the management tools
  into their own repository at https://github.com/LINBIT/drbd-utils
  (tarball at http://links.linbit.com/drbd-download)

  That started out as "drbd-utils version 8.9.0",
  has a different release cycle,
  and provides compatible drbdadm, drbdsetup and drbdmeta tools
  for DRBD module versions 8.3, 8.4 and 9.

  Again: to manage DRBD 9 kernel modules and above,
  you want drbd-utils >= 9.3 from above url.
=======================================================================

DRBD version loaded:
version: 9.2.3 (api:2/proto:86-122)
GIT-hash: c142ca1280c41aee1330b980544ef276330ff6ef build by @node1, 2023-08-10 06:25:01
Transports (api:18): tcp (9.2.3) rdma (9.2.3)
root@node1:~# kubectl -n piraeus-datastore get pods 
NAME                                                    READY   STATUS    RESTARTS         AGE
ha-controller-hkws4                                     1/1     Running   1 (14m ago)      70m
ha-controller-nd5p2                                     1/1     Running   12 (7m20s ago)   21m
ha-controller-trbvh                                     1/1     Running   1 (13m ago)      70m
linstor-controller-97cd7495c-k6kzb                      1/1     Running   1 (14m ago)      70m
linstor-csi-controller-7f85967cd9-z7c56                 7/7     Running   8 (14m ago)      61m
linstor-csi-node-78hz4                                  3/3     Running   3 (14m ago)      70m
linstor-csi-node-9dx8d                                  3/3     Running   3 (13m ago)      70m
linstor-csi-node-wcdgp                                  3/3     Running   6 (15m ago)      70m
node1                                                   2/2     Running   0                2m48s
node2                                                   2/2     Running   2 (14m ago)      70m
node3                                                   2/2     Running   2 (13m ago)      70m
piraeus-datastore-controller-manager-6f6b8f48c4-lnzpp   2/2     Running   2 (13m ago)      70m
WanzenBug commented 1 year ago

There is a version of DRBD included in your OS, but we need a newer version, which is what the drbd-module-loader container is for.

However, if something during the boot sequence has already loaded the old version of DRBD, we will not automatically unload it to build the newer version. Most of the time it is drbd.service that causes the load, but you may want to check /etc/modules-load.d/* for anything that wants to load the old DRBD version.
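To look for such boot-time loaders, a small sketch like the following could help. It assumes a systemd host and scans the standard modules-load.d(5) locations for a bare drbd entry; the optional root-directory argument and both function names are hypothetical, added only so the search can be tested against another directory tree.

```shell
#!/usr/bin/env bash
# Sketch: find boot-time configuration that could load the OS-provided DRBD
# module before drbd-module-loader runs.
# Assumptions: a systemd host; search paths are the standard modules-load.d
# locations; the optional argument is a hypothetical alternate root directory.

# Print "file:line:content" for every entry that autoloads the drbd module.
find_drbd_autoload() {
  local root="${1:-}"
  local f
  for f in "$root"/etc/modules-load.d/*.conf \
           "$root"/run/modules-load.d/*.conf \
           "$root"/usr/lib/modules-load.d/*.conf; do
    # Unmatched globs stay literal; skip anything that is not a regular file.
    [ -f "$f" ] || continue
    # Match a bare "drbd" entry, ignoring comments and names like drbd_transport_tcp.
    grep -Hn -E '^[[:space:]]*drbd([[:space:]]|$)' "$f" || true
  done
}

# Succeed if the distribution's drbd.service is enabled at boot.
drbd_service_enabled() {
  systemctl is-enabled --quiet drbd.service 2>/dev/null
}
```

On the host itself, find_drbd_autoload with no argument scans the real configuration, and drbd_service_enabled && echo "drbd.service is enabled" reports whether the service will run at the next boot.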

willzhang commented 1 year ago

Thanks. I found that drbd-utils was installed on node1:

root@node1:~# apt list --installed |grep drbd

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

drbd-utils/jammy,now 9.15.0-1build2 amd64 [installed]

node2 and node3 have neither drbd-utils nor the drbd service:

root@node2:~# systemctl status drbd
Unit drbd.service could not be found.

So I removed it:

apt remove -y drbd-utils