piraeusdatastore / piraeus-operator

The Piraeus Operator manages LINSTOR clusters in Kubernetes.
https://piraeus.io/
Apache License 2.0

drbd-kernel-module-injector fails when using drbd9-centos7:v9.0.23 on CentOS 7.8 #53

Closed: alexzhc closed this issue 4 years ago

alexzhc commented 4 years ago

Compiling the DRBD module in the drbd-kernel-module-injector init container fails when using drbd9-centos7:v9.0.23 on CentOS 7.8:

# kubectl logs -f piraeus-op-ns-node-mf7zf -c drbd-kernel-module-injector
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory `/tmp/pkg/drbd-9.0.23-1/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/3.10.0-1127.10.1.el7.x86_64/build

make -C /lib/modules/3.10.0-1127.10.1.el7.x86_64/build   M=/tmp/pkg/drbd-9.0.23-1/drbd  modules
  COMPAT  before_4_13_kernel_read
  COMPAT  alloc_workqueue_takes_fmt
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  drbd_release_returns_void
  COMPAT  genl_policy_in_ops
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_WB_congested_enum
  COMPAT  have_allow_kernel_signal
  COMPAT  have_atomic_dec_if_positive_linux
  COMPAT  have_atomic_in_flight
  COMPAT  have_bd_claim_by_disk
  COMPAT  have_bd_unlink_disk_holder
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_flush
  COMPAT  have_bio_free
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bio_rw
  COMPAT  have_bioset_create_front_pad
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_split_q_bio
/bin/bash: /usr/bin/mkdir: Argument list too long
/bin/bash: /usr/bin/tr: Argument list too long
make[3]: execvp: /bin/bash: Argument list too long
make[3]: *** [/tmp/pkg/drbd-9.0.23-1/drbd/.compat_test.3.10.0-1127.10.1.el7.x86_64/have_blk_queue_split_q_bio_bioset.result] Error 127
make[3]: *** Waiting for unfinished jobs....
/bin/bash: /usr/bin/tr: Argument list too long
  COMPAT  have_blk_queue_plugged
/bin/bash: /usr/bin/tr: Argument list too long
/bin/bash: /usr/bin/tr: Argument list too long
make[2]: *** [_module_/tmp/pkg/drbd-9.0.23-1/drbd] Error 2
make[1]: *** [kbuild] Error 2
make[1]: Leaving directory `/tmp/pkg/drbd-9.0.23-1/drbd'
make: *** [module] Error 2

Could not find the expexted *.ko, see stderr for more details
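When comparing failing and successful runs, it helps to pull the relevant lines out of a long injector log. The following is a minimal Python sketch (not part of the operator or the DRBD build) that lists the programs which failed to start with "Argument list too long" and the make targets that errored. On the log above it would point at the have_blk_queue_split_q_bio_bioset compat test.

```python
import re

# "<program>: Argument list too long" is how bash/make report an execve()
# failure with E2BIG; the capture group is the program that could not start.
E2BIG_RE = re.compile(r"(\S+): Argument list too long$")

# Matches "make[3]: *** [target] Error 127" as well as "make: *** [module] Error 2".
FAILED_TARGET_RE = re.compile(r"^make(?:\[\d+\])?: \*\*\* \[(\S+)\] Error (\d+)$")


def summarize(log_text):
    """Return (programs that hit E2BIG, [(failed make target, exit code)])."""
    e2big, failed = [], []
    for raw in log_text.splitlines():
        line = raw.strip()
        m = FAILED_TARGET_RE.match(line)
        if m:
            failed.append((m.group(1), int(m.group(2))))
            continue
        m = E2BIG_RE.search(line)
        if m:
            e2big.append(m.group(1))
    return e2big, failed


if __name__ == "__main__":
    import sys

    programs, targets = summarize(sys.stdin.read())
    print("E2BIG while starting:", sorted(set(programs)))
    for target, code in targets:
        print(f"failed target (exit {code}): {target}")
```

Example usage, where summarize_log.py is a hypothetical file name for the sketch: `kubectl logs piraeus-op-ns-node-mf7zf -c drbd-kernel-module-injector | python3 summarize_log.py`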
alexzhc commented 4 years ago

However, running the same image directly with docker run still works in the same environment:

 docker run -it --rm --privileged -v /lib/modules:/lib/modules:ro -v /usr/src:/usr/src:ro  daocloud.io/piraeus/drbd9-centos7:v9.0.23
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory `/tmp/pkg/drbd-9.0.23-1/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/3.10.0-1127.10.1.el7.x86_64/build

make -C /lib/modules/3.10.0-1127.10.1.el7.x86_64/build   M=/tmp/pkg/drbd-9.0.23-1/drbd  modules
  COMPAT  alloc_workqueue_takes_fmt
  COMPAT  before_4_13_kernel_read
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  drbd_release_returns_void
  COMPAT  genl_policy_in_ops
  COMPAT  have_WB_congested_enum
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  have_allow_kernel_signal
  COMPAT  have_atomic_dec_if_positive_linux
  COMPAT  have_atomic_in_flight
  COMPAT  have_bd_claim_by_disk
  COMPAT  have_bd_unlink_disk_holder
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_flush
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_rw
  COMPAT  have_bio_free
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bioset_create_front_pad
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_plugged
  COMPAT  have_blk_queue_split_q_bio
  COMPAT  have_blk_queue_write_cache
  COMPAT  have_blk_queue_split_q_bio_bioset
  COMPAT  have_blkdev_get_by_path
  COMPAT  have_d_inode
  COMPAT  have_file_inode
  COMPAT  have_generic_start_io_acct_q_rw_sect_part
  COMPAT  have_generic_start_io_acct_rw_sect_part
  COMPAT  have_genl_family_parallel_ops
  COMPAT  have_ib_cq_init_attr
  COMPAT  have_ib_get_dma_mr
  COMPAT  have_idr_alloc
  COMPAT  have_idr_is_empty
  COMPAT  have_inode_lock
  COMPAT  have_ktime_to_timespec64
  COMPAT  have_kvfree
  COMPAT  have_max_send_recv_sge
  COMPAT  have_netlink_cb_portid
  COMPAT  have_nla_nest_start_noflag
  COMPAT  have_nla_parse_deprecated
  COMPAT  have_nla_put_64bit
  COMPAT  have_pointer_backing_dev_info
  COMPAT  have_part_stat_h
  COMPAT  have_prandom_u32
  COMPAT  have_proc_create_single
  COMPAT  have_ratelimit_state_init
  COMPAT  have_rb_augment_functions
  COMPAT  have_refcount_inc
  COMPAT  have_req_hardbarrier
  COMPAT  have_req_noidle
  COMPAT  have_req_nounmap
  COMPAT  have_req_op_write
  COMPAT  have_req_op_write_same
  COMPAT  have_req_op_write_zeroes
  COMPAT  have_req_prio
  COMPAT  have_req_write
  COMPAT  have_req_write_same
  COMPAT  have_shash_desc_zero
  COMPAT  have_security_netlink_recv
  COMPAT  have_signed_nla_put
  COMPAT  have_simple_positive
  COMPAT  have_struct_bvec_iter
  COMPAT  have_struct_kernel_param_ops
  COMPAT  have_struct_size
  COMPAT  have_time64_to_tm
  COMPAT  have_timer_setup
  COMPAT  have_void_make_request
  COMPAT  hlist_for_each_entry_has_three_parameters
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  kmap_atomic_page_only
  COMPAT  need_make_request_recursion
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  sock_create_kern_has_five_parameters
  COMPAT  sock_ops_returns_addr_len
  CHK     /tmp/pkg/drbd-9.0.23-1/drbd/compat.3.10.0-1127.10.1.el7.x86_64.h
  UPD     /tmp/pkg/drbd-9.0.23-1/drbd/compat.3.10.0-1127.10.1.el7.x86_64.h
  CHK     /tmp/pkg/drbd-9.0.23-1/drbd/compat.h
  UPD     /tmp/pkg/drbd-9.0.23-1/drbd/compat.h
make[4]: `drbd-kernel-compat/cocci_cache/f40245fde8bb98acdde06d9e38d717b0/compat.patch' is up to date.
  PATCH
patching file ./drbd_int.h
patching file drbd_receiver.c
patching file drbd_main.c
patching file drbd_nla.c
patching file drbd_nl.c
patching file drbd_bitmap.c
patching file drbd_transport_tcp.c
patching file drbd_actlog.c
patching file kref_debug.c
patching file drbd_req.c
patching file drbd_sender.c
patching file drbd_debugfs.c
patching file drbd-headers/linux/genl_magic_func.h
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_dax_pmem.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_debugfs.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_bitmap.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_proc.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_sender.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_receiver.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_req.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_actlog.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/lru_cache.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_main.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_strings.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_nl.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_interval.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_state.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd-kernel-compat/drbd_wrappers.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_nla.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/kref_debug.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_kref_debug.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.o
  GEN     /tmp/pkg/drbd-9.0.23-1/drbd/drbd_buildtag.c
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_buildtag.o
  LD [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd.o
  Building modules, stage 2.
  MODPOST 2 modules
  CC      /tmp/pkg/drbd-9.0.23-1/drbd/drbd.mod.o
  CC      /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.mod.o
  LD [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd.ko
  LD [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.ko
mv .drbd_kernelrelease.new .drbd_kernelrelease
Memorizing module configuration ... done.
make[1]: Leaving directory `/tmp/pkg/drbd-9.0.23-1/drbd'

        Module build was successful.
=======================================================================
  With DRBD module version 8.4.5, we split out the management tools
  into their own repository at https://github.com/LINBIT/drbd-utils
  (tarball at http://links.linbit.com/drbd-download)

  That started out as "drbd-utils version 8.9.0",
  has a different release cycle,
  and provides compatible drbdadm, drbdsetup and drbdmeta tools
  for DRBD module versions 8.3, 8.4 and 9.

  Again: to manage DRBD 9 kernel modules and above,
  you want drbd-utils >= 9.3 from above url.
=======================================================================

DRBD version loaded:
version: 9.0.23-1 (api:2/proto:86-116)
GIT-hash: d16bfab7a4033024fed2d99d3b179aa6bb6eb300 build by @5c911893f9f7, 2020-07-10 10:32:52
Transports (api:16): tcp (9.0.23-1)
WanzenBug commented 4 years ago

I've tried recreating the issue, but on my setup it's working as expected:

Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory `/tmp/pkg/drbd-9.0.23-1/drbd'
    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/3.10.0-1127.10.1.el7.x86_64/build
make -C /lib/modules/3.10.0-1127.10.1.el7.x86_64/build   M=/tmp/pkg/drbd-9.0.23-1/drbd  modules
  COMPAT  before_4_13_kernel_read
  COMPAT  alloc_workqueue_takes_fmt
  COMPAT  blkdev_issue_zeroout_discard
  COMPAT  drbd_release_returns_void
  COMPAT  have_SHASH_DESC_ON_STACK
  COMPAT  genl_policy_in_ops
  COMPAT  have_allow_kernel_signal
  COMPAT  have_WB_congested_enum
  COMPAT  have_atomic_dec_if_positive_linux
  COMPAT  have_bd_unlink_disk_holder
  COMPAT  have_bd_claim_by_disk
  COMPAT  have_atomic_in_flight
  COMPAT  have_bio_bi_opf
  COMPAT  have_bio_bi_error
  COMPAT  have_bio_bi_bdev
  COMPAT  have_bio_bi_status
  COMPAT  have_bio_clone_fast
  COMPAT  have_bio_flush
  COMPAT  have_bio_free
  COMPAT  have_bio_op_shift
  COMPAT  have_bio_rw
  COMPAT  have_bio_set_op_attrs
  COMPAT  have_bioset_create_front_pad
  COMPAT  have_bioset_init
  COMPAT  have_bioset_need_bvecs
  COMPAT  have_blk_check_plugged
  COMPAT  have_blk_qc_t_make_request
  COMPAT  have_blk_queue_flag_set
  COMPAT  have_blk_queue_make_request
  COMPAT  have_blk_queue_merge_bvec
  COMPAT  have_blk_queue_plugged
  COMPAT  have_blk_queue_split_q_bio
  COMPAT  have_blk_queue_split_q_bio_bioset
  COMPAT  have_blk_queue_write_cache
  COMPAT  have_blkdev_get_by_path
  COMPAT  have_d_inode
  COMPAT  have_file_inode
  COMPAT  have_generic_start_io_acct_rw_sect_part
  COMPAT  have_genl_family_parallel_ops
  COMPAT  have_generic_start_io_acct_q_rw_sect_part
  COMPAT  have_ib_cq_init_attr
  COMPAT  have_ib_get_dma_mr
  COMPAT  have_idr_alloc
  COMPAT  have_idr_is_empty
  COMPAT  have_inode_lock
  COMPAT  have_kvfree
  COMPAT  have_max_send_recv_sge
  COMPAT  have_netlink_cb_portid
  COMPAT  have_nla_nest_start_noflag
  COMPAT  have_nla_parse_deprecated
  COMPAT  have_nla_put_64bit
  COMPAT  have_pointer_backing_dev_info
  COMPAT  have_ktime_to_timespec64
  COMPAT  have_part_stat_h
  COMPAT  have_prandom_u32
  COMPAT  have_proc_create_single
  COMPAT  have_ratelimit_state_init
  COMPAT  have_rb_augment_functions
  COMPAT  have_refcount_inc
  COMPAT  have_req_noidle
  COMPAT  have_req_nounmap
  COMPAT  have_req_hardbarrier
  COMPAT  have_req_op_write
  COMPAT  have_req_op_write_same
  COMPAT  have_req_op_write_zeroes
  COMPAT  have_req_prio
  COMPAT  have_req_write
  COMPAT  have_req_write_same
  COMPAT  have_security_netlink_recv
  COMPAT  have_shash_desc_zero
  COMPAT  have_simple_positive
  COMPAT  have_struct_bvec_iter
  COMPAT  have_signed_nla_put
  COMPAT  have_struct_kernel_param_ops
  COMPAT  have_struct_size
  COMPAT  have_time64_to_tm
  COMPAT  have_timer_setup
  COMPAT  have_void_make_request
  COMPAT  hlist_for_each_entry_has_three_parameters
  COMPAT  ib_alloc_pd_has_2_params
  COMPAT  ib_device_has_ops
  COMPAT  ib_post_send_const_params
  COMPAT  ib_query_device_has_3_params
  COMPAT  kmap_atomic_page_only
  COMPAT  need_make_request_recursion
  COMPAT  queue_limits_has_discard_zeroes_data
  COMPAT  rdma_create_id_has_net_ns
  COMPAT  sock_create_kern_has_five_parameters
  COMPAT  sock_ops_returns_addr_len
  CHK     /tmp/pkg/drbd-9.0.23-1/drbd/compat.3.10.0-1127.10.1.el7.x86_64.h
  UPD     /tmp/pkg/drbd-9.0.23-1/drbd/compat.3.10.0-1127.10.1.el7.x86_64.h
  CHK     /tmp/pkg/drbd-9.0.23-1/drbd/compat.h
  UPD     /tmp/pkg/drbd-9.0.23-1/drbd/compat.h
make[4]: `drbd-kernel-compat/cocci_cache/f40245fde8bb98acdde06d9e38d717b0/compat.patch' is up to date.
  PATCH
patching file ./drbd_int.h
patching file drbd_receiver.c
patching file drbd_main.c
patching file drbd_nla.c
patching file drbd_nl.c
patching file drbd_bitmap.c
patching file drbd_transport_tcp.c
patching file drbd_actlog.c
patching file kref_debug.c
patching file drbd_req.c
patching file drbd_sender.c
patching file drbd_debugfs.c
patching file drbd-headers/linux/genl_magic_func.h
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_dax_pmem.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_debugfs.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_bitmap.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_proc.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_sender.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_receiver.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_req.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_actlog.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/lru_cache.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_main.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_strings.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_nl.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_interval.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_state.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd-kernel-compat/drbd_wrappers.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_nla.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/kref_debug.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_kref_debug.o
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.o
  GEN     /tmp/pkg/drbd-9.0.23-1/drbd/drbd_buildtag.c 
  CC [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_buildtag.o
  LD [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd.o
  Building modules, stage 2.
  MODPOST 2 modules
  CC      /tmp/pkg/drbd-9.0.23-1/drbd/drbd.mod.o
  CC      /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.mod.o
  LD [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.ko
  LD [M]  /tmp/pkg/drbd-9.0.23-1/drbd/drbd.ko
mv .drbd_kernelrelease.new .drbd_kernelrelease
Memorizing module configuration ... done.
make[1]: Leaving directory `/tmp/pkg/drbd-9.0.23-1/drbd'
    Module build was successful.
=======================================================================
  With DRBD module version 8.4.5, we split out the management tools
  into their own repository at https://github.com/LINBIT/drbd-utils
  (tarball at http://links.linbit.com/drbd-download)
  That started out as "drbd-utils version 8.9.0",
  has a different release cycle,
  and provides compatible drbdadm, drbdsetup and drbdmeta tools
  for DRBD module versions 8.3, 8.4 and 9.
  Again: to manage DRBD 9 kernel modules and above,
  you want drbd-utils >= 9.3 from above url.
=======================================================================
DRBD version loaded:
version: 9.0.23-1 (api:2/proto:86-116)
GIT-hash: d16bfab7a4033024fed2d99d3b179aa6bb6eb300 build by @centos-7-k8s-101.test, 2020-07-10 12:39:33
Transports (api:16): tcp (9.0.23-1)

Is there anything you can think of that might be "different" on your cluster? Is it using a different container engine? Is it using some extra-long hostnames? 🤷
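One concrete difference worth ruling out: "Argument list too long" is the message for E2BIG, which execve() returns when the combined argument and environment data is too large or a single string exceeds MAX_ARG_STRLEN. A pod also receives environment variables that a plain docker run does not (for example the KUBERNETES_SERVICE_* variables). The sketch below, assuming Python 3 is available where it runs and 4 KiB pages, reports the size of the current environment and its largest entry so the two setups can be compared. It measures only the calling process, not variables that make exports to its child shells.

```python
import os

# Linux rejects any single argv/envp string longer than MAX_ARG_STRLEN
# (32 pages); 4 KiB pages are assumed here, as on x86_64.
MAX_ARG_STRLEN = 32 * 4096


def entry_bytes(key, value):
    """Size of one environment entry as passed to execve(): "KEY=VALUE\\0"."""
    return len(os.fsencode(key)) + len(os.fsencode(value)) + 2


def env_bytes(env):
    """Approximate environment size; ignores the kernel's per-pointer overhead."""
    return sum(entry_bytes(k, v) for k, v in env.items())


def longest_entry(env):
    """(name, bytes) of the largest entry, counting its trailing NUL."""
    if not env:
        return ("", 0)
    name = max(env, key=lambda k: entry_bytes(k, env[k]))
    return (name, entry_bytes(name, env[name]))


if __name__ == "__main__":
    env = dict(os.environ)
    try:
        arg_max = os.sysconf("SC_ARG_MAX")
    except (ValueError, OSError):
        arg_max = None  # name not supported on this platform
    name, size = longest_entry(env)
    print(f"{len(env)} variables, about {env_bytes(env)} bytes; ARG_MAX={arg_max}")
    print(f"largest entry: {name} ({size} bytes; per-string limit {MAX_ARG_STRLEN})")
```

Running it in both the pod and the docker run container, or comparing `env | wc -c` (which counts the same bytes with newlines in place of NULs, for single-line values), gives a first data point.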

alexzhc commented 4 years ago

We are using kubelet 1.18.0 and Docker 19.03.8. Could you try with those versions as well?

alexzhc commented 4 years ago

@WanzenBug would you consider using the "docker run docker" technique (running docker from inside the init container) for drbd-kernel-module-injector? It would avoid having to troubleshoot issues like this every now and then, and it also allows switching between CentOS and Ubuntu images automatically. I implemented it in https://github.com/piraeusdatastore/piraeus/blob/master/dockerfiles/piraeus-init/bin/init-node.sh and so far it has worked every time.
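The image-selection part of this proposal can be sketched as follows. This is a hypothetical Python illustration, not the linked init-node.sh: it reads the host's /etc/os-release, maps ID and VERSION_ID to an injector image, and builds a docker run command mirroring the manual one earlier in this thread (minus -it, since nothing interactive is needed). The Ubuntu image name is a placeholder, and the container running this would need access to the host's Docker daemon, which is not shown.

```python
# Hypothetical mapping from (ID, VERSION_ID) to an injector image.
# Only the centos7 image name comes from this thread; the Ubuntu entry
# is a placeholder, not a published image.
IMAGES = {
    ("centos", "7"): "daocloud.io/piraeus/drbd9-centos7",
    ("ubuntu", "18.04"): "registry.example/drbd9-bionic",  # placeholder
}


def parse_os_release(text):
    """Parse KEY=value lines of os-release(5), dropping surrounding quotes."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")
    return fields


def pick_image(os_release_text, tag):
    """Return "<image>:<tag>" for the host distribution, or raise ValueError."""
    fields = parse_os_release(os_release_text)
    key = (fields.get("ID", ""), fields.get("VERSION_ID", ""))
    if key not in IMAGES:
        raise ValueError(f"no injector image for {key}")
    return f"{IMAGES[key]}:{tag}"


def docker_run_argv(image):
    """Mirror the manual command: privileged, host modules and sources read-only."""
    return [
        "docker", "run", "--rm", "--privileged",
        "-v", "/lib/modules:/lib/modules:ro",
        "-v", "/usr/src:/usr/src:ro",
        image,
    ]


if __name__ == "__main__":
    with open("/etc/os-release") as f:
        image = pick_image(f.read(), "v9.0.23")
    print(" ".join(docker_run_argv(image)))
```

Failing loudly on an unknown distribution, rather than falling back to a default image, avoids silently compiling against the wrong userland.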

alexzhc commented 4 years ago
# kubectl get pod piraeus-op-ns-node-zx2fz -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu, memory request for container
      linstor-satellite; cpu, memory limit for container linstor-satellite; cpu, memory
      request for init container drbd-kernel-module-injector; cpu, memory limit for
      init container drbd-kernel-module-injector'
    kubernetes.io/psp: dce-psp-allow-all
  creationTimestamp: "2020-07-13T08:42:56Z"
  generateName: piraeus-op-ns-node-
  labels:
    app: piraeus-op-ns
    controller-revision-hash: 587747f9d9
    pod-template-generation: "1"
    role: piraeus-node
  name: piraeus-op-ns-node-zx2fz
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: piraeus-op-ns-node
    uid: 4dc05c60-6886-4227-909f-3516b9fc51fc
  resourceVersion: "217873"
  selfLink: /api/v1/namespaces/default/pods/piraeus-op-ns-node-zx2fz
  uid: 4a10037a-18eb-46db-b132-faa4dd623366
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - k8s-worker-1
  containers:
  - args:
    - startSatellite
    image: daocloud.io/piraeus/piraeus-server:v1.7.1
    imagePullPolicy: IfNotPresent
    name: linstor-satellite
    ports:
    - containerPort: 3366
      hostPort: 3366
      protocol: TCP
    readinessProbe:
      failureThreshold: 10
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: 3366
      timeoutSeconds: 5
    resources:
      limits:
        cpu: 128m
        memory: "268435456"
      requests:
        cpu: 64m
        memory: "268435456"
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/linstor
      name: linstor-conf
    - mountPath: /dev/
      name: device-dir
    - mountPath: /sys/
      name: sys-dir
    - mountPath: /lib/modules/
      mountPropagation: Bidirectional
      name: modules-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-v6mlw
      readOnly: true
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostNetwork: true
  imagePullSecrets:
  - name: drbdiocred
  initContainers:
  - env:
    - name: LB_HOW
      value: compile
    image: daocloud.io/piraeus/drbd9-centos7:v9.0.24
    imagePullPolicy: IfNotPresent
    name: drbd-kernel-module-injector
    resources:
      limits:
        cpu: 128m
        memory: "268435456"
      requests:
        cpu: 64m
        memory: "268435456"
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/src
      name: src-dir
      readOnly: true
    - mountPath: /lib/modules/
      name: modules-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-v6mlw
      readOnly: true
  nodeName: k8s-worker-1
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - configMap:
      defaultMode: 420
      name: piraeus-op-ns-config
    name: linstor-conf
  - hostPath:
      path: /dev/
      type: ""
    name: device-dir
  - hostPath:
      path: /sys/
      type: Directory
    name: sys-dir
  - hostPath:
      path: /lib/modules/
      type: DirectoryOrCreate
    name: modules-dir
  - hostPath:
      path: /usr/src
      type: Directory
    name: src-dir
  - name: default-token-v6mlw
    secret:
      defaultMode: 420
      secretName: default-token-v6mlw
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-07-13T08:42:56Z"
    message: 'containers with incomplete status: [drbd-kernel-module-injector]'
    reason: ContainersNotInitialized
    status: "False"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-07-13T08:42:56Z"
    message: 'containers with unready status: [linstor-satellite]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-07-13T08:42:56Z"
    message: 'containers with unready status: [linstor-satellite]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-07-13T08:42:56Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: daocloud.io/piraeus/piraeus-server:v1.7.1
    imageID: ""
    lastState: {}
    name: linstor-satellite
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: PodInitializing
  hostIP: 192.168.176.191
  initContainerStatuses:
  - containerID: docker://0036729dd7407258ecccf07d398425e29a39c9bcec54771aeb28b2e8fe11d5e1
    image: daocloud.io/piraeus/drbd9-centos7:v9.0.24
    imageID: docker-pullable://daocloud.io/piraeus/drbd9-centos7@sha256:d9acaa6f3db4ac619b38c1f8975d991f974b15cd9cf3ad676ab0dde823b62a51
    lastState: {}
    name: drbd-kernel-module-injector
    ready: false
    restartCount: 0
    state:
      running:
        startedAt: "2020-07-13T08:42:58Z"
  phase: Pending
  podIP: 192.168.176.191
  podIPs:
  - ip: 192.168.176.191
  qosClass: Burstable
  startTime: "2020-07-13T08:42:56Z"
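The limit-ranger annotation shows that the init container's CPU and memory values were set by the LimitRanger plugin. As a rough worked example of how they reach Docker, the sketch below applies a simplified version of the kubelet's conversion (assumed here: 1024 shares per CPU with integer division, a 100 ms CFS period, a floor of 2 shares). A 64m request becomes 65 CPU shares and a 128m limit becomes a quota of 12800 microseconds per 100000, i.e. at most 0.128 of one core for the whole module compile, alongside a 256 MiB memory limit. This is a difference from the unconstrained docker run; the thread does not establish whether it matters for this failure.

```python
# Simplified kubelet conversion rules (an assumption; check the kubelet
# source for your version): 1024 CPU shares per core, 100 ms CFS period.
MILLI_PER_CPU = 1000
SHARES_PER_CPU = 1024
CFS_PERIOD_US = 100_000
MIN_SHARES = 2
MIB = 1024 * 1024


def shares_from_request(milli_cpu):
    """cpu request in millicores -> cgroup cpu.shares (integer division)."""
    return max(milli_cpu * SHARES_PER_CPU // MILLI_PER_CPU, MIN_SHARES)


def quota_from_limit(milli_cpu, period_us=CFS_PERIOD_US):
    """cpu limit in millicores -> CFS quota in microseconds per period."""
    return milli_cpu * period_us // MILLI_PER_CPU


if __name__ == "__main__":
    # Values from the drbd-kernel-module-injector spec: request 64m, limit 128m.
    print("CpuShares:", shares_from_request(64))
    print("CpuQuota:", quota_from_limit(128), "per", CFS_PERIOD_US, "us")
    print("Memory:", 268435456 // MIB, "MiB")
```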
alexzhc commented 4 years ago

docker inspect output of the drbd-kernel-module-injector container:


[
    {
        "Id": "0036729dd7407258ecccf07d398425e29a39c9bcec54771aeb28b2e8fe11d5e1",
        "Created": "2020-07-13T08:42:57.541909695Z",
        "Path": "/bin/sh",
        "Args": [
            "-c",
            "/entry.sh"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 50399,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2020-07-13T08:42:58.203137184Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:3b4d3a6d45e2ddcaf723e9b9367bf8051da993b49dd2e34f2c07394662e07640",
        "ResolvConfPath": "/var/lib/containers/docker/containers/0b180bf3bf7f877d3b0795dbeeeb52c9b2e85a37b915cf00c9dc8263ef6961bb/resolv.conf",
        "HostnamePath": "/var/lib/containers/docker/containers/0b180bf3bf7f877d3b0795dbeeeb52c9b2e85a37b915cf00c9dc8263ef6961bb/hostname",
        "HostsPath": "/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/etc-hosts",
        "LogPath": "/var/lib/containers/docker/containers/0036729dd7407258ecccf07d398425e29a39c9bcec54771aeb28b2e8fe11d5e1/0036729dd7407258ecccf07d398425e29a39c9bcec54771aeb28b2e8fe11d5e1-json.log",
        "Name": "/k8s_drbd-kernel-module-injector_piraeus-op-ns-node-zx2fz_default_4a10037a-18eb-46db-b132-faa4dd623366_0",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/usr/src:/usr/src:ro",
                "/lib/modules/:/lib/modules/",
                "/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/volumes/kubernetes.io~secret/default-token-v6mlw:/var/run/secrets/kubernetes.io/serviceaccount:ro",
                "/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/etc-hosts:/etc/hosts",
                "/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/containers/drbd-kernel-module-injector/e47c5956:/dev/termination-log"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {
                    "max-file": "3",
                    "max-size": "100m"
                }
            },
            "NetworkMode": "container:0b180bf3bf7f877d3b0795dbeeeb52c9b2e85a37b915cf00c9dc8263ef6961bb",
            "PortBindings": null,
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Capabilities": null,
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "container:0b180bf3bf7f877d3b0795dbeeeb52c9b2e85a37b915cf00c9dc8263ef6961bb",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 993,
            "PidMode": "",
            "Privileged": true,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [
                "seccomp=unconfined",
                "label=disable"
            ],
            "UTSMode": "host",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 65,
            "Memory": 268435456,
            "NanoCpus": 0,
            "CgroupParent": "/kubepods/burstable/pod4a10037a-18eb-46db-b132-faa4dd623366",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 100000,
            "CpuQuota": 12800,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "KernelMemory": 0,
            "KernelMemoryTCP": 0,
            "MemoryReservation": 0,
            "MemorySwap": 268435456,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": null,
            "ReadonlyPaths": null
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/containers/docker/overlay2/808ed7827f797163bff5959275e31980b4e07afb543921668eb278c6b5234a4e-init/diff:/var/lib/containers/docker/overlay2/02f5fd59d05e68bae1b48b8b623eed451b9d4811c1ebf60c59762c721ca13077/diff:/var/lib/containers/docker/overlay2/fd846522960f1a88c94679609cc61b366fa60ceb1c2fb5d9cddc2b8772a82b06/diff:/var/lib/containers/docker/overlay2/8c3df6299d323e7e348eec6b65c0a21e18ee037da17f85aa080736991a742ad5/diff:/var/lib/containers/docker/overlay2/fee7e0ac3c2464d388eb37afa40130eea8e30a3e62d1181dc9d41bddad30e21b/diff:/var/lib/containers/docker/overlay2/52a3c3d5d1ecc989173677912012e6f850ae8520f55cd3c9008833c69cff7c7a/diff",
                "MergedDir": "/var/lib/containers/docker/overlay2/808ed7827f797163bff5959275e31980b4e07afb543921668eb278c6b5234a4e/merged",
                "UpperDir": "/var/lib/containers/docker/overlay2/808ed7827f797163bff5959275e31980b4e07afb543921668eb278c6b5234a4e/diff",
                "WorkDir": "/var/lib/containers/docker/overlay2/808ed7827f797163bff5959275e31980b4e07afb543921668eb278c6b5234a4e/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/volumes/kubernetes.io~secret/default-token-v6mlw",
                "Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/etc-hosts",
                "Destination": "/etc/hosts",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/containers/drbd-kernel-module-injector/e47c5956",
                "Destination": "/dev/termination-log",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/usr/src",
                "Destination": "/usr/src",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/lib/modules",
                "Destination": "/lib/modules",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "k8s-worker-1",
            "Domainname": "",
            "User": "0",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "LB_HOW=compile",
                "PIRAEUS_OPERATOR_METRICS_SERVICE_PORT=8383",
                "PIRAEUS_OPERATOR_METRICS_PORT_8383_TCP=tcp://172.31.231.61:8383",
                "PIRAEUS_OPERATOR_METRICS_PORT_8383_TCP_PROTO=tcp",
                "PIRAEUS_OPERATOR_METRICS_PORT_8383_TCP_ADDR=172.31.231.61",
                "PIRAEUS_OP_STORK_SERVICE_SERVICE_HOST=172.31.209.176",
                "KUBERNETES_PORT_443_TCP_PORT=443",
                "KUBERNETES_PORT_443_TCP_ADDR=172.31.0.1",
                "KUBERNETES_PORT_443_TCP=tcp://172.31.0.1:443",
                "PIRAEUS_OPERATOR_METRICS_PORT=tcp://172.31.231.61:8383",
                "PIRAEUS_OPERATOR_METRICS_SERVICE_PORT_HTTP_METRICS=8383",
                "PIRAEUS_OPERATOR_METRICS_SERVICE_PORT_CR_METRICS=8686",
                "PIRAEUS_OP_STORK_SERVICE_PORT_8099_TCP_ADDR=172.31.209.176",
                "PIRAEUS_OP_STORK_SERVICE_PORT_443_TCP=tcp://172.31.209.176:443",
                "PIRAEUS_OP_STORK_SERVICE_PORT_443_TCP_PROTO=tcp",
                "KUBERNETES_SERVICE_PORT_HTTPS=443",
                "PIRAEUS_OPERATOR_METRICS_PORT_8686_TCP=tcp://172.31.231.61:8686",
                "PIRAEUS_OPERATOR_METRICS_PORT_8686_TCP_ADDR=172.31.231.61",
                "PIRAEUS_OP_STORK_SERVICE_SERVICE_PORT_EXTENDER=8099",
                "PIRAEUS_OP_STORK_SERVICE_PORT_8099_TCP_PROTO=tcp",
                "PIRAEUS_OP_STORK_SERVICE_PORT_443_TCP_ADDR=172.31.209.176",
                "KUBERNETES_SERVICE_HOST=172.31.0.1",
                "KUBERNETES_SERVICE_PORT=443",
                "PIRAEUS_OPERATOR_METRICS_PORT_8686_TCP_PROTO=tcp",
                "PIRAEUS_OP_STORK_SERVICE_SERVICE_PORT=8099",
                "KUBERNETES_PORT=tcp://172.31.0.1:443",
                "PIRAEUS_OPERATOR_METRICS_SERVICE_HOST=172.31.231.61",
                "PIRAEUS_OP_STORK_SERVICE_PORT_8099_TCP=tcp://172.31.209.176:8099",
                "PIRAEUS_OP_STORK_SERVICE_PORT_443_TCP_PORT=443",
                "PIRAEUS_OP_STORK_SERVICE_SERVICE_PORT_WEBHOOK=443",
                "PIRAEUS_OP_STORK_SERVICE_PORT=tcp://172.31.209.176:8099",
                "PIRAEUS_OPERATOR_METRICS_PORT_8686_TCP_PORT=8686",
                "PIRAEUS_OP_STORK_SERVICE_PORT_8099_TCP_PORT=8099",
                "KUBERNETES_PORT_443_TCP_PROTO=tcp",
                "PIRAEUS_OPERATOR_METRICS_PORT_8383_TCP_PORT=8383",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "DRBD_VERSION=9.0.24-1"
            ],
            "Cmd": null,
            "Healthcheck": {
                "Test": [
                    "NONE"
                ]
            },
            "Image": "sha256:3b4d3a6d45e2ddcaf723e9b9367bf8051da993b49dd2e34f2c07394662e07640",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": [
                "/bin/sh",
                "-c",
                "/entry.sh"
            ],
            "OnBuild": null,
            "Labels": {
                "annotation.io.kubernetes.container.hash": "7eeb964f",
                "annotation.io.kubernetes.container.restartCount": "0",
                "annotation.io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
                "annotation.io.kubernetes.container.terminationMessagePolicy": "File",
                "annotation.io.kubernetes.pod.terminationGracePeriod": "30",
                "io.kubernetes.container.logpath": "/var/log/pods/default_piraeus-op-ns-node-zx2fz_4a10037a-18eb-46db-b132-faa4dd623366/drbd-kernel-module-injector/0.log",
                "io.kubernetes.container.name": "drbd-kernel-module-injector",
                "io.kubernetes.docker.type": "container",
                "io.kubernetes.pod.name": "piraeus-op-ns-node-zx2fz",
                "io.kubernetes.pod.namespace": "default",
                "io.kubernetes.pod.uid": "4a10037a-18eb-46db-b132-faa4dd623366",
                "io.kubernetes.sandbox.id": "0b180bf3bf7f877d3b0795dbeeeb52c9b2e85a37b915cf00c9dc8263ef6961bb",
                "org.label-schema.build-date": "20200504",
                "org.label-schema.license": "GPLv2",
                "org.label-schema.name": "CentOS Base Image",
                "org.label-schema.schema-version": "1.0",
                "org.label-schema.vendor": "CentOS",
                "org.opencontainers.image.created": "2020-05-04 00:00:00+01:00",
                "org.opencontainers.image.licenses": "GPL-2.0-only",
                "org.opencontainers.image.title": "CentOS Base Image",
                "org.opencontainers.image.vendor": "CentOS"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {}
        }
    }
]
WanzenBug commented 4 years ago

@alexzhc, is that the pod where the error happened? It's using a newer image version than the original (v9.0.24 vs. v9.0.23-1).

WanzenBug commented 4 years ago

I tried re-creating the issue without much luck. I have not found a combination of versions and settings that leads to the injector crashing. It may be related to this issue: https://github.com/kubernetes/kubernetes/issues/84539

@alexzhc, if you have some time and can reliably reproduce the error, could you try setting enableServiceLinks: false in the DaemonSet's pod template? It might fix the issue, and we don't rely on service discovery via environment variables anyway.
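To test the environment-size theory directly, a rough check like the one below could be run inside the injector container. It compares the bytes used by the environment with getconf ARG_MAX, the POSIX limit on the combined size of arguments and environment passed to exec. This is only a sketch: it assumes a POSIX sh with env, wc, tr, grep and getconf available, and it counts service links by the <NAME>_SERVICE_HOST variables that Kubernetes injects for each Service.

```shell
#!/bin/sh
# Rough diagnostic: how much of the exec() argument space does the environment use?
# ARG_MAX (POSIX) bounds the combined size of argv and envp for a new process.
# `env | wc -c` ignores per-string pointer overhead, so treat it as an estimate.
env_bytes=$(env | wc -c | tr -d ' ')
arg_max=$(getconf ARG_MAX)
# Kubernetes service links add one <NAME>_SERVICE_HOST variable per Service.
# `|| true` keeps a zero match count from failing the script under `set -e`.
svc_vars=$(env | grep -c '_SERVICE_HOST=' || true)
echo "environment: ${env_bytes} bytes; ARG_MAX: ${arg_max} bytes; services: ${svc_vars}"
```

With the roughly three dozen variables in the docker inspect dump above, the environment is a few kilobytes at most, far below the 2097152 bytes Linux reports for ARG_MAX with the common 8 MiB stack limit.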

alexzhc commented 4 years ago

Sure, I'll try. The original error messages are /bin/bash: /usr/bin/mkdir: Argument list too long and /bin/bash: /usr/bin/tr: Argument list too long. How do I enable bash -x in the build script so that we can see the actual arguments passed to mkdir?

JoelColledge commented 4 years ago

This is a very strange error indeed.

How do I enable bash -x in the build script so that we can see the actual arguments passed to mkdir?

Since the error occurs in the module build itself and not in any wrapper script, we need to switch on the kernel build system's verbose option (V=1) rather than bash -x. /entry.sh in the container contains a make call: change it from make -j to make -j V=1, and you should see the commands that are actually executed. That output would be very interesting.

You will know when this is working because you will not find lines like COMPAT have_blk_queue_split_q_bio_bioset in the log, but rather long commands like:

  mkdir -p /drbd/drbd/.compat_test.4.15.18/ ; var=`echo COMPAT_have_blk_queue_split_q_bio_bioset | tr -- -a-z _A-Z | tr -dc A-Z0-9_` ; if gcc -Wp,-MD,/drbd/drbd/.compat_test.4.15.18/.have_blk_queue_split_q_bio_bioset.result.d  -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/7/include -I/drbd/drbd -I/drbd/drbd/drbd-headers  -I./arch/x86/include -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -Iubuntu/include  -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -DCONFIG_X86_X32_ABI -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_AVX512=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mindirect-branch=thunk-extern -mindirect-branch-register -fno-jump-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-int-in-bool-context -O2 --param=allow-store-data-races=0 -DCC_HAVE_ASM_GOTO -Wframe-larger-than=1024 -fstack-protector-strong -Wno-unused-but-set-variable -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -pg -mfentry -DCC_USING_FENTRY -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fno-stack-check -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types 
-Werror=designated-init -I/drbd/drbd -I/drbd/drbd/drbd-kernel-compat -DIDR_GET_NEXT_EXPORTED -DBLKDEV_ISSUE_ZEROOUT_EXPORTED -DCONFIG_KREF_DEBUG  -DMODULE  -DKBUILD_BASENAME='"have_blk_queue_split_q_bio_bioset"'  -Werror-implicit-function-declaration  -c -o /drbd/drbd/.compat_test.4.15.18/.have_blk_queue_split_q_bio_bioset.o /drbd/drbd/drbd-kernel-compat/tests/have_blk_queue_split_q_bio_bioset.c > /drbd/drbd/.compat_test.4.15.18/have_blk_queue_split_q_bio_bioset.stdout 2> /drbd/drbd/.compat_test.4.15.18/have_blk_queue_split_q_bio_bioset.stderr -D"KBUILD_MODNAME=\"compat_dummy\"" ; then echo "#define $var" ; else echo "/* #undef $var */" ; fi > /drbd/drbd/.compat_test.4.15.18/have_blk_queue_split_q_bio_bioset.result
alexzhc commented 4 years ago

I tried enableServiceLinks: false. The problem persists.

alexzhc commented 4 years ago

Here is the verbose output of the compilation error: injector.error.verbose.txt

alexzhc commented 4 years ago

I just found that it works on kube 1.15.3 but fails on kube 1.18.0. @WanzenBug, could you try kube 1.18.0?

WanzenBug commented 4 years ago

I managed to make some progress: I can recreate the bug using:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
$ uname -a
Linux centos-7-k8s-200.test 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  COMPAT  have_bio_start_io_acct
  COMPAT  have_bioset_create_front_pad
make[3]: execvp: /bin/bash: Argument list too long
make[3]: *** [/tmp/pkg/drbd-9.0.24-1/drbd/.compat_test.3.10.0-1127.13.1.el7.x86_64/have_bioset_init.result] Error 127
make[3]: *** Waiting for unfinished jobs....
/bin/bash: line 0: echo: write error: Cannot allocate memory
make[3]: *** [/tmp/pkg/drbd-9.0.24-1/drbd/.compat_test.3.10.0-1127.13.1.el7.x86_64/have_bio_free.result] Error 1
make[2]: *** [_module_/tmp/pkg/drbd-9.0.24-1/drbd] Error 2
make[1]: *** [kbuild] Error 2
make[1]: Leaving directory `/tmp/pkg/drbd-9.0.24-1/drbd'
make: *** [module] Error 2

WanzenBug commented 4 years ago

The problem goes away when using make V=1 instead of make -j V=1.

WanzenBug commented 4 years ago

I don't know how I never noticed this, but:

  initContainers:
  - env:
    - name: LB_HOW
      value: compile
    image: daocloud.io/piraeus/drbd9-centos7:v9.0.24
    imagePullPolicy: IfNotPresent
    name: drbd-kernel-module-injector
    resources:
      limits:
        cpu: 128m
        memory: "268435456"
      requests:
        cpu: 64m
        memory: "268435456"

The memory limits are too low. With a bare make -j, make places no limit on the number of parallel jobs, so many cc instances run at the same time and together they can easily allocate more than 256 MiB (268435456 bytes). The error message "Argument list too long" is probably a side effect of not being able to allocate the memory region for the arguments or the environment.

I tried a limit of 1GiB, and it seemed to fix the problem. Looking at the code, it seems I missed setting the resources for the init container in #57.

@rck I think make -j is a dangerous default. We shouldn't let make use unlimited parallelism.
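The alternative to unlimited parallelism can be made concrete with a small helper that bounds the job count by both the CPU count and the memory limit. This is a hypothetical sketch: pick_jobs is an illustrative name, not part of entry.sh, and the 128 MiB per compiler process used in the example is an assumed figure, not a measurement.

```shell
#!/bin/sh
# Hypothetical helper: choose a make -j value bounded by memory and CPUs.
# Arguments: memory limit in bytes, assumed worst-case bytes per job, CPU count.
pick_jobs() {
  limit_bytes=$1
  per_job_bytes=$2
  cpus=$3
  jobs=$((limit_bytes / per_job_bytes))   # how many jobs fit into the memory limit
  [ "$jobs" -gt "$cpus" ] && jobs=$cpus   # never run more jobs than CPUs
  [ "$jobs" -lt 1 ] && jobs=1             # always allow at least one job
  echo "$jobs"
}

# Example: the 256 MiB limit from the spec above, assuming ~128 MiB per cc.
pick_jobs 268435456 134217728 "$(nproc)"
```

The result could then be passed to make as -jN. Inside a container, the limit could be read from the cgroup filesystem instead of being hard-coded (memory/memory.limit_in_bytes under /sys/fs/cgroup on cgroup v1, memory.max on cgroup v2).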

rck commented 4 years ago

Quite frankly, I don't know why I gave it a bare -j. I probably thought it was the user's responsibility, and "good luck". But I'm easily convinced that it is not a good default.

What about defaulting to plain make, plus an LB_MAKEOPTS variable for extra make flags?
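The proposal can be sketched as a small shell function: serial make by default, with extra flags only when LB_MAKEOPTS is set. This is a hypothetical illustration, not the code that was later added to entry.sh (linked at the end of this thread); run_make and MAKE_CMD are made-up names, the latter existing only so the function can be exercised with echo instead of make.

```shell
#!/bin/sh
# Hypothetical sketch: plain "make" unless LB_MAKEOPTS supplies extra flags.
# This is NOT the actual /entry.sh implementation.
run_make() {
  # MAKE_CMD is a hypothetical override (e.g. "echo") used only for testing.
  # LB_MAKEOPTS is deliberately unquoted so "-j2 V=1" becomes two arguments;
  # when it is unset or empty, no extra arguments are passed at all.
  # shellcheck disable=SC2086
  ${MAKE_CMD:-make} ${LB_MAKEOPTS:-} "$@"
}
```

With LB_MAKEOPTS="-j2" set on the init container this would run make -j2, while leaving it unset keeps the serial build that worked in WanzenBug's test with make V=1.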

WanzenBug commented 4 years ago

LB_MAKEOPTS sounds like a clean solution, so I am in favour :+1:

rck commented 4 years ago

ACK, I will prepare such a thing.

alexzhc commented 4 years ago

@WanzenBug, good catch. I had not noticed that my Kubernetes-based platform automatically creates a LimitRange in every namespace, including default. Containers that specify no resources get the defaults from that LimitRange. I guess this is another case where the chart should allow customizing pod resources, including those of the initContainers.

kubectl get limitrange dce-default-limit-range -o yaml
apiVersion: v1
kind: LimitRange
metadata:
  creationTimestamp: "2020-06-12T03:55:10Z"
  name: dce-default-limit-range
  namespace: default
  resourceVersion: "1644"
  selfLink: /api/v1/namespaces/default/limitranges/dce-default-limit-range
  uid: e8bf74f6-0dac-4baf-8a4b-36adc41e24ac
spec:
  limits:
  - default:
      cpu: 128m
      memory: "268435456"
    defaultRequest:
      cpu: 64m
      memory: "268435456"
    maxLimitRequestRatio:
      cpu: "4"
      memory: "1"
    type: Container
alexzhc commented 4 years ago

After deleting the LimitRange, drbd-kernel-module-injector now works. @WanzenBug, please also make the initContainer resources tunable in values.yaml.

rck commented 4 years ago

Setting make flags via LB_MAKEOPTS is now public. Since the Dockerfiles fetch this file, feel free to rebuild the current containers if necessary.

https://github.com/LINBIT/drbd/blob/drbd-9.0/docker/entry.sh#L125