Closed alexzhc closed 4 years ago
However, docker run still works in the same environment.
docker run -it --rm --privileged -v /lib/modules:/lib/modules:ro -v /usr/src:/usr/src:ro daocloud.io/piraeus/drbd9-centos7:v9.0.23
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory `/tmp/pkg/drbd-9.0.23-1/drbd'
Calling toplevel makefile of kernel source tree, which I believe is in
KDIR=/lib/modules/3.10.0-1127.10.1.el7.x86_64/build
make -C /lib/modules/3.10.0-1127.10.1.el7.x86_64/build M=/tmp/pkg/drbd-9.0.23-1/drbd modules
COMPAT alloc_workqueue_takes_fmt
COMPAT before_4_13_kernel_read
COMPAT blkdev_issue_zeroout_discard
COMPAT drbd_release_returns_void
COMPAT genl_policy_in_ops
COMPAT have_WB_congested_enum
COMPAT have_SHASH_DESC_ON_STACK
COMPAT have_allow_kernel_signal
COMPAT have_atomic_dec_if_positive_linux
COMPAT have_atomic_in_flight
COMPAT have_bd_claim_by_disk
COMPAT have_bd_unlink_disk_holder
COMPAT have_bio_bi_bdev
COMPAT have_bio_bi_error
COMPAT have_bio_bi_opf
COMPAT have_bio_bi_status
COMPAT have_bio_clone_fast
COMPAT have_bio_flush
COMPAT have_bio_op_shift
COMPAT have_bio_rw
COMPAT have_bio_free
COMPAT have_bio_set_op_attrs
COMPAT have_bioset_create_front_pad
COMPAT have_bioset_init
COMPAT have_bioset_need_bvecs
COMPAT have_blk_check_plugged
COMPAT have_blk_qc_t_make_request
COMPAT have_blk_queue_flag_set
COMPAT have_blk_queue_make_request
COMPAT have_blk_queue_merge_bvec
COMPAT have_blk_queue_plugged
COMPAT have_blk_queue_split_q_bio
COMPAT have_blk_queue_write_cache
COMPAT have_blk_queue_split_q_bio_bioset
COMPAT have_blkdev_get_by_path
COMPAT have_d_inode
COMPAT have_file_inode
COMPAT have_generic_start_io_acct_q_rw_sect_part
COMPAT have_generic_start_io_acct_rw_sect_part
COMPAT have_genl_family_parallel_ops
COMPAT have_ib_cq_init_attr
COMPAT have_ib_get_dma_mr
COMPAT have_idr_alloc
COMPAT have_idr_is_empty
COMPAT have_inode_lock
COMPAT have_ktime_to_timespec64
COMPAT have_kvfree
COMPAT have_max_send_recv_sge
COMPAT have_netlink_cb_portid
COMPAT have_nla_nest_start_noflag
COMPAT have_nla_parse_deprecated
COMPAT have_nla_put_64bit
COMPAT have_pointer_backing_dev_info
COMPAT have_part_stat_h
COMPAT have_prandom_u32
COMPAT have_proc_create_single
COMPAT have_ratelimit_state_init
COMPAT have_rb_augment_functions
COMPAT have_refcount_inc
COMPAT have_req_hardbarrier
COMPAT have_req_noidle
COMPAT have_req_nounmap
COMPAT have_req_op_write
COMPAT have_req_op_write_same
COMPAT have_req_op_write_zeroes
COMPAT have_req_prio
COMPAT have_req_write
COMPAT have_req_write_same
COMPAT have_shash_desc_zero
COMPAT have_security_netlink_recv
COMPAT have_signed_nla_put
COMPAT have_simple_positive
COMPAT have_struct_bvec_iter
COMPAT have_struct_kernel_param_ops
COMPAT have_struct_size
COMPAT have_time64_to_tm
COMPAT have_timer_setup
COMPAT have_void_make_request
COMPAT hlist_for_each_entry_has_three_parameters
COMPAT ib_alloc_pd_has_2_params
COMPAT ib_device_has_ops
COMPAT ib_post_send_const_params
COMPAT ib_query_device_has_3_params
COMPAT kmap_atomic_page_only
COMPAT need_make_request_recursion
COMPAT queue_limits_has_discard_zeroes_data
COMPAT rdma_create_id_has_net_ns
COMPAT sock_create_kern_has_five_parameters
COMPAT sock_ops_returns_addr_len
CHK /tmp/pkg/drbd-9.0.23-1/drbd/compat.3.10.0-1127.10.1.el7.x86_64.h
UPD /tmp/pkg/drbd-9.0.23-1/drbd/compat.3.10.0-1127.10.1.el7.x86_64.h
CHK /tmp/pkg/drbd-9.0.23-1/drbd/compat.h
UPD /tmp/pkg/drbd-9.0.23-1/drbd/compat.h
make[4]: `drbd-kernel-compat/cocci_cache/f40245fde8bb98acdde06d9e38d717b0/compat.patch' is up to date.
PATCH
patching file ./drbd_int.h
patching file drbd_receiver.c
patching file drbd_main.c
patching file drbd_nla.c
patching file drbd_nl.c
patching file drbd_bitmap.c
patching file drbd_transport_tcp.c
patching file drbd_actlog.c
patching file kref_debug.c
patching file drbd_req.c
patching file drbd_sender.c
patching file drbd_debugfs.c
patching file drbd-headers/linux/genl_magic_func.h
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_dax_pmem.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_debugfs.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_bitmap.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_proc.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_sender.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_receiver.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_req.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_actlog.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/lru_cache.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_main.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_strings.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_nl.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_interval.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_state.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd-kernel-compat/drbd_wrappers.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_nla.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/kref_debug.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_kref_debug.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.o
GEN /tmp/pkg/drbd-9.0.23-1/drbd/drbd_buildtag.c
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_buildtag.o
LD [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd.o
Building modules, stage 2.
MODPOST 2 modules
CC /tmp/pkg/drbd-9.0.23-1/drbd/drbd.mod.o
CC /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.mod.o
LD [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd.ko
LD [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.ko
mv .drbd_kernelrelease.new .drbd_kernelrelease
Memorizing module configuration ... done.
make[1]: Leaving directory `/tmp/pkg/drbd-9.0.23-1/drbd'
Module build was successful.
=======================================================================
With DRBD module version 8.4.5, we split out the management tools
into their own repository at https://github.com/LINBIT/drbd-utils
(tarball at http://links.linbit.com/drbd-download)
That started out as "drbd-utils version 8.9.0",
has a different release cycle,
and provides compatible drbdadm, drbdsetup and drbdmeta tools
for DRBD module versions 8.3, 8.4 and 9.
Again: to manage DRBD 9 kernel modules and above,
you want drbd-utils >= 9.3 from above url.
=======================================================================
DRBD version loaded:
version: 9.0.23-1 (api:2/proto:86-116)
GIT-hash: d16bfab7a4033024fed2d99d3b179aa6bb6eb300 build by @5c911893f9f7, 2020-07-10 10:32:52
Transports (api:16): tcp (9.0.23-1)
I've tried recreating the issue, but on my setup it's working as expected:
Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory `/tmp/pkg/drbd-9.0.23-1/drbd'
Calling toplevel makefile of kernel source tree, which I believe is in
KDIR=/lib/modules/3.10.0-1127.10.1.el7.x86_64/build
make -C /lib/modules/3.10.0-1127.10.1.el7.x86_64/build M=/tmp/pkg/drbd-9.0.23-1/drbd modules
COMPAT before_4_13_kernel_read
COMPAT alloc_workqueue_takes_fmt
COMPAT blkdev_issue_zeroout_discard
COMPAT drbd_release_returns_void
COMPAT have_SHASH_DESC_ON_STACK
COMPAT genl_policy_in_ops
COMPAT have_allow_kernel_signal
COMPAT have_WB_congested_enum
COMPAT have_atomic_dec_if_positive_linux
COMPAT have_bd_unlink_disk_holder
COMPAT have_bd_claim_by_disk
COMPAT have_atomic_in_flight
COMPAT have_bio_bi_opf
COMPAT have_bio_bi_error
COMPAT have_bio_bi_bdev
COMPAT have_bio_bi_status
COMPAT have_bio_clone_fast
COMPAT have_bio_flush
COMPAT have_bio_free
COMPAT have_bio_op_shift
COMPAT have_bio_rw
COMPAT have_bio_set_op_attrs
COMPAT have_bioset_create_front_pad
COMPAT have_bioset_init
COMPAT have_bioset_need_bvecs
COMPAT have_blk_check_plugged
COMPAT have_blk_qc_t_make_request
COMPAT have_blk_queue_flag_set
COMPAT have_blk_queue_make_request
COMPAT have_blk_queue_merge_bvec
COMPAT have_blk_queue_plugged
COMPAT have_blk_queue_split_q_bio
COMPAT have_blk_queue_split_q_bio_bioset
COMPAT have_blk_queue_write_cache
COMPAT have_blkdev_get_by_path
COMPAT have_d_inode
COMPAT have_file_inode
COMPAT have_generic_start_io_acct_rw_sect_part
COMPAT have_genl_family_parallel_ops
COMPAT have_generic_start_io_acct_q_rw_sect_part
COMPAT have_ib_cq_init_attr
COMPAT have_ib_get_dma_mr
COMPAT have_idr_alloc
COMPAT have_idr_is_empty
COMPAT have_inode_lock
COMPAT have_kvfree
COMPAT have_max_send_recv_sge
COMPAT have_netlink_cb_portid
COMPAT have_nla_nest_start_noflag
COMPAT have_nla_parse_deprecated
COMPAT have_nla_put_64bit
COMPAT have_pointer_backing_dev_info
COMPAT have_ktime_to_timespec64
COMPAT have_part_stat_h
COMPAT have_prandom_u32
COMPAT have_proc_create_single
COMPAT have_ratelimit_state_init
COMPAT have_rb_augment_functions
COMPAT have_refcount_inc
COMPAT have_req_noidle
COMPAT have_req_nounmap
COMPAT have_req_hardbarrier
COMPAT have_req_op_write
COMPAT have_req_op_write_same
COMPAT have_req_op_write_zeroes
COMPAT have_req_prio
COMPAT have_req_write
COMPAT have_req_write_same
COMPAT have_security_netlink_recv
COMPAT have_shash_desc_zero
COMPAT have_simple_positive
COMPAT have_struct_bvec_iter
COMPAT have_signed_nla_put
COMPAT have_struct_kernel_param_ops
COMPAT have_struct_size
COMPAT have_time64_to_tm
COMPAT have_timer_setup
COMPAT have_void_make_request
COMPAT hlist_for_each_entry_has_three_parameters
COMPAT ib_alloc_pd_has_2_params
COMPAT ib_device_has_ops
COMPAT ib_post_send_const_params
COMPAT ib_query_device_has_3_params
COMPAT kmap_atomic_page_only
COMPAT need_make_request_recursion
COMPAT queue_limits_has_discard_zeroes_data
COMPAT rdma_create_id_has_net_ns
COMPAT sock_create_kern_has_five_parameters
COMPAT sock_ops_returns_addr_len
CHK /tmp/pkg/drbd-9.0.23-1/drbd/compat.3.10.0-1127.10.1.el7.x86_64.h
UPD /tmp/pkg/drbd-9.0.23-1/drbd/compat.3.10.0-1127.10.1.el7.x86_64.h
CHK /tmp/pkg/drbd-9.0.23-1/drbd/compat.h
UPD /tmp/pkg/drbd-9.0.23-1/drbd/compat.h
make[4]: `drbd-kernel-compat/cocci_cache/f40245fde8bb98acdde06d9e38d717b0/compat.patch' is up to date.
PATCH
patching file ./drbd_int.h
patching file drbd_receiver.c
patching file drbd_main.c
patching file drbd_nla.c
patching file drbd_nl.c
patching file drbd_bitmap.c
patching file drbd_transport_tcp.c
patching file drbd_actlog.c
patching file kref_debug.c
patching file drbd_req.c
patching file drbd_sender.c
patching file drbd_debugfs.c
patching file drbd-headers/linux/genl_magic_func.h
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_dax_pmem.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_debugfs.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_bitmap.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_proc.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_sender.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_receiver.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_req.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_actlog.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/lru_cache.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_main.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_strings.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_nl.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_interval.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_state.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd-kernel-compat/drbd_wrappers.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_nla.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/kref_debug.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_kref_debug.o
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.o
GEN /tmp/pkg/drbd-9.0.23-1/drbd/drbd_buildtag.c
CC [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_buildtag.o
LD [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd.o
Building modules, stage 2.
MODPOST 2 modules
CC /tmp/pkg/drbd-9.0.23-1/drbd/drbd.mod.o
CC /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.mod.o
LD [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd_transport_tcp.ko
LD [M] /tmp/pkg/drbd-9.0.23-1/drbd/drbd.ko
mv .drbd_kernelrelease.new .drbd_kernelrelease
Memorizing module configuration ... done.
make[1]: Leaving directory `/tmp/pkg/drbd-9.0.23-1/drbd'
Module build was successful.
=======================================================================
With DRBD module version 8.4.5, we split out the management tools
into their own repository at https://github.com/LINBIT/drbd-utils
(tarball at http://links.linbit.com/drbd-download)
That started out as "drbd-utils version 8.9.0",
has a different release cycle,
and provides compatible drbdadm, drbdsetup and drbdmeta tools
for DRBD module versions 8.3, 8.4 and 9.
Again: to manage DRBD 9 kernel modules and above,
you want drbd-utils >= 9.3 from above url.
=======================================================================
DRBD version loaded:
version: 9.0.23-1 (api:2/proto:86-116)
GIT-hash: d16bfab7a4033024fed2d99d3b179aa6bb6eb300 build by @centos-7-k8s-101.test, 2020-07-10 12:39:33
Transports (api:16): tcp (9.0.23-1)
Is there anything you can think of that might be "different" on your cluster? Is it using a different container engine? Is it using some extra long hostnames? :shrug:
We are using kubelet 1.18.0 and docker 19.3.8. Could you try those versions as well?
@WanzenBug would you consider using the docker run technique for drbd-kernel-module-injector? It should avoid having to troubleshoot issues like this every now and then, and it also allows switching between CentOS and Ubuntu images automatically. I implemented it here: https://github.com/piraeusdatastore/piraeus/blob/master/dockerfiles/piraeus-init/bin/init-node.sh So far, it has worked fine every time.
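As a rough illustration of the idea (not a copy of init-node.sh, which is not reproduced here), a script could assemble the same docker run invocation that was quoted at the top of this thread. The helper name `injector_command` is hypothetical; the image tag and mounts are taken from that earlier command.

```python
import shlex

# Image tag and bind mounts are copied from the docker run command
# quoted at the top of the thread.
IMAGE = "daocloud.io/piraeus/drbd9-centos7:v9.0.23"


def injector_command(image=IMAGE, interactive=False):
    """Build the argv for running the injector image directly with docker.

    Hypothetical helper for illustration only.
    """
    cmd = ["docker", "run", "--rm", "--privileged"]
    if interactive:
        cmd.append("-it")
    cmd += [
        "-v", "/lib/modules:/lib/modules:ro",
        "-v", "/usr/src:/usr/src:ro",
        image,
    ]
    return cmd


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run(..., check=True)
    # to actually execute it on a node that has docker available.
    print(shlex.join(injector_command()))
```

Swapping the `image` argument is where a CentOS or Ubuntu variant could be selected.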
# kubectl get pod piraeus-op-ns-node-zx2fz -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu, memory request for container
      linstor-satellite; cpu, memory limit for container linstor-satellite; cpu, memory
      request for init container drbd-kernel-module-injector; cpu, memory limit for
      init container drbd-kernel-module-injector'
    kubernetes.io/psp: dce-psp-allow-all
  creationTimestamp: "2020-07-13T08:42:56Z"
  generateName: piraeus-op-ns-node-
  labels:
    app: piraeus-op-ns
    controller-revision-hash: 587747f9d9
    pod-template-generation: "1"
    role: piraeus-node
  name: piraeus-op-ns-node-zx2fz
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: piraeus-op-ns-node
    uid: 4dc05c60-6886-4227-909f-3516b9fc51fc
  resourceVersion: "217873"
  selfLink: /api/v1/namespaces/default/pods/piraeus-op-ns-node-zx2fz
  uid: 4a10037a-18eb-46db-b132-faa4dd623366
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - k8s-worker-1
  containers:
  - args:
    - startSatellite
    image: daocloud.io/piraeus/piraeus-server:v1.7.1
    imagePullPolicy: IfNotPresent
    name: linstor-satellite
    ports:
    - containerPort: 3366
      hostPort: 3366
      protocol: TCP
    readinessProbe:
      failureThreshold: 10
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      tcpSocket:
        port: 3366
      timeoutSeconds: 5
    resources:
      limits:
        cpu: 128m
        memory: "268435456"
      requests:
        cpu: 64m
        memory: "268435456"
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/linstor
      name: linstor-conf
    - mountPath: /dev/
      name: device-dir
    - mountPath: /sys/
      name: sys-dir
    - mountPath: /lib/modules/
      mountPropagation: Bidirectional
      name: modules-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-v6mlw
      readOnly: true
  dnsPolicy: ClusterFirstWithHostNet
  enableServiceLinks: true
  hostNetwork: true
  imagePullSecrets:
  - name: drbdiocred
  initContainers:
  - env:
    - name: LB_HOW
      value: compile
    image: daocloud.io/piraeus/drbd9-centos7:v9.0.24
    imagePullPolicy: IfNotPresent
    name: drbd-kernel-module-injector
    resources:
      limits:
        cpu: 128m
        memory: "268435456"
      requests:
        cpu: 64m
        memory: "268435456"
    securityContext:
      privileged: true
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /usr/src
      name: src-dir
      readOnly: true
    - mountPath: /lib/modules/
      name: modules-dir
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-v6mlw
      readOnly: true
  nodeName: k8s-worker-1
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/network-unavailable
    operator: Exists
  volumes:
  - configMap:
      defaultMode: 420
      name: piraeus-op-ns-config
    name: linstor-conf
  - hostPath:
      path: /dev/
      type: ""
    name: device-dir
  - hostPath:
      path: /sys/
      type: Directory
    name: sys-dir
  - hostPath:
      path: /lib/modules/
      type: DirectoryOrCreate
    name: modules-dir
  - hostPath:
      path: /usr/src
      type: Directory
    name: src-dir
  - name: default-token-v6mlw
    secret:
      defaultMode: 420
      secretName: default-token-v6mlw
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-07-13T08:42:56Z"
    message: 'containers with incomplete status: [drbd-kernel-module-injector]'
    reason: ContainersNotInitialized
    status: "False"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-07-13T08:42:56Z"
    message: 'containers with unready status: [linstor-satellite]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-07-13T08:42:56Z"
    message: 'containers with unready status: [linstor-satellite]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-07-13T08:42:56Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: daocloud.io/piraeus/piraeus-server:v1.7.1
    imageID: ""
    lastState: {}
    name: linstor-satellite
    ready: false
    restartCount: 0
    started: false
    state:
      waiting:
        reason: PodInitializing
  hostIP: 192.168.176.191
  initContainerStatuses:
  - containerID: docker://0036729dd7407258ecccf07d398425e29a39c9bcec54771aeb28b2e8fe11d5e1
    image: daocloud.io/piraeus/drbd9-centos7:v9.0.24
    imageID: docker-pullable://daocloud.io/piraeus/drbd9-centos7@sha256:d9acaa6f3db4ac619b38c1f8975d991f974b15cd9cf3ad676ab0dde823b62a51
    lastState: {}
    name: drbd-kernel-module-injector
    ready: false
    restartCount: 0
    state:
      running:
        startedAt: "2020-07-13T08:42:58Z"
  phase: Pending
  podIP: 192.168.176.191
  podIPs:
  - ip: 192.168.176.191
  qosClass: Burstable
  startTime: "2020-07-13T08:42:56Z"
docker inspect
[
{
"Id": "0036729dd7407258ecccf07d398425e29a39c9bcec54771aeb28b2e8fe11d5e1",
"Created": "2020-07-13T08:42:57.541909695Z",
"Path": "/bin/sh",
"Args": [
"-c",
"/entry.sh"
],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 50399,
"ExitCode": 0,
"Error": "",
"StartedAt": "2020-07-13T08:42:58.203137184Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
"Image": "sha256:3b4d3a6d45e2ddcaf723e9b9367bf8051da993b49dd2e34f2c07394662e07640",
"ResolvConfPath": "/var/lib/containers/docker/containers/0b180bf3bf7f877d3b0795dbeeeb52c9b2e85a37b915cf00c9dc8263ef6961bb/resolv.conf",
"HostnamePath": "/var/lib/containers/docker/containers/0b180bf3bf7f877d3b0795dbeeeb52c9b2e85a37b915cf00c9dc8263ef6961bb/hostname",
"HostsPath": "/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/etc-hosts",
"LogPath": "/var/lib/containers/docker/containers/0036729dd7407258ecccf07d398425e29a39c9bcec54771aeb28b2e8fe11d5e1/0036729dd7407258ecccf07d398425e29a39c9bcec54771aeb28b2e8fe11d5e1-json.log",
"Name": "/k8s_drbd-kernel-module-injector_piraeus-op-ns-node-zx2fz_default_4a10037a-18eb-46db-b132-faa4dd623366_0",
"RestartCount": 0,
"Driver": "overlay2",
"Platform": "linux",
"MountLabel": "",
"ProcessLabel": "",
"AppArmorProfile": "",
"ExecIDs": null,
"HostConfig": {
"Binds": [
"/usr/src:/usr/src:ro",
"/lib/modules/:/lib/modules/",
"/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/volumes/kubernetes.io~secret/default-token-v6mlw:/var/run/secrets/kubernetes.io/serviceaccount:ro",
"/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/etc-hosts:/etc/hosts",
"/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/containers/drbd-kernel-module-injector/e47c5956:/dev/termination-log"
],
"ContainerIDFile": "",
"LogConfig": {
"Type": "json-file",
"Config": {
"max-file": "3",
"max-size": "100m"
}
},
"NetworkMode": "container:0b180bf3bf7f877d3b0795dbeeeb52c9b2e85a37b915cf00c9dc8263ef6961bb",
"PortBindings": null,
"RestartPolicy": {
"Name": "no",
"MaximumRetryCount": 0
},
"AutoRemove": false,
"VolumeDriver": "",
"VolumesFrom": null,
"CapAdd": null,
"CapDrop": null,
"Capabilities": null,
"Dns": null,
"DnsOptions": null,
"DnsSearch": null,
"ExtraHosts": null,
"GroupAdd": null,
"IpcMode": "container:0b180bf3bf7f877d3b0795dbeeeb52c9b2e85a37b915cf00c9dc8263ef6961bb",
"Cgroup": "",
"Links": null,
"OomScoreAdj": 993,
"PidMode": "",
"Privileged": true,
"PublishAllPorts": false,
"ReadonlyRootfs": false,
"SecurityOpt": [
"seccomp=unconfined",
"label=disable"
],
"UTSMode": "host",
"UsernsMode": "",
"ShmSize": 67108864,
"Runtime": "runc",
"ConsoleSize": [
0,
0
],
"Isolation": "",
"CpuShares": 65,
"Memory": 268435456,
"NanoCpus": 0,
"CgroupParent": "/kubepods/burstable/pod4a10037a-18eb-46db-b132-faa4dd623366",
"BlkioWeight": 0,
"BlkioWeightDevice": null,
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 100000,
"CpuQuota": 12800,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": [],
"DeviceCgroupRules": null,
"DeviceRequests": null,
"KernelMemory": 0,
"KernelMemoryTCP": 0,
"MemoryReservation": 0,
"MemorySwap": 268435456,
"MemorySwappiness": null,
"OomKillDisable": false,
"PidsLimit": null,
"Ulimits": null,
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0,
"MaskedPaths": null,
"ReadonlyPaths": null
},
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/containers/docker/overlay2/808ed7827f797163bff5959275e31980b4e07afb543921668eb278c6b5234a4e-init/diff:/var/lib/containers/docker/overlay2/02f5fd59d05e68bae1b48b8b623eed451b9d4811c1ebf60c59762c721ca13077/diff:/var/lib/containers/docker/overlay2/fd846522960f1a88c94679609cc61b366fa60ceb1c2fb5d9cddc2b8772a82b06/diff:/var/lib/containers/docker/overlay2/8c3df6299d323e7e348eec6b65c0a21e18ee037da17f85aa080736991a742ad5/diff:/var/lib/containers/docker/overlay2/fee7e0ac3c2464d388eb37afa40130eea8e30a3e62d1181dc9d41bddad30e21b/diff:/var/lib/containers/docker/overlay2/52a3c3d5d1ecc989173677912012e6f850ae8520f55cd3c9008833c69cff7c7a/diff",
"MergedDir": "/var/lib/containers/docker/overlay2/808ed7827f797163bff5959275e31980b4e07afb543921668eb278c6b5234a4e/merged",
"UpperDir": "/var/lib/containers/docker/overlay2/808ed7827f797163bff5959275e31980b4e07afb543921668eb278c6b5234a4e/diff",
"WorkDir": "/var/lib/containers/docker/overlay2/808ed7827f797163bff5959275e31980b4e07afb543921668eb278c6b5234a4e/work"
},
"Name": "overlay2"
},
"Mounts": [
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/volumes/kubernetes.io~secret/default-token-v6mlw",
"Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
"Mode": "ro",
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/etc-hosts",
"Destination": "/etc/hosts",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/4a10037a-18eb-46db-b132-faa4dd623366/containers/drbd-kernel-module-injector/e47c5956",
"Destination": "/dev/termination-log",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/usr/src",
"Destination": "/usr/src",
"Mode": "ro",
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/lib/modules",
"Destination": "/lib/modules",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
}
],
"Config": {
"Hostname": "k8s-worker-1",
"Domainname": "",
"User": "0",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"LB_HOW=compile",
"PIRAEUS_OPERATOR_METRICS_SERVICE_PORT=8383",
"PIRAEUS_OPERATOR_METRICS_PORT_8383_TCP=tcp://172.31.231.61:8383",
"PIRAEUS_OPERATOR_METRICS_PORT_8383_TCP_PROTO=tcp",
"PIRAEUS_OPERATOR_METRICS_PORT_8383_TCP_ADDR=172.31.231.61",
"PIRAEUS_OP_STORK_SERVICE_SERVICE_HOST=172.31.209.176",
"KUBERNETES_PORT_443_TCP_PORT=443",
"KUBERNETES_PORT_443_TCP_ADDR=172.31.0.1",
"KUBERNETES_PORT_443_TCP=tcp://172.31.0.1:443",
"PIRAEUS_OPERATOR_METRICS_PORT=tcp://172.31.231.61:8383",
"PIRAEUS_OPERATOR_METRICS_SERVICE_PORT_HTTP_METRICS=8383",
"PIRAEUS_OPERATOR_METRICS_SERVICE_PORT_CR_METRICS=8686",
"PIRAEUS_OP_STORK_SERVICE_PORT_8099_TCP_ADDR=172.31.209.176",
"PIRAEUS_OP_STORK_SERVICE_PORT_443_TCP=tcp://172.31.209.176:443",
"PIRAEUS_OP_STORK_SERVICE_PORT_443_TCP_PROTO=tcp",
"KUBERNETES_SERVICE_PORT_HTTPS=443",
"PIRAEUS_OPERATOR_METRICS_PORT_8686_TCP=tcp://172.31.231.61:8686",
"PIRAEUS_OPERATOR_METRICS_PORT_8686_TCP_ADDR=172.31.231.61",
"PIRAEUS_OP_STORK_SERVICE_SERVICE_PORT_EXTENDER=8099",
"PIRAEUS_OP_STORK_SERVICE_PORT_8099_TCP_PROTO=tcp",
"PIRAEUS_OP_STORK_SERVICE_PORT_443_TCP_ADDR=172.31.209.176",
"KUBERNETES_SERVICE_HOST=172.31.0.1",
"KUBERNETES_SERVICE_PORT=443",
"PIRAEUS_OPERATOR_METRICS_PORT_8686_TCP_PROTO=tcp",
"PIRAEUS_OP_STORK_SERVICE_SERVICE_PORT=8099",
"KUBERNETES_PORT=tcp://172.31.0.1:443",
"PIRAEUS_OPERATOR_METRICS_SERVICE_HOST=172.31.231.61",
"PIRAEUS_OP_STORK_SERVICE_PORT_8099_TCP=tcp://172.31.209.176:8099",
"PIRAEUS_OP_STORK_SERVICE_PORT_443_TCP_PORT=443",
"PIRAEUS_OP_STORK_SERVICE_SERVICE_PORT_WEBHOOK=443",
"PIRAEUS_OP_STORK_SERVICE_PORT=tcp://172.31.209.176:8099",
"PIRAEUS_OPERATOR_METRICS_PORT_8686_TCP_PORT=8686",
"PIRAEUS_OP_STORK_SERVICE_PORT_8099_TCP_PORT=8099",
"KUBERNETES_PORT_443_TCP_PROTO=tcp",
"PIRAEUS_OPERATOR_METRICS_PORT_8383_TCP_PORT=8383",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"DRBD_VERSION=9.0.24-1"
],
"Cmd": null,
"Healthcheck": {
"Test": [
"NONE"
]
},
"Image": "sha256:3b4d3a6d45e2ddcaf723e9b9367bf8051da993b49dd2e34f2c07394662e07640",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": [
"/bin/sh",
"-c",
"/entry.sh"
],
"OnBuild": null,
"Labels": {
"annotation.io.kubernetes.container.hash": "7eeb964f",
"annotation.io.kubernetes.container.restartCount": "0",
"annotation.io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
"annotation.io.kubernetes.container.terminationMessagePolicy": "File",
"annotation.io.kubernetes.pod.terminationGracePeriod": "30",
"io.kubernetes.container.logpath": "/var/log/pods/default_piraeus-op-ns-node-zx2fz_4a10037a-18eb-46db-b132-faa4dd623366/drbd-kernel-module-injector/0.log",
"io.kubernetes.container.name": "drbd-kernel-module-injector",
"io.kubernetes.docker.type": "container",
"io.kubernetes.pod.name": "piraeus-op-ns-node-zx2fz",
"io.kubernetes.pod.namespace": "default",
"io.kubernetes.pod.uid": "4a10037a-18eb-46db-b132-faa4dd623366",
"io.kubernetes.sandbox.id": "0b180bf3bf7f877d3b0795dbeeeb52c9b2e85a37b915cf00c9dc8263ef6961bb",
"org.label-schema.build-date": "20200504",
"org.label-schema.license": "GPLv2",
"org.label-schema.name": "CentOS Base Image",
"org.label-schema.schema-version": "1.0",
"org.label-schema.vendor": "CentOS",
"org.opencontainers.image.created": "2020-05-04 00:00:00+01:00",
"org.opencontainers.image.licenses": "GPL-2.0-only",
"org.opencontainers.image.title": "CentOS Base Image",
"org.opencontainers.image.vendor": "CentOS"
}
},
"NetworkSettings": {
"Bridge": "",
"SandboxID": "",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {},
"SandboxKey": "",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "",
"Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"MacAddress": "",
"Networks": {}
}
}
]
@alexzhc is that a pod where the error happened? It's using a newer image version than the original (v9.0.24 vs v9.0.23-1).
I tried re-creating the issue without much luck. I have not found a combination of versions and settings that leads to the injector crashing. It may be related to this issue: https://github.com/kubernetes/kubernetes/issues/84539
@alexzhc if you have some time and can reliably reproduce the error, could you try setting enableServiceLinks: false on the daemonset? It might fix the issue, and we are not relying on service discovery via environment variables anyway.
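To check the service-link theory on a given container, one could measure how much space the environment occupies and list the variables that look like service links. This is a minimal sketch: the name pattern in `service_link_vars` is a heuristic assumed here for illustration, and `SC_ARG_MAX` is only an approximation of the limit that applies to arguments plus environment on Linux.

```python
import os


def environ_bytes(env):
    """Approximate bytes the environment occupies when passed to execve()."""
    # Each entry is stored as "KEY=VALUE" plus a terminating NUL byte.
    return sum(len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items())


def service_link_vars(env):
    """Names that look like Kubernetes service-link variables.

    Heuristic (an assumption for this sketch): the name contains
    "_SERVICE_" or "_PORT".
    """
    return sorted(k for k in env if "_SERVICE_" in k or "_PORT" in k)


if __name__ == "__main__":
    env = dict(os.environ)
    print("environment bytes:", environ_bytes(env))
    print("SC_ARG_MAX:", os.sysconf("SC_ARG_MAX"))
    print("service-link-like variables:", len(service_link_vars(env)))
```

Running it inside the injector container with and without enableServiceLinks would show how much of the environment the service links account for.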
Sure, I'll try. The original error message is /bin/bash: /usr/bin/mkdir: Argument list too long; /bin/bash: /usr/bin/tr: Argument list too long. How to enable bash -x in the build script, so that we can see the actual mkdir xxx arguments?
This is a very strange error indeed.
How to enable bash -x in the build script, so that we can see the actual mkdir xxx arguments?
Since the error occurs in the module build itself and not in any wrapper script, we need to switch on the kernel's verbose build option. In /entry.sh in the container, there is this make call. Change that from make -j to make -j V=1 and you should see the commands that are being executed. This output would be very interesting.
You will know when this is working because you will not find lines like COMPAT have_blk_queue_split_q_bio_bioset in the log, but rather long commands like:
mkdir -p /drbd/drbd/.compat_test.4.15.18/ ; var=`echo COMPAT_have_blk_queue_split_q_bio_bioset | tr -- -a-z _A-Z | tr -dc A-Z0-9_` ; if gcc -Wp,-MD,/drbd/drbd/.compat_test.4.15.18/.have_blk_queue_split_q_bio_bioset.result.d -nostdinc -isystem /usr/lib/gcc/x86_64-linux-gnu/7/include -I/drbd/drbd -I/drbd/drbd/drbd-headers -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -Iubuntu/include -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -DCONFIG_X86_X32_ABI -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_AVX512=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mindirect-branch=thunk-extern -mindirect-branch-register -fno-jump-tables -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-int-in-bool-context -O2 --param=allow-store-data-races=0 -DCC_HAVE_ASM_GOTO -Wframe-larger-than=1024 -fstack-protector-strong -Wno-unused-but-set-variable -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -pg -mfentry -DCC_USING_FENTRY -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fno-stack-check -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types 
-Werror=designated-init -I/drbd/drbd -I/drbd/drbd/drbd-kernel-compat -DIDR_GET_NEXT_EXPORTED -DBLKDEV_ISSUE_ZEROOUT_EXPORTED -DCONFIG_KREF_DEBUG -DMODULE -DKBUILD_BASENAME='"have_blk_queue_split_q_bio_bioset"' -Werror-implicit-function-declaration -c -o /drbd/drbd/.compat_test.4.15.18/.have_blk_queue_split_q_bio_bioset.o /drbd/drbd/drbd-kernel-compat/tests/have_blk_queue_split_q_bio_bioset.c > /drbd/drbd/.compat_test.4.15.18/have_blk_queue_split_q_bio_bioset.stdout 2> /drbd/drbd/.compat_test.4.15.18/have_blk_queue_split_q_bio_bioset.stderr -D"KBUILD_MODNAME=\"compat_dummy\"" ; then echo "#define $var" ; else echo "/* #undef $var */" ; fi > /drbd/drbd/.compat_test.4.15.18/have_blk_queue_split_q_bio_bioset.result
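The check described above (no abbreviated COMPAT lines left in the log) can be automated. A minimal sketch, assuming the quiet output always has the form "COMPAT <test name>" on a line of its own, as in the logs earlier in this thread; the function name is hypothetical.

```python
import re

# Quiet kbuild output abbreviates each compat test to "COMPAT <test name>".
QUIET_COMPAT = re.compile(r"^\s*COMPAT\s+\w+\s*$")


def looks_verbose(log_lines):
    """Return True when no abbreviated COMPAT lines remain in the log.

    Note: an empty log also returns True, so check that the build ran at all.
    """
    return not any(QUIET_COMPAT.match(line) for line in log_lines)
```

With V=1 in effect, the string COMPAT_have_... still appears inside the long commands, which is why the pattern is anchored to the whole line.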
I tried enableServiceLinks: false. The problem persists.
Here is the verbose output of the compilation error: injector.error.verbose.txt
I just found that it actually works on kube 1.15.3 but fails on kube 1.18.0. @WanzenBug could you try kube 1.18.0?
I managed to make some progress: I can recreate the bug using:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
$ uname -a
Linux centos-7-k8s-200.test 3.10.0-1127.13.1.el7.x86_64 #1 SMP Tue Jun 23 15:46:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
COMPAT have_bio_start_io_acct
COMPAT have_bioset_create_front_pad
make[3]: execvp: /bin/bash: Argument list too long
make[3]: *** [/tmp/pkg/drbd-9.0.24-1/drbd/.compat_test.3.10.0-1127.13.1.el7.x86_64/have_bioset_init.result] Error 127
make[3]: *** Waiting for unfinished jobs....
/bin/bash: line 0: echo: write error: Cannot allocate memory
make[3]: *** [/tmp/pkg/drbd-9.0.24-1/drbd/.compat_test.3.10.0-1127.13.1.el7.x86_64/have_bio_free.result] Error 1
make[2]: *** [_module_/tmp/pkg/drbd-9.0.24-1/drbd] Error 2
make[1]: *** [kbuild] Error 2
make[1]: Leaving directory `/tmp/pkg/drbd-9.0.24-1/drbd'
make: *** [module] Error 2
The problem goes away when using make V=1 instead of make -j V=1.
I don't know how I never noticed this, but:
initContainers:
- env:
  - name: LB_HOW
    value: compile
  image: daocloud.io/piraeus/drbd9-centos7:v9.0.24
  imagePullPolicy: IfNotPresent
  name: drbd-kernel-module-injector
  resources:
    limits:
      cpu: 128m
      memory: "268435456"
    requests:
      cpu: 64m
      memory: "268435456"
the memory limits are too low. With make -j we start multiple cc instances at the same time, and together they can easily allocate more than 256 MiB. The error message Argument list too long is probably a side effect of not being able to allocate the environment or argument memory region.
I tried a limit of 1GiB, and it seemed to fix the problem. Looking at the code, it seems I missed setting the resources for the init container in #57.
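For scale, the memory limit in the spec above is written in bytes. A quick conversion, sketched below with a hypothetical helper named bytes_to_mib (not part of any project discussed here), shows that it is 256 MiB and that the 1GiB limit that worked is four times as large.

```shell
#!/bin/sh
# Hypothetical helper for illustration: convert a byte count to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

bytes_to_mib 268435456     # limit from the init container spec: prints 256
bytes_to_mib 1073741824    # 1 GiB, the limit that fixed the build: prints 1024
```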
@rck I think make -j is a dangerous default. We shouldn't let make use unlimited parallelism.
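One way to bound parallelism is to cap the job count by both CPU count and available memory. The sketch below is only an illustration: pick_jobs is a hypothetical helper, and the per-job memory figure of 128 MiB is an assumed placeholder, not a measured value for the DRBD compat tests.

```shell
#!/bin/sh
# Hypothetical helper: choose a make job count bounded by CPUs and memory.
# Arguments: CPU count, memory budget in MiB, assumed MiB per compiler job.
pick_jobs() {
    cpus=$1
    mem_mib=$2
    per_job_mib=$3
    by_mem=$(( mem_mib / per_job_mib ))
    jobs=$cpus
    if [ "$by_mem" -lt "$jobs" ]; then
        jobs=$by_mem
    fi
    # Always allow at least one job so the build can still run serially.
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

pick_jobs 8 256 128     # memory-bound: prints 2
pick_jobs 4 1024 128    # CPU-bound: prints 4
pick_jobs 4 64 128      # below one job's budget: prints 1
```

The result could then be passed to make explicitly, for example as make -j "$(pick_jobs "$(nproc)" 1024 128)", instead of a bare -j.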
Quite frankly, I don't know why I gave it just a -j. Probably I thought it was the responsibility of the user, and "good luck". But I'm easily convinced that it is not a good default.
What about a default of make, and an LB_MAKEOPTS?
LB_MAKEOPTS sounds like a clean solution, so I am in favour :+1:
ACK. will prepare such a thing
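The proposal amounts to a serial make by default, with extra flags taken from an environment variable. A minimal sketch of that idea follows. It is not the actual entry.sh, and build_make_cmd is a hypothetical helper that only prints the command line it would run.

```shell
#!/bin/sh
# Sketch only, not the real entry.sh: assemble the make command line from
# LB_MAKEOPTS. With the variable unset or empty, make runs serially (no -j).
build_make_cmd() {
    # ${LB_MAKEOPTS:-} expands to an empty string when the variable is unset;
    # sed strips the trailing space left behind in that case.
    echo "make ${LB_MAKEOPTS:-}" | sed 's/ *$//'
}

unset LB_MAKEOPTS
build_make_cmd             # prints: make

LB_MAKEOPTS="-j 2 V=1"
build_make_cmd             # prints: make -j 2 V=1
```

In a pod spec, the variable would sit next to LB_HOW in the init container's env list, so the parallelism can be matched to the container's memory limit.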
@WanzenBug good catch. I did not notice that my k8s-based platform automatically sets a LimitRange on each namespace it creates, including default. Pods that specify no resources use the defaults from the LimitRange. I guess this is another case in which the chart should allow customizing pod resources, including those of initContainers.
kubectl get limitrange dce-default-limit-range -o yaml
apiVersion: v1
kind: LimitRange
metadata:
  creationTimestamp: "2020-06-12T03:55:10Z"
  name: dce-default-limit-range
  namespace: default
  resourceVersion: "1644"
  selfLink: /api/v1/namespaces/default/limitranges/dce-default-limit-range
  uid: e8bf74f6-0dac-4baf-8a4b-36adc41e24ac
spec:
  limits:
  - default:
      cpu: 128m
      memory: "268435456"
    defaultRequest:
      cpu: 64m
      memory: "268435456"
    maxLimitRequestRatio:
      cpu: "4"
      memory: "1"
    type: Container
After deleting the LimitRange, drbd-kernel-module-injector now works. @WanzenBug please also make the initContainer resources tunable in values.yaml.
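If the LimitRange shown above is kept rather than deleted, its maxLimitRequestRatio still constrains any explicit resources set on the init container: the limit divided by the request must not exceed the ratio. The sketch below checks that rule with a hypothetical helper, ratio_ok, using plain integers (millicores for CPU, bytes for memory).

```shell
#!/bin/sh
# Hypothetical helper: succeed when limit / request <= max_ratio.
# Written as limit <= request * max_ratio to stay in integer arithmetic.
ratio_ok() {
    limit=$1
    request=$2
    max_ratio=$3
    [ "$limit" -le $(( request * max_ratio )) ]
}

# CPU defaults from the LimitRange: 128m limit, 64m request, ratio 4.
ratio_ok 128 64 4 && echo "cpu ok"

# Raising only the memory limit to 1 GiB violates the memory ratio of 1.
ratio_ok 1073741824 268435456 1 || echo "memory: limit must equal request"
```

Under a memory ratio of 1, a 1GiB limit therefore needs a matching 1GiB request.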
Setting make flags via LB_MAKEOPTS is now public. As the Dockerfiles fetch this file, feel free to rebuild the current containers if necessary.
https://github.com/LINBIT/drbd/blob/drbd-9.0/docker/entry.sh#L125
Compile fails when using drbd9-centos7:v9.0.23 on CentOS 7.8