openebs / mayastor

Dynamically provision Stateful Persistent Replicated Cluster-wide Fabric Volumes & Filesystems for Kubernetes, provisioned from an optimized NVMe SPDK backend data storage stack.
Apache License 2.0

Mayastor doesn't work on LXC #1541

Open joselbr2099 opened 11 months ago

joselbr2099 commented 11 months ago

I'm trying to install Mayastor on LXC containers, but I get a lot of errors like this:

[node@master ~]$ kubectl logs mayastor-io-engine-67rsq -n mayastor
Defaulted container "io-engine" out of: io-engine, agent-core-grpc-probe (init), etcd-probe (init), initialize-pool (init)
[2023-11-09T04:50:01.219264229+00:00  INFO io_engine:io-engine.rs:179] Engine responsible for managing I/Os version 1.0.0, revision b0734db654d8 (v2.0.0)
[2023-11-09T04:50:01.219350100+00:00  INFO io_engine:io-engine.rs:158] free_pages 2MB: 1024 nr_pages 2MB: 1024
[2023-11-09T04:50:01.219355039+00:00  INFO io_engine:io-engine.rs:159] free_pages 1GB: 0 nr_pages 1GB: 0
[2023-11-09T04:50:01.219424639+00:00  INFO io_engine:io-engine.rs:211] kernel io_uring support: yes
[2023-11-09T04:50:01.219434277+00:00  INFO io_engine:io-engine.rs:215] kernel nvme initiator multipath support: yes
[2023-11-09T04:50:01.219463633+00:00  INFO io_engine::core::env:env.rs:791] loading mayastor config YAML file /var/local/io-engine/config.yaml
[2023-11-09T04:50:01.219473762+00:00  INFO io_engine::subsys::config:mod.rs:168] Config file /var/local/io-engine/config.yaml is empty, reverting to default config
[2023-11-09T04:50:01.219479462+00:00  INFO io_engine::subsys::config::opts:opts.rs:151] Overriding NVMF_TCP_MAX_QUEUE_DEPTH value to '32'
[2023-11-09T04:50:01.219484031+00:00  INFO io_engine::subsys::config::opts:opts.rs:151] Overriding NVME_QPAIR_CONNECT_ASYNC value to 'true'
[2023-11-09T04:50:01.219489100+00:00  INFO io_engine::subsys::config:mod.rs:216] Applying Mayastor configuration settings
TELEMETRY: No legacy callbacks, legacy socket not created
[2023-11-09T04:50:01.338061791+00:00  INFO io_engine::core::mempool:mempool.rs:50] Memory pool 'bdev_io_ctx' with 65535 elements (24 bytes size) successfully created
[2023-11-09T04:50:01.340062414+00:00  INFO io_engine::core::mempool:mempool.rs:50] Memory pool 'nvme_ctrl_io_ctx' with 65535 elements (72 bytes size) successfully created
[2023-11-09T04:50:01.340079736+00:00  INFO io_engine::core::env:env.rs:846] Total number of cores available: 2
[2023-11-09T04:50:01.348767002+00:00  INFO io_engine::core::reactor:reactor.rs:182] Scheduled SPDK thread 'init_thread' (0x5614b01db8d0) on core #1
[2023-11-09T04:50:01.348794433+00:00  INFO io_engine::core::reactor:reactor.rs:158] Init thread ID 1
[2023-11-09T04:50:01.348859014+00:00  INFO io_engine::core::reactor:reactor.rs:301] Starting reactor polling loop core=2 tid=8
[2023-11-09T04:50:01.349879408+00:00  INFO io_engine::core::env:env.rs:875] All cores locked and loaded!
[2023-11-09T04:50:01.441057353+00:00  INFO io_engine::bdev::nexus::nexus_module:nexus_module.rs:36] Initializing Nexus CAS Module
[2023-11-09T04:50:01.441319084+00:00  INFO io_engine::core::reactor:reactor.rs:182] Scheduled SPDK thread 'mayastor_nvmf_tcp_pg_core_1' (0x5614b01e1d00) on core #1
[2023-11-09T04:50:01.441542042+00:00  INFO io_engine::core::reactor:reactor.rs:182] Scheduled SPDK thread 'mayastor_nvmf_tcp_pg_core_2' (0x5614b01e20a0) on core #2
[2023-11-09T04:50:01.488127314+00:00  INFO io_engine::core::env:env.rs:749] Using 'MY_POD_IP' environment variable for IP address for NVMF target network interface
[2023-11-09T04:50:01.488217994+00:00  INFO io_engine::subsys::nvmf::target:target.rs:263] nvmf target listening on 10.164.151.84:(4421,8420)
[2023-11-09T04:50:01.488245856+00:00  INFO io_engine::subsys::nvmf::subsystem:subsystem.rs:542] Subsystem start in progress... self=NvmfSubsystem { id: 0, subtype: "Discovery", subnqn: "nqn.2014-08.org.nvmexpress.discovery", sn: "00000000000000000000", mn: "Mayastor NVMe controller", allow_any_host: 1, ana_reporting: 0, listeners: Some([Transport ID { trtype: 3, trstring: "TCP", traddr: "10.164.151.84", trsvcid: "8420" }]) }
[2023-11-09T04:50:01.488271695+00:00  INFO io_engine::subsys::nvmf::subsystem:subsystem.rs:593] Subsystem start completed: Ok self=NvmfSubsystem { id: 0, subtype: "Discovery", subnqn: "nqn.2014-08.org.nvmexpress.discovery", sn: "00000000000000000000", mn: "Mayastor NVMe controller", allow_any_host: 1, ana_reporting: 0, listeners: Some([Transport ID { trtype: 3, trstring: "TCP", traddr: "10.164.151.84", trsvcid: "8420" }]) }
[2023-11-09T04:50:01.488280582+00:00  INFO io_engine::subsys::nvmf::target:target.rs:359] nvmf target accepting new connections and is ready to roll..💃
[2023-11-09T04:50:01.550043302+00:00 ERROR mayastor::spdk:iscsi_subsystem.c:106] create PDU data out pool failed   
[2023-11-09T04:50:01.550070072+00:00 ERROR mayastor::spdk:iscsi_subsystem.c:1130] initialize_all_pools() failed   
[2023-11-09T04:50:01.550073949+00:00 ERROR mayastor::spdk:iscsi_subsystem.c:1159] iscsi_parse_globals() failed   
[2023-11-09T04:50:01.550078227+00:00 ERROR mayastor::spdk:subsystem.c:169] Init subsystem iscsi failed   
thread 'main' panicked at 'assertion failed: receiver.await.unwrap()', io-engine/src/core/env.rs:893:13
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:517:5
   1: core::panicking::panic_fmt
             at ./rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/panicking.rs:101:14
   2: core::panicking::panic
             at ./rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/panicking.rs:50:5
   3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
   4: <async_task::runnable::spawn_local::Checked<F> as core::future::future::Future>::poll
   5: async_task::raw::RawTask<F,T,S>::run
   6: io_engine::core::reactor::Reactor::poll_once
   7: io_engine::core::reactor::Reactor::block_on
   8: io_engine::core::env::MayastorEnvironment::init
   9: io_engine::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

All the problems are in the Mayastor io-engine pods:

[node@master ~]$ kubectl get po -o wide -n mayastor
NAME                                          READY   STATUS             RESTARTS         AGE     IP               NODE     NOMINATED NODE   READINESS GATES
mayastor-csi-node-2wlwv                       2/2     Running            2 (58m ago)      3h27m   10.164.151.21    master   <none>           <none>
mayastor-csi-node-whfqj                       2/2     Running            2 (58m ago)      3h27m   10.164.151.84    node5    <none>           <none>
mayastor-csi-node-jl52h                       2/2     Running            2 (58m ago)      3h27m   10.164.151.4     node6    <none>           <none>
mayastor-csi-node-lc8bc                       2/2     Running            2 (58m ago)      3h27m   10.164.151.108   node2    <none>           <none>
mayastor-csi-node-v6r46                       2/2     Running            2 (58m ago)      3h27m   10.164.151.174   node3    <none>           <none>
mayastor-csi-node-svcx4                       2/2     Running            2 (58m ago)      3h27m   10.164.151.9     node4    <none>           <none>
etcd-fbwmh625km                               1/1     Running            1 (58m ago)      3h26m   10.1.104.10      node2    <none>           <none>
etcd-sq96ckp4wh                               1/1     Running            1 (58m ago)      3h14m   10.1.219.79      master   <none>           <none>
mayastor-agent-core-666897f597-x67r9          1/1     Running            3 (58m ago)      3h27m   10.1.139.11      node6    <none>           <none>
etcd-9qvb8qxwh8                               1/1     Running            1 (58m ago)      3h27m   10.1.135.11      node3    <none>           <none>
mayastor-api-rest-5664ffdc75-k677g            1/1     Running            1 (58m ago)      3h27m   10.1.3.75        node4    <none>           <none>
mayastor-operator-diskpool-5d9fb986f8-8bhrw   1/1     Running            1 (58m ago)      3h27m   10.1.3.77        node4    <none>           <none>
mayastor-csi-controller-8dfcfb656-zt7gc       3/3     Running            3 (58m ago)      3h27m   10.164.151.84    node5    <none>           <none>
mayastor-io-engine-khqj8                      0/1     CrashLoopBackOff   38 (4m35s ago)   3h27m   10.164.151.174   node3    <none>           <none>
mayastor-io-engine-kg6vg                      0/1     CrashLoopBackOff   39 (4m19s ago)   3h27m   10.164.151.4     node6    <none>           <none>
mayastor-io-engine-67rsq                      0/1     CrashLoopBackOff   38 (4m16s ago)   3h27m   10.164.151.84    node5    <none>           <none>
mayastor-io-engine-r5fdn                      0/1     CrashLoopBackOff   38 (4m12s ago)   3h27m   10.164.151.9     node4    <none>           <none>
mayastor-io-engine-995xl                      0/1     CrashLoopBackOff   38 (4m10s ago)   3h27m   10.164.151.108   node2    <none>           <none>
mayastor-io-engine-xxc67                      0/1     CrashLoopBackOff   38 (4m6s ago)    3h27m   10.164.151.21    master   <none>           <none>
etcd-operator-mayastor-8574f998bc-4qchw       0/1     CrashLoopBackOff   35 (73s ago)     3h27m   10.1.33.141      node5    <none>           <none>

and in the etcd operator:

[node@master ~]$ kubectl logs etcd-operator-mayastor-8574f998bc-4qchw
Error from server (NotFound): pods "etcd-operator-mayastor-8574f998bc-4qchw" not found
[node@master ~]$ kubectl logs etcd-operator-mayastor-8574f998bc-4qchw -n mayastor
time="2023-11-09T05:00:22Z" level=info msg="etcd-operator Version: 0.10.0+git"
time="2023-11-09T05:00:22Z" level=info msg="Git SHA: 29fb1ab"
time="2023-11-09T05:00:22Z" level=info msg="Go Version: go1.17.10"
time="2023-11-09T05:00:22Z" level=info msg="Go OS/Arch: linux/amd64"
I1109 05:00:22.221031       1 leaderelection.go:242] attempting to acquire leader lease  mayastor/etcd-operator...
I1109 05:00:22.238394       1 leaderelection.go:252] successfully acquired lease mayastor/etcd-operator
time="2023-11-09T05:00:22Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"mayastor\", Name:\"etcd-operator\", UID:\"3ab809a4-b09c-4b98-ba71-d6ce0656baee\", APIVersion:\"v1\", ResourceVersion:\"80033\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' etcd-operator-mayastor-8574f998bc-4qchw became leader"
time="2023-11-09T05:00:22Z" level=info msg="start running..." cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:00:30Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:00:30Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:00:38Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:00:38Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:00:46Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:00:46Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:00:54Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:00:54Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:02Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:02Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:10Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:10Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:18Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:18Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:26Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:26Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:34Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:34Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:42Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:42Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:50Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:50Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:58Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:01:58Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:06Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:06Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:14Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:14Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:22Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:22Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:30Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:30Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:38Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:38Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:46Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:46Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:54Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:02:54Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:03:02Z" level=info msg="Start reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
time="2023-11-09T05:03:02Z" level=info msg="Finish reconciling" cluster-name=etcd cluster-namespace=mayastor pkg=cluster
I1109 05:03:14.165592       1 leaderelection.go:288] failed to renew lease mayastor/etcd-operator: failed to tryAcquireOrRenew context deadline exceeded
time="2023-11-09T05:03:14Z" level=fatal msg="leader election lost"

My environment:

I get the same error on Rocky Linux and Ubuntu Server. MicroK8s version: MicroK8s v1.27.7, revision 6103.

Sorry, this is not a documentation issue.

tiagolobocastro commented 11 months ago

@joselbr2099 could you please try v2.4.0 ?

joselbr2099 commented 11 months ago

In v2.4.0 I get the same error in the io-engine pods:

[node@master ~]$ kubectl get pods -n mayastor 
NAME                                            READY   STATUS             RESTARTS        AGE
mayastor-io-engine-rh5kg                        0/2     Init:0/2           0               18m
mayastor-agent-ha-node-692k6                    0/1     Init:0/1           0               18m
mayastor-csi-controller-5d9dd847f8-cmjbv        0/5     Init:0/1           0               18m
mayastor-agent-ha-node-wzpr7                    0/1     Init:0/1           0               18m
mayastor-io-engine-stxwp                        0/2     Init:0/2           0               18m
mayastor-agent-ha-node-fhr28                    0/1     Init:0/1           0               18m
mayastor-operator-diskpool-57cbdc854c-gpzcn     0/1     Init:0/2           0               18m
mayastor-io-engine-qxmxk                        0/2     Init:0/2           0               18m
mayastor-agent-ha-node-h7z55                    0/1     Init:0/1           0               18m
mayastor-io-engine-bsnsv                        0/2     Init:0/2           0               18m
mayastor-io-engine-2f9sd                        0/2     Init:0/2           0               18m
mayastor-agent-ha-node-p6hnm                    0/1     Init:0/1           0               18m
mayastor-api-rest-646c479b4b-w8ghv              0/1     Init:0/2           0               18m
mayastor-agent-ha-node-7j84z                    0/1     Init:0/1           0               18m
mayastor-io-engine-6crw4                        0/2     Init:0/2           0               18m
mayastor-obs-callhome-59b44bbff6-gh8sb          2/2     Running            0               18m
mayastor-csi-node-9rnwz                         2/2     Running            0               18m
mayastor-csi-node-sp7t7                         2/2     Running            0               18m
mayastor-csi-node-g9xns                         2/2     Running            0               18m
mayastor-csi-node-4stn6                         2/2     Running            0               18m
mayastor-csi-node-ktn4z                         2/2     Running            0               18m
mayastor-csi-node-nskgr                         2/2     Running            0               18m
mayastor-etcd-0                                 0/1     Running            2 (2m44s ago)   18m
mayastor-etcd-2                                 0/1     Running            2 (2m41s ago)   18m
mayastor-etcd-1                                 0/1     Running            2 (2m32s ago)   18m
mayastor-localpv-provisioner-85c8774849-q79qt   0/1     CrashLoopBackOff   6 (97s ago)     18m
mayastor-loki-0                                 1/1     Running            4 (2m18s ago)   18m
mayastor-agent-core-85499cf6db-fp947            0/2     CrashLoopBackOff   11 (76s ago)    18m
mayastor-promtail-htxdt                         1/1     Running            0               18m
mayastor-nats-0                                 2/3     Running            0               18m
mayastor-nats-1                                 2/3     Running            0               18m
mayastor-nats-2                                 3/3     Running            2 (7m4s ago)    18m
mayastor-promtail-xblgc                         1/1     Running            0               18m
mayastor-promtail-hnw6s                         0/1     Running            0               18m
mayastor-promtail-j2j4f                         0/1     Running            0               18m
mayastor-promtail-xw7wt                         0/1     Running            0               18m
mayastor-promtail-6qm2x                         0/1     Running            0               18m

This is my microk8s profile:

config:
  boot.autostart: 'true'
  linux.kernel_modules: >-
    ip_vs,ip_vs_rr,ip_vs_wrr,ip_vs_sh,ip_tables,ip6_tables,netlink_diag,nf_nat,overlay,br_netfilter
  raw.lxc: |
    lxc.apparmor.profile=unconfined
    lxc.mount.auto=proc:rw sys:rw cgroup:rw
    lxc.cgroup.devices.allow=a
    lxc.cap.drop=
  security.nesting: 'true'
  security.privileged: 'true'
description: ''
devices:
  aadisable:
    path: /sys/module/nf_conntrack/parameters/hashsize
    source: /sys/module/nf_conntrack/parameters/hashsize
    type: disk
  aadisable2:
    path: /dev/kmsg
    source: /dev/kmsg
    type: unix-char
  aadisable3:
    path: /sys/fs/bpf
    source: /sys/fs/bpf
    type: disk
  aadisable4:
    path: /proc/sys/net/netfilter/nf_conntrack_max
    source: /proc/sys/net/netfilter/nf_conntrack_max
    type: disk
name: microk8s
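
For reference, I apply a profile like this with the standard lxc commands; a sketch, where the profile file name, image alias and instance name are just placeholders:

# Create the profile and load the YAML above into it
lxc profile create microk8s
lxc profile edit microk8s < microk8s-profile.yaml
# Launch a cluster node with the default profile plus this one
lxc launch ubuntu:22.04 node5 --profile default --profile microk8s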

and my hugepages profile:

config:
  limits.hugepages.2MB: 1GB
  raw.lxc: >
    lxc.mount.entry = hugetlbfs dev/hugepages hugetlbfs rw,relatime,create=dir 0
    0
  security.privileged: 'true'
  security.syscalls.intercept.mount: 'true'
  security.syscalls.intercept.mount.allowed: hugetlbfs
description: ''
devices: {}
name: hugepages

My HugePages on each node:

[node@master ~]$ grep HugePages /proc/meminfo
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:    346112 kB
HugePages_Total:    1024
HugePages_Free:     1024
HugePages_Rsvd:        0
HugePages_Surp:        0
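
To cross-check from the Kubernetes side that the 2 MiB hugepages are actually visible to the kubelet (node5 is just one of my nodes as an example; the describe output should list hugepages-2Mi under Capacity and Allocatable):

[node@master ~]$ kubectl describe node node5 | grep -i hugepages-2Mi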

I think I need some iSCSI-related config. Please help me.

tiagolobocastro commented 11 months ago

We used to have terraform deploy Mayastor on lxd. It hasn't been used for a long time, so it might not work... anyway, maybe some of the config there might help you: https://github.com/openebs/mayastor-control-plane/blob/develop/terraform/cluster/mod/lxd/main.tf

tiagolobocastro commented 11 months ago

So, lxd can work: https://github.com/openebs/mayastor-control-plane/pull/691. But it won't be great, because the io-engine tries to bind to specific CPUs; we probably need to come up with a more flexible way of specifying which cores to bind to.

windowsrefund commented 2 months ago

I was able to overcome the CPU affinity issue with the io-engine pods by ensuring each of my containers did not include cores: in their config. Here's the evidence that all containers can use all cores on my system, which is running on Proxmox:

# pct cpusets
-------------------------------------------
100:  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
103:  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
104:  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
105:  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
-------------------------------------------

And as we can see, all io-engine pods are running:

> kubectl get pods -l app=io-engine
NAME                      READY   STATUS    RESTARTS      AGE
openebs-io-engine-ns8lm   1/1     Running   2 (12m ago)   103m
openebs-io-engine-pgvch   1/1     Running   2 (12m ago)   103m
openebs-io-engine-tv5v8   1/1     Running   2 (12m ago)   103m

However, it does appear to me that these pods are using 2 specific cores (of 16) when I look at things via htop. Is there something hard-coded here that trumps what is set via a Helm deployment?

USER-SUPPLIED VALUES:
engines:
  local:
    lvm:
      enabled: false
    replicated:
      mayastor:
        enabled: true
    zfs:
      enabled: false
mayastor:
  base:
    metrics:
      enabled: false
  io_engine:
    resources:
      limits:
        cpu: "4"
        hugepages-2Mi: 2Gi
        memory: 1Gi
      requests:
        cpu: "4"
        hugepages-2Mi: 2Gi
        memory: 1Gi
tiagolobocastro commented 2 months ago

By default I think they use cores 1 and 2, which in this case is bad because the io-engines end up sharing cores this way. I think a simple fix is to use cores from the allowed list rather than the entire list. IMHO that would be simple and probably good enough until we have the ability to specify a unique configuration per node in a nicer way.

windowsrefund commented 2 months ago

Yep. That's exactly what I see on my end... cores 1 and 2. Thanks for the confirmation. I'll continue to monitor this issue and will help to test any improvements should they be made available.

tiagolobocastro commented 2 months ago

On a running io-engine could you please run grep "Cpus_allowed" /proc/self/status ? And also cat /sys/fs/cgroup/cpuset.cpus.effective
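
If it's easier, both can be run from outside the container with kubectl exec (pod name taken from your earlier listing, and assuming the io-engine image ships basic coreutils; adjust the namespace to wherever the chart is installed):

kubectl exec -n openebs openebs-io-engine-ns8lm -c io-engine -- grep Cpus_allowed /proc/self/status
kubectl exec -n openebs openebs-io-engine-ns8lm -c io-engine -- cat /sys/fs/cgroup/cpuset.cpus.effective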

windowsrefund commented 2 months ago

On a running io-engine could you please run grep "Cpus_allowed" /proc/self/status ? And also cat /sys/fs/cgroup/cpuset.cpus.effective

Are you unable to recreate this?

tiagolobocastro commented 2 months ago

On a running io-engine could you please run grep "Cpus_allowed" /proc/self/status ? And also cat /sys/fs/cgroup/cpuset.cpus.effective

Are you unable to recreate this?

I'm not using LXC atm, so if you have it running already it would be quicker :)

windowsrefund commented 2 months ago

Excuse me for asking but how are you going to contribute to this issue in any way if you're not able to reproduce? That just doesn't make any sense.

windowsrefund commented 2 months ago

@tiagolobocastro Still wondering how in the world you expect to do anything productive regarding this issue if you're unable and/or unwilling to reproduce the problem. Maybe use your words instead of emojis to explain?

avishnu commented 2 months ago

@windowsrefund Since you have an available setup with lxc, it will help if you can execute the commands as suggested here.

windowsrefund commented 2 months ago

This is so incredibly stupid. I'm supposed to sit here and spoon feed you people with strings you should be able to fetch yourself simply by having the setup required to actually support this????? If you don't want to support LXC, just say so. Is this why this stupid bug has been open for nearly a year now? Because nobody is capable or willing to take 20m and set up a few lxc containers? Are you people kidding me?

How exactly would someone who doesn't have an LXC setup go about fixing the issue? I guess that person would just commit their "idea" and then expect ME to test it with absolutely no QA done on the development side? Are you serious?

This is a joke.

tiagolobocastro commented 2 months ago

You are being quite rude here, open source is about collaboration. I do hope you change your stance here.

Rest assured, when a fix is submitted it will be tested and verified. Meanwhile, collecting information and dumping it on a ticket is always helpful for whoever attempts to fix this. If you disagree that's fine, but please let's keep things civil.

Getting back to the issue at hand, my current train of thought is that we could have an operation mode where we specify only the number of cores, and then bind to the allowed cores by probing /proc/self/status. This does perhaps complicate things, e.g. if we don't know which cores we're using, how do we know which ones to isolate from system threads? If I have some time next week, I'll have a play with LXC and update with more details.

windowsrefund commented 2 months ago

You are being quite rude here, open source is about collaboration. I do hope you change your stance here.

Wrong. You're being lazy and showing complete incompetence. You're just upset and embarrassed that I called you out on it. By the way, "open source" as you (mistakenly) refer to it, has NOTHING to do with "collaboration". If you knew anything, you'd know what the value of "Free Software" is in terms of liberty and freedoms 0-3. It's those freedoms that allow for the collaboration you think you're preaching about. By the way, if this is the way you "collaborate"; demanding people spoon feed you with information you should be capable of getting yourself, and then throwing an emoji hissy fit when you don't get it......... yea, that's not my definition.

Rest assured when a fix is submitted it will be tested and verified. Meanwhile collecting information and dumping it on a ticket is always helpful for whomever attempts to fix this. If you disagree that's fine, but please let's keep things civil.

You still didn't answer the question I asked twice. How do you expect to "work" on an issue you clearly have no intention of QAing? That's just lazy and silly. If you can't spin up a LXC environment in order to make use of the (now year old) information we have already supplied, maybe go do something else? I'm not here to submit simple text strings just because you're unwilling to recreate the well-established known problem space needed to test and QA a potential solution.

It's just laziness pure and simple.

Getting back to the issue at hand, my current train of thought is that we could have an operation mode where we specify only the number of cores, and then bind to the allowed cores by probing /proc/self/status. This does perhaps complicate things, eg: if we don't know which cores we using, how do we know which ones to isolate them from system threads? If I have some time next week, I'll have a play with LXC and update with more details.

Right, that's what you should have done before commenting on this thread.

tiagolobocastro commented 1 month ago

I resurfaced my lxd configuration and was able to confirm my initial assessment.

When an lxd instance is configured without limiting the CPU cores, the processes running on the instance seem to get shared access to all of the cores:

Cpus_allowed:   0000ffff
Cpus_allowed_list:  0-15

If we use limits then it works as expected, example:

Cpus_allowed:   00001a40
Cpus_allowed_list:  6,9,11-12
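
For reference, that pinning comes from restricting the instance's cpuset via the standard limits key, along these lines (instance name is a placeholder):

# Restrict the LXD instance to an explicit set of host cores
lxc config set node5 limits.cpu 6,9,11-12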

We could perhaps auto-detect which cores are allowed and bind to those, but this still has the disadvantage that other processes may be sharing the CPUs with the io-engine. As a next step, I'll check whether I can still set affinity to CPUs outside of this allowed list if they're isolated on the host system.
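
A quick way to sketch that check from inside the instance (assuming util-linux's taskset is available; core 13 is just an example outside the allowed list above):

# Cores isolated from the host scheduler via isolcpus, if any
cat /sys/devices/system/cpu/isolated
# Try forcing a process onto a core outside the container's allowed list;
# the cpuset should reject it if the restriction is enforced
taskset -c 13 sleep 1 && echo "affinity to core 13 accepted"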

Also to consider: the CPU manager static policy. This would allow us to ensure that, from the k8s standpoint, the requested CPUs are used only by our io-engine pod, though it would not prevent system threads from using them.

EDIT: It also seems there's no way of setting cpu_exclusive for the lxd containers, which means that not only system threads but also other cpusets may run on those CPUs...