kmesh-net / kmesh

High Performance ServiceMesh Data Plane Based on Programmable Kernel
https://kmesh.net
Apache License 2.0

Memory scaling issue with large number of services #945

Open tmodak27 opened 1 week ago

tmodak27 commented 1 week ago

Motivation: Our business case needs memory consumption to scale as the number of services increases, especially in scenarios where the scale is very high.

1. What we did

Environment Details:

We started scaling up in batches of 500 services using the svc.yaml manifest and the command below.


- Scaling-up command:

$ for i in $(seq 1 500); do sed "s/foo-service/foo-service-0-$(date +%s-%N)/g" svc.yaml | kubectl apply -f -; done
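The contents of svc.yaml are not attached to this issue. A minimal sketch of what it might look like, assuming a plain ClusterIP Service named foo-service (the only name taken from the command above); the selector and ports are illustrative guesses:

```sh
# Hypothetical svc.yaml used by the loop above. Only the name foo-service is
# taken from the issue; the selector and ports below are placeholder values.
cat > svc.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: foo-service
spec:
  selector:
    app: foo
  ports:
    - port: 80
      targetPort: 8080
EOF
```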



After every 500 services, we measured the memory consumption using [Inspektor Gadget](https://www.inspektor-gadget.io/). The command used to measure the memory was `kubectl gadget top ebpf --sort comm`.

**2. What we observed**: 

The total memory usage of the kmesh bpf maps remained constant, even though the number of entries in the maps increased (see the table below).

<img width="500" alt="image" src="https://github.com/user-attachments/assets/ee7fade5-93b5-4657-8370-5ee408b70c9f">

The detailed table with the memory consumption figures is attached below; please refer to the MAPMEMORY column.
[memory_hce.txt](https://github.com/user-attachments/files/17345717/memory_hce.txt)

**3. Why we think this is a problem**

Our business case requires memory usage to scale as we deploy more services, instead of remaining fixed.

hzxuzhonghu commented 6 days ago

In order to reduce the memory cost, we need to tune the scale-up/scale-in parameters, cc @nlgwcy

nlgwcy commented 6 days ago

> In order to reduce the memory cost, we need to tune the scale-up/scale-in parameters, cc @nlgwcy

OK, maybe the scale-up/scale-in step is too big. I will optimize it later.

lec-bit commented 4 days ago

I tried to use Inspektor Gadget, but it does not account for the memory of the inner_map, so no memory change was visible (see the sketch after the table below).

K8S.NODE                PROGID     TYPE         NAME         PID          COMM              RUNTIME RUNCOU…   MAPMEMORY MAPCOUNT  
ambient-worker          409        CGroupSockA… cgroup_conn… 1061919      kmesh-daemon      8.356µs 4            308KiB 6         
ambient-worker          406        SockOps      sockops_prog 1061919      kmesh-daemon      6.804µs 36          8.18MiB 8         
ambient-control-plane   408        SockOps      sockops_prog 1061921      kmesh-daemon      3.115µs 27          8.18MiB 8         
ambient-control-plane   410        CGroupSockA… cgroup_conn… 1061921      kmesh-daemon      2.035µs 3            308KiB 6         
ambient-worker          395        RawTracepoi… connect_ret  1061919      kmesh-daemon      1.523µs 4                0B 0         
ambient-control-plane   396        RawTracepoi… connect_ret  1061921      kmesh-daemon         90ns 3                0B 0         
ambient-control-plane   399        SockOps      cluster_man… 1061921      kmesh-daemon           0s 0          133.1MiB 9         
ambient-control-plane   400        SockOps      filter_chai… 1061921      kmesh-daemon           0s 0          8.078MiB 7         
ambient-control-plane   403        SockOps      filter_mana… 1061921      kmesh-daemon           0s 0          8.074MiB 6         
ambient-control-plane   407        SockOps      route_confi… 1061921      kmesh-daemon           0s 0          11.95MiB 8         
ambient-control-plane   414        CGroupSockA… cluster_man… 1061921      kmesh-daemon           0s 0          133.1MiB 9         
ambient-control-plane   415        CGroupSockA… filter_chai… 1061921      kmesh-daemon           0s 0          8.078MiB 7         
ambient-control-plane   416        CGroupSockA… filter_mana… 1061921      kmesh-daemon           0s 0          8.074MiB 6         
ambient-worker          401        SockOps      cluster_man… 1061919      kmesh-daemon           0s 0          133.1MiB 9         
ambient-worker          402        SockOps      filter_chai… 1061919      kmesh-daemon           0s 0          8.078MiB 7         
ambient-worker          404        SockOps      filter_mana… 1061919      kmesh-daemon           0s 0          8.074MiB 6         
ambient-worker          405        SockOps      route_confi… 1061919      kmesh-daemon           0s 0          11.95MiB 8         
ambient-worker          411        CGroupSockA… cluster_man… 1061919      kmesh-daemon           0s 0          133.1MiB 9         
ambient-worker          412        CGroupSockA… filter_chai… 1061919      kmesh-daemon           0s 0          8.078MiB 7         
ambient-worker          413        CGroupSockA… filter_mana… 1061919      kmesh-daemon           0s 0          8.074MiB 6   
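One way to see what the MAPMEMORY column misses is to look at the unnamed maps directly with bpftool. A rough sketch, assuming the large group of null-named maps in the next comment's output is the dynamically created inner maps of the map-in-map:

```sh
# Count the BPF maps that have no name and sum their locked memory; these are
# presumably the inner maps, which the per-program MAPMEMORY view does not attribute.
sudo bpftool map -j | jq '[.[] | select(.name == null)] | length'
sudo bpftool map -j | jq '[.[] | select(.name == null) | (.bytes_memlock // 0) | tonumber] | add'
```
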
lec-bit commented 4 days ago

I used the bpftool command to inspect the bpf maps and their statistics, and found that the memory does change. I created 10000 services, and during the process kmesh scaled up:

sudo bpftool map -j | jq ' group_by(.name) | map({name: .[0].name, total_bytes_memlock: map(.bytes_memlock | tonumber) | add, maps: length}) | sort_by(.total_bytes_memlock)'

before:

[root@localhost kmesh]# sudo bpftool map -j | jq ' group_by(.name) | map({name: .[0].name, total_bytes_memlock: map(.bytes_memlock | tonumber) | add, maps: length}) | sort_by(.total_bytes_memlock)'
[
  {
    "name": "kmesh_events",
    "total_bytes_memlock": 0,
    "maps": 2
  },
  {
    "name": "map_of_sock_sto",
    "total_bytes_memlock": 0,
    "maps": 2
  },
  {
    "name": "bpf_log_level",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": "ig_fa_pick_ctx",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": "inner_map",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": "kmesh_version",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": "tmp_buf",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": "tmp_log_buf",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": ".rodata",
    "total_bytes_memlock": 49152,
    "maps": 6
  },
  {
    "name": "kmesh_listener",
    "total_bytes_memlock": 212992,
    "maps": 2
  },
  {
    "name": "kmesh_tail_call",
    "total_bytes_memlock": 278528,
    "maps": 8
  },
  {
    "name": "kmesh_manage",
    "total_bytes_memlock": 393216,
    "maps": 2
  },
  {
    "name": "ig_fa_records",
    "total_bytes_memlock": 1974272,
    "maps": 2
  },
  {
    "name": "containers",
    "total_bytes_memlock": 2113536,
    "maps": 2
  },
  {
    "name": "exec_args",
    "total_bytes_memlock": 3940352,
    "maps": 2
  },
  {
    "name": "map_of_router_c",
    "total_bytes_memlock": 8126464,
    "maps": 2
  },
  {
    "name": "kmesh_cluster",
    "total_bytes_memlock": 8388608,
    "maps": 2
  },
  {
    "name": "outer_map",
    "total_bytes_memlock": 16777216,
    "maps": 2
  },
  {
    "name": "map_of_cluster_",
    "total_bytes_memlock": 253771776,
    "maps": 4
  },
  {
    "name": null,
    "total_bytes_memlock": 268419072,
    "maps": 65532
  }
]

after:

[root@localhost kmesh]# sudo bpftool map -j | jq ' group_by(.name) | map({name: .[0].name, total_bytes_memlock: map(.bytes_memlock | tonumber) | add, maps: length}) | sort_by(.total_bytes_memlock)'
[
  {
    "name": "kmesh_events",
    "total_bytes_memlock": 0,
    "maps": 2
  },
  {
    "name": "map_of_sock_sto",
    "total_bytes_memlock": 0,
    "maps": 2
  },
  {
    "name": "bpf_log_level",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": "ig_fa_pick_ctx",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": "inner_map",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": "kmesh_version",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": "tmp_buf",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": "tmp_log_buf",
    "total_bytes_memlock": 8192,
    "maps": 2
  },
  {
    "name": ".rodata",
    "total_bytes_memlock": 49152,
    "maps": 6
  },
  {
    "name": "kmesh_listener",
    "total_bytes_memlock": 212992,
    "maps": 2
  },
  {
    "name": "kmesh_tail_call",
    "total_bytes_memlock": 278528,
    "maps": 8
  },
  {
    "name": "kmesh_manage",
    "total_bytes_memlock": 393216,
    "maps": 2
  },
  {
    "name": "ig_fa_records",
    "total_bytes_memlock": 1974272,
    "maps": 2
  },
  {
    "name": "containers",
    "total_bytes_memlock": 2113536,
    "maps": 2
  },
  {
    "name": "exec_args",
    "total_bytes_memlock": 3940352,
    "maps": 2
  },
  {
    "name": "map_of_router_c",
    "total_bytes_memlock": 8126464,
    "maps": 2
  },
  {
    "name": "kmesh_cluster",
    "total_bytes_memlock": 8388608,
    "maps": 2
  },
  {
    "name": "outer_map",
    "total_bytes_memlock": 16777216,
    "maps": 2
  },
  {
    "name": "map_of_cluster_",
    "total_bytes_memlock": 253771776,
    "maps": 4
  },
  {
    "name": null,
    "total_bytes_memlock": 536846336,
    "maps": 131066
  }
]
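To compare the two snapshots at a glance, the same bpftool JSON can be reduced to a single node-wide total. A small sketch in the same spirit as the command above:

```sh
# Sum bytes_memlock over every BPF map on the node; run once before and once
# after scaling to see the overall growth (dominated by the unnamed inner maps).
sudo bpftool map -j | jq '[.[] | (.bytes_memlock // 0) | tonumber] | add'
```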

Below are the startup logs for kmesh. I started 1000 services, deleted them, and then brought them up again.

[root@localhost kmesh]# kubectl logs -f -n kmesh-system kmesh-57rbp 
cp: cannot create regular file '/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64/kmesh.ko': Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.dep.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.dep.bin.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.alias.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.alias.bin.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.softdep.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.symbols.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.symbols.bin.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.builtin.bin.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.builtin.alias.bin.17.329771.1728891685, 301, 644): Read-only file system
depmod: ERROR: openat(/lib/modules/6.1.19-7.0.0.17.oe2303.x86_64, modules.devname.17.329771.1728891685, 301, 644): Read-only file system
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --bpf-fs-path=\"/sys/fs/bpf\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --cgroup2-path=\"/mnt/kmesh_cgroup2\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --cni-etc-path=\"/etc/cni/net.d\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --conflist-name=\"\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --enable-accesslog=\"false\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --enable-bpf-log=\"true\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --enable-bypass=\"false\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --enable-mda=\"false\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --enable-secret-manager=\"false\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --help=\"false\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --mode=\"ads\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="FLAG: --plugin-cni-chained=\"true\"" subsys=manager
time="2024-10-14T07:41:25Z" level=info msg="kmesh start with Normal" subsys=bpf
Remaining resources are insufficient(0/0), and capacity expansion is required.
collect_outter_map_scaleup_slots:32767-32768-32767
time="2024-10-14T07:41:34Z" level=info msg="bpf loader start successfully" subsys=manager
time="2024-10-14T07:41:34Z" level=info msg="start kmesh manage controller successfully" subsys=controller
time="2024-10-14T07:41:34Z" level=info msg="service node sidecar~10.244.1.4~kmesh-57rbp.kmesh-system~kmesh-system.svc.cluster.local connect to discovery address istiod.istio-system.svc:15012" subsys=controller/config
time="2024-10-14T07:41:34Z" level=info msg="controller start successfully" subsys=manager
time="2024-10-14T07:41:34Z" level=info msg="start write CNI config" subsys="cni installer"
time="2024-10-14T07:41:34Z" level=info msg="kmesh cni use chained\n" subsys="cni installer"
time="2024-10-14T07:41:35Z" level=info msg="Copied /usr/bin/kmesh-cni to /opt/cni/bin." subsys="cni installer"
time="2024-10-14T07:41:35Z" level=info msg="wrote kubeconfig file /etc/cni/net.d/kmesh-cni-kubeconfig" subsys="cni installer"
time="2024-10-14T07:41:35Z" level=info msg="cni config file: /etc/cni/net.d/10-kindnet.conflist" subsys="cni installer"
time="2024-10-14T07:41:35Z" level=info msg="start cni successfully" subsys=manager
time="2024-10-14T07:41:35Z" level=info msg="start watching file /var/run/secrets/kubernetes.io/serviceaccount/token" subsys="cni installer"
Remaining resources are insufficient(22956/32768), and capacity expansion is required.
collect_outter_map_scaleup_slots:65534-32768-65534
The remaining resources are sufficient(19659/65536) and scale-in is required.
collect_outter_map_scalein_slots:57343-8192-57343
The remaining resources are sufficient(17180/57344) and scale-in is required.
collect_outter_map_scalein_slots:49151-8192-49151
The remaining resources are sufficient(14737/49152) and scale-in is required.
collect_outter_map_scalein_slots:40959-8192-40959
The remaining resources are sufficient(12282/40960) and scale-in is required.
collect_outter_map_scalein_slots:24739-8192-24739
time="2024-10-14T08:13:33Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T08:30:40Z" level=info msg="wrote kubeconfig file /etc/cni/net.d/kmesh-cni-kubeconfig" subsys="cni installer"
time="2024-10-14T08:43:42Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T09:12:46Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T09:19:31Z" level=info msg="wrote kubeconfig file /etc/cni/net.d/kmesh-cni-kubeconfig" subsys="cni installer"
time="2024-10-14T09:45:27Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T10:08:24Z" level=info msg="wrote kubeconfig file /etc/cni/net.d/kmesh-cni-kubeconfig" subsys="cni installer"
time="2024-10-14T10:14:09Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T10:42:11Z" level=info msg="grpc reconnect succeed" subsys=controller
time="2024-10-14T10:57:16Z" level=info msg="wrote kubeconfig file /etc/cni/net.d/kmesh-cni-kubeconfig" subsys="cni installer"
time="2024-10-14T11:10:35Z" level=info msg="grpc reconnect succeed" subsys=controller
Remaining resources are insufficient(22954/32768), and capacity expansion is required.
collect_outter_map_scaleup_slots:65533-32768-65533
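For reference, the scale-up/scale-in events can be filtered out of the daemon logs. A small sketch based on the message strings in the excerpt above (the pod name will differ per deployment):

```sh
# Show only the outer-map capacity expansion / scale-in messages from the
# kmesh daemon logs; the grep patterns are taken from the log excerpt above.
kubectl logs -n kmesh-system kmesh-57rbp \
  | grep -E 'capacity expansion|scale-in is required|collect_outter_map_(scaleup|scalein)_slots'
```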