kubeovn / kube-ovn

A Bridge between SDN and Cloud Native (Project under CNCF)
https://kubeovn.github.io/docs/stable/en/
Apache License 2.0

linux-htb QoS priority values only range from 0 to 7 #1685

Closed: Cynerva closed this issue 2 years ago

Cynerva commented 2 years ago

Expected Behavior

I'm following the kube-ovn linux-htb QoS documentation. I've created a pod annotated with ovn.kubernetes.io/priority: "50":

kind: Pod
apiVersion: v1
metadata:
  name: perf
  annotations:
    ovn.kubernetes.io/priority: "50"
spec:
  containers:
  - name: nginx
    image: kubeovn/perf

I can see the priority="50" value in the OVS database via vsctl:

$ kubectl ko vsctl juju-0b8517-2 list qos
_uuid               : 1c8fabd5-6c42-42d9-a520-f737c7448853
external_ids        : {iface-id=perf.default, pod="default/perf"}
other_config        : {}
queues              : {0=ff2bafd6-b440-49df-9f8e-681bf0d9065e}
type                : linux-htb

_uuid               : 2865300b-8c67-4d62-9f78-50d96502f2b7
external_ids        : {}
other_config        : {}
queues              : {}
type                : linux-noop

$ kubectl ko vsctl juju-0b8517-2 list queue
_uuid               : ff2bafd6-b440-49df-9f8e-681bf0d9065e
dscp                : []
external_ids        : {iface-id=perf.default, pod="default/perf"}
other_config        : {priority="50"}
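To compare this value against what tc reports later, the queue priority can be pulled out of the `list queue` text. The sketch below is a hypothetical helper (`queue_priority` is not part of kube-ovn or Open vSwitch); it assumes the `other_config : {priority="50"}` line format shown above and simply reads that output on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the priority stored in each queue's other_config.
# Usage: kubectl ko vsctl <node> list queue | queue_priority
queue_priority() {
  # Match lines like: other_config        : {priority="50"}
  # The quotes around the number are optional in the pattern; the digits are captured.
  sed -n 's/^other_config[[:space:]]*:.*priority="\{0,1\}\([0-9]*\)"\{0,1\}.*/\1/p'
}
```

Queues without a priority key (for example `other_config : {}`) produce no output.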

But I want to go deeper and see how HTB itself is configured. I should be able to use the tc command to find that priority=50 value somewhere in the HTB qdisc configuration, right?

Actual Behavior

If I ssh to the node that the pod landed on, I can see the pod's interface:

$ ip link | grep htb
19: 1ec6889fb027_h@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc htb master ovs-system state UP mode DEFAULT group default qlen 1000

But if I look at the HTB classes on that interface:

$ tc class show dev 1ec6889fb027_h
class htb 1:1 parent 1:fffe prio 7 rate 11200bit ceil 10Gbit burst 1463b cburst 1250b
class htb 1:fffe root rate 10Gbit ceil 10Gbit burst 1250b cburst 1250b

The prio value is set to 7, not 50. Why is that?

If I change the pod's priority annotation to 3, the annotation and the class prio value match:

$ kubectl patch po perf -p '{"metadata": {"annotations": {"ovn.kubernetes.io/priority": "3"}}}'
pod/perf patched

$ tc class show dev 1ec6889fb027_h
class htb 1:1 parent 1:fffe prio 3 rate 11200bit ceil 10Gbit burst 1463b cburst 1250b
class htb 1:fffe root rate 10Gbit ceil 10Gbit burst 1250b cburst 1250b

But whenever I specify a priority value above 7, the class prio stays at 7. For example, here I set it to 8:

$ kubectl patch po perf -p '{"metadata": {"annotations": {"ovn.kubernetes.io/priority": "8"}}}'
pod/perf patched

$ tc class show dev 1ec6889fb027_h
class htb 1:1 parent 1:fffe prio 7 rate 11200bit ceil 10Gbit burst 1463b cburst 1250b
class htb 1:fffe root rate 10Gbit ceil 10Gbit burst 1250b cburst 1250b

Steps to Reproduce the Problem

  1. Create a pod with the ovn.kubernetes.io/priority: "50" annotation
  2. SSH to the node the pod was scheduled on and use ip link to find the pod's network interface
  3. Run tc class show dev <link> and observe that the HTB prio value (7) doesn't match the annotation (50)
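Step 3 can be scripted so the class prio is read out directly instead of eyeballed. This is a minimal sketch: `htb_class_prio` is a hypothetical helper, not part of kube-ovn or tc, and it assumes the one-line-per-class text format of `tc class show` seen earlier in this report.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the HTB prio of one class from `tc class show` output.
# Usage: tc class show dev <link> | htb_class_prio 1:1
htb_class_prio() {
  local classid="$1"
  # Each line looks like: "class htb 1:1 parent 1:fffe prio 7 rate ..."
  # Field 3 is the class ID; the token after the "prio" keyword is the priority.
  awk -v id="$classid" '
    $1 == "class" && $2 == "htb" && $3 == id {
      for (i = 4; i < NF; i++) {
        if ($i == "prio") { print $(i + 1); exit }
      }
    }'
}
```

For the output above, `tc class show dev 1ec6889fb027_h | htb_class_prio 1:1` would print 7, while the root class 1:fffe (which has no prio field) prints nothing.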

Additional Info

Cynerva commented 2 years ago

This limitation seems to come from Linux itself. I see the same behavior when I create an HTB class with tc directly; prio values never go above 7:

$ tc class add dev 1ec6889fb027_h parent 1:fffe classid 1:2 htb rate 1Kbit prio 8
$ tc class show dev 1ec6889fb027_h classid 1:2
class htb 1:2 parent 1:fffe prio 7 rate 1Kbit ceil 1Kbit burst 1600b cburst 1600b

And I think this is where the number of HTB priorities (TC_HTB_NUMPRIO, defined as 8) is set in the Linux source: https://github.com/torvalds/linux/blob/72a8e05d4f66b5af7854df4490e3135168694b6b/include/uapi/linux/pkt_sched.h#L405
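The results of the tc experiments (3 stays 3, while 8 and 50 both become 7) are consistent with the kernel clamping any prio at or above TC_HTB_NUMPRIO to the last valid level instead of rejecting it. The sketch below only emulates that rule in shell for illustration; `htb_effective_prio` is a hypothetical name, the constant 8 is assumed to match TC_HTB_NUMPRIO in pkt_sched.h, and the actual clamping is done in the kernel's HTB scheduler, not by tc or kube-ovn.

```bash
#!/usr/bin/env bash
# Number of HTB priority levels; assumed to match TC_HTB_NUMPRIO (8) in pkt_sched.h.
TC_HTB_NUMPRIO=8

# Hypothetical helper: emulate which prio ends up on the HTB class.
# Valid levels are 0..TC_HTB_NUMPRIO-1 (lower number = served first);
# anything at or above the limit is clamped to the lowest-priority level, 7.
htb_effective_prio() {
  local requested="$1"
  if (( requested >= TC_HTB_NUMPRIO )); then
    echo $(( TC_HTB_NUMPRIO - 1 ))
  else
    echo "$requested"
  fi
}

# Replay the values tried in this issue.
for p in 3 8 50; do
  echo "requested=$p effective=$(htb_effective_prio "$p")"
done
```

This reproduces what was observed: requesting 3 yields 3, while 8 and 50 both yield 7.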

So I don't expect values above 7 to work. Rather, I think the kube-ovn documentation and defaults should be updated to keep HTB priority values in the range 0 to 7.
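One way to enforce that range is to reject out-of-range annotation values up front rather than letting the kernel silently clamp them. The sketch below is purely illustrative: `validate_htb_priority` is a hypothetical function (kube-ovn is not implemented in shell and has no such helper), and it assumes a valid value is a single digit from 0 to 7.

```bash
#!/usr/bin/env bash
# Hypothetical check for an ovn.kubernetes.io/priority annotation value.
# HTB has 8 priority levels (0..7), so a valid value is exactly one digit 0-7.
# Returns 0 if valid; otherwise prints a reason to stderr and returns 1.
validate_htb_priority() {
  local value="$1"
  if [[ "$value" =~ ^[0-7]$ ]]; then
    return 0
  fi
  echo "invalid priority '$value': HTB supports only 0 to 7" >&2
  return 1
}

for v in 0 3 7 8 50 abc; do
  if validate_htb_priority "$v" 2>/dev/null; then
    echo "$v: ok"
  else
    echo "$v: rejected"
  fi
done
```

Rejecting instead of clamping makes a misconfigured annotation visible, whereas clamping (what the kernel does) quietly maps 8 and 50 to 7.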

hongzhen-ma commented 2 years ago

@Cynerva Thanks for the report. We will update our documentation and the related code later.