kohler / click

The Click modular router: fast modular packet processing and analysis

Assign element to cpu thread #467

Open p4pe opened 3 years ago

p4pe commented 3 years ago

Hello, I am a newbie to Click and I am using Click at user level. I have two questions. Is it possible to:

1) Assign an element in a configuration file to a CPU thread?
2) Pin this thread to a core?

tbarbette commented 3 years ago

1) StaticThreadSched(elementname 1); to pin elementname to thread 1. With --dpdk it's the default. In standard userlevel you have the "-a" flag.

p4pe commented 3 years ago

So StaticThreadSched(FromDevice(eth1) 1)?

And when I run the Click configuration file, should I run it with -a?

tbarbette commented 3 years ago

Check https://github.com/kohler/click/wiki/Language for the language ;)

Well, if you use DPDK elements (from the other discussion I guess you want to), you need to launch with --dpdk.

For -a, it depends on what you want. Nowadays people advocate for run-to-completion, so you should use -a.

This will give you the click basics, such as naming too : https://github.com/tbarbette/fastclick/wiki/Tutorial

p4pe commented 3 years ago

Thanks @tbarbette, I have not managed to integrate Click with DPDK yet. I am using just user-level Click.

I have a scenario where I run a source (FastUDPSource -> ToDevice), a VNF (FromDevice -> Queue -> ToDevice) and a sink (FromDevice -> Counter -> Discard), each running on a separate node. First I want to measure the throughput: I change the RATE of packets sent per second, and I count the rate that reaches the sink.

ahenning commented 3 years ago

George, I have a vague idea of the issue you are trying to solve with thread pinning and suspect the solution might not work as intended, but I think in theory what you are asking could be achieved with:

click -a -j3 sink-sender.click Schedule elements with: StaticThreadSched(elementname1 0, elementname2 0, ... 0);

click -a -j3 vnf-bridge.click Schedule elements with: StaticThreadSched(elementname1 1, elementname2 1, ... 1);

click -a -j3 sink-receiver.click Schedule elements with: StaticThreadSched(elementname1 2, elementname2 2, ... 2);

This assumes each node has access to 3 threads. It also assumes CPU thread 0 on node1 and CPU thread 0 on node 2 etc will be scheduled on the same host CPU thread, which may or may not be the case. Perhaps limiting each node to 1 thread and pinning each node to a specific core on the host would give you more control?

p4pe commented 3 years ago

Thank you @ahenning. To be more precise, I want to test 2 different scenarios:

1) The FromDevice element and the ToDevice element in the vnf-bridge are on the same core.
2) They are on different cores.

So if I understood what you wrote, I must write a Click configuration file with something like StaticThreadSched(FromDevice 1, ToDevice 1)? And how do I link them with the Queue element?

And after this I have to execute click -a -j3 vnf-bridge.click

I am sorry if I am wrong, but I have only just started working with Click.

tbarbette commented 3 years ago

@p4pe StaticThreadSched takes element names, not a declaration. Please take the time to read the links I gave you :) Looking at the examples in the "conf" folder will help too.

You can give a value to -a to pin at a certain offset. So you can simply use "-a 2 -j 1" and every element of that Click instance will run on core 2. If you want to use multiple cores in a single instance, you can use "-a 2 -j 2" and use StaticThreadSched(elementnameA 0, elementnameB 1); to pin elementnameA and elementnameB to cores 2 and 3.
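A minimal sketch of the multi-core case just described, assuming the interface names eth1 and eth2 and a hypothetical file name forwarder.click. Whether -a accepts an offset value depends on the build; later comments in this thread report that click -a 2 -j 2 fails with "router configuration specified twice", so treat the launch line as an assumption.

```
// forwarder.click (hypothetical file name and interfaces)
in  :: FromDevice(eth1);
out :: ToDevice(eth2);

in -> Queue -> out;

// StaticThreadSched takes element names and thread indexes of this
// Click instance (0 .. j-1), not physical core IDs.
StaticThreadSched(in 0, out 1);

// Launch (assumption: -a accepts an offset in your build):
//   click -a 2 -j 2 forwarder.click
// Thread 0 would then be pinned to core 2 and thread 1 to core 3.
```

Naming the elements first (in, out) is what lets StaticThreadSched refer to them; passing a declaration such as FromDevice(eth1) inside StaticThreadSched does not work.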

p4pe commented 3 years ago

OK, now I think I got it. Thank you @tbarbette.

I built Click with ./configure --enable-userlevel --disable-linuxmodule --enable-user-multithread --enable-multithread but when I run click -j config

I get this warning: warning: Click was built without multithread support, running single threaded

ahenning commented 3 years ago

For the second scenario the config would look something like:

FromDevice -> Queue -> unq1::Unqueue -> ToDevice; StaticThreadSched(unq1 1);

FromDevice should run on thread 0, and packets pushed to and processed by ToDevice should run on thread 1. If the configuration and elements are more complicated, Click has a home thread function that you could add to your elements to verify if needed.

Not all elements are thread safe, so one way would be to place the elements you want to run on specific core between two queues e.g.

FromDevice -> Queue -> unq0::Unqueue -> SlowPathElement -> Queue -> unq1::Unqueue -> ToDevice; StaticThreadSched(unq0 1, unq1 0)

This is assuming the whole config is more complicated and the idea is to only run the resource heavy elements on a dedicated core and the rest on thread 0. This info might not be relevant to your use case but I am just adding that here for posterity's sake.

Also, if the element timers need to run on say the same thread 1, then the actual element also needs to be scheduled via StaticThreadSched and not just the pull to push converter like Unqueue.

p4pe commented 3 years ago

I appreciate your help @ahenning. I am playing with this now, but I realized that although I ran ./configure --enable-userlevel --disable-linuxmodule --enable-user-multithread --enable-multithread

Click was not built with multithreading enabled, and I am trying to find out why.

tbarbette commented 3 years ago

Did you make clean then make again? Weird.

p4pe commented 3 years ago

I did a fresh installation on a new machine, and now it works.

p4pe commented 3 years ago

Hello @tbarbette, I am trying to pin elements to threads but I get this error: router configuration specified twice

My Click configuration file is:
FromDevice(enp4s0f1) -> Queue -> ToDevice(enp4s0f0); StaticThreadSched(FromDevice 1, ToDevice 0);

And I am running Click with: click -a 2 -j 2 forwarder.click

gkatsikas commented 3 years ago

Hi,

First, you need to create an instance of From/ToDevice as follows:

in:: FromDevice(enp4s0f1); out :: ToDevice(enp4s0f0);

Then describe your pipeline:

in -> Queue -> out;

Finally, pin each instance to the correct thread:

StaticThreadSched(in 1, out 0);

The way you did it, Click creates a different instance for every From/ToDevice call. This is why the output states that router configuration is specified twice.

p4pe commented 3 years ago

Thank you @gkatsikas, but I have the same issue with this click configuration file:

in::FromDevice(enp4s0f1); out::ToDevice(enp4s0f0);

in -> Queue -> out;

StaticThreadSched(in 1, out 0);

IoakeimFotoglou commented 3 years ago

I'm following the issue because I also have to study a very similar case. My setup is as follows: I have three different nodes in the order below, each running a different click configuration Node1 (source) --> Node2 --> Node3 (sink)

Node1: Source.click

FastUDPSource(800000, 10000000, 60, 3c:fd:fe:04:64:42, 192.168.6.2, 1234, 14:18:77:26:68:15, 192.168.6.5, 1234) -> ToDevice(eth1);

Node2: in::FromDevice(eth1); out::ToDevice(eth2); in -> Queue -> out; StaticThreadSched(in 1, out 0);

Node3: FromDevice(eth1) -> c :: Counter -> Discard; DriverManager(wait 45s, save c.rate -, stop);

In node2 we are running click with the command click -j 2 forwarder.click and the approximate rate at node3 is around 360619,21

Whereas running it with click -a -j 2 the rate is 467960,45

What is the actual difference between these two commands? I mean how does triggering the affinity switch work and why does it change the rate?

Also, trying to use both the affinity and thread switches like this: click -a 2 -j 2 forwarder.click returns the same error, "router configuration specified twice", whereas if the affinity switch is given but left empty, i.e. click -a -j 2 forwarder.click, it runs normally.

How would I go about pinning the threads in two cores a) of a cpu in one socket b) different sockets

Kindly thank you @tbarbette @ahenning and @gkatsikas for your input

tbarbette commented 3 years ago

I'd advise to keep "-a" empty, and play with the affinity inside Click. If you want to offset by two, just pin elements to thread 2 and 3. For the performance : without -a you leave the OS switching threads around and it's actually not very good at that.

With the forwarder using two cores, I'd expect the sink or source to become the bottleneck. But I'd advise using DPDK as soon as performance matters.

tbarbette commented 3 years ago

Random advices:

p4pe commented 3 years ago

@IoakeimFotoglou we have the same issues, I see. @tbarbette I will follow your advice too.

Every time I try to play with the affinity inside Click I get this warning: forwarder.click:6: While configuring ‘StaticThreadSched@4 :: StaticThreadSched’: warning: thread preference 2 out of range

I'm using the configuration that @gkatsikas proposed.

Thank you in advance

p4pe commented 3 years ago

With configuration

in::FromDevice(enp6s0f1); out::ToDevice(enp6s0f0);

in -> Queue -> out;

StaticThreadSched(in 0, out 1);

and click -a -j 2 forwarder.click it works fine.

With configuration

in::FromDevice(enp6s0f1); out::ToDevice(enp6s0f0);

in -> Queue -> out;

StaticThreadSched(in 2, out 3);

and click -a -j 2 forwarder.click

I have the issue I mentioned above.

gkatsikas commented 3 years ago

Well, in the second configuration you explicitly ask for threads 2 and 3 in StaticThreadSched, but you call Click with only 2 threads (i.e., -j 2, which implies that only threads 0 and 1 will be allocated). If you bump -j to 4 instead of 2 it should work.

p4pe commented 3 years ago

Obviously I did not understand something correctly.

What I had understood so far is: if I have StaticThreadSched(in 0, out 0), it means that I have two threads running on one core (core 0).

If I have StaticThreadSched(in 0, out 1), this means that I ask for two threads (0, 1), and with -a -j 2 in the call this configuration runs on cores 0 and 1.

The "conflict" comes when I try to run Click on cores other than 0 and 1. I configured StaticThreadSched(in 2, out 3) and I thought that this means that Click will run on cores 2 and 3.

Thank you kindly for your input and advice @gkatsikas

gkatsikas commented 3 years ago

No, the thread index in StaticThreadSched does not necessarily correspond to a physical CPU core ID, it is simply a thread count. To have full control on how to pin those threads to a physical core, I suggest that you use the DPDK-based FastClick instead of Click.

p4pe commented 3 years ago

OK, thanks for the explanation. I want to use Click first.

So what do you suggest for better management of core pinning? Maybe the "taskset" command?

If I want 2 threads on one core, I would have StaticThreadSched(in 0, out 0), run click -j 2, and then taskset -cp 2 PID or something like this.

tbarbette commented 3 years ago

Your first two points were correct. I think what you missed is that a Click thread can run multiple elements. It's like user-level threads. So with

StaticThreadSched(in 0, out 0)

You pin the two elements to thread 0. As you pass -a, thread 0 means core 0. Core 1 is there but does nothing. Similarly you can pass -j 4 and assign threads 2 and 3 to the in and out elements; threads 0 and 1 will do nothing. It is not correct to assign elements to threads 2 and 3 if you launched Click with 2 threads, as those indexes are out of bounds. That is the warning you get.

Taskset will not work because if two elements are on the same thread there is nothing you can do about it.

For completeness, -a takes a parameter that offsets the assignment of threads to cores. With -a 2, thread 0 will be pinned to core 2 and thread 1 to core 3. So in that case you would pin in and out to threads 0 and 1, which will be running on cores 2 and 3.

What DPDK gives is the ability to further define a list of cores, so if you pass 3,7,10, thread 0 would be pinned to core 3, thread 1 to core 7 and thread 2 to core 10.

My suggestion would be to run click with -a -j 16 if you have 16 cores and never think about this anymore. You pin elements to thread indexes that are exactly core IDs. If a core has nothing assigned to it then it won't run anything, and you don't really care.
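To make the index rule concrete, here is a small sketch using the interface names from the configuration discussed above. It assumes a machine with at least 4 cores and a multithreaded user-level build; the launch line is shown as a comment and is not verified on any particular setup.

```
in  :: FromDevice(enp6s0f1);
out :: ToDevice(enp6s0f0);

in -> Queue -> out;

// With -j 4 the valid thread indexes are 0, 1, 2 and 3.
// Threads 0 and 1 get no elements and simply stay idle.
StaticThreadSched(in 2, out 3);

// Launch: click -a -j 4 forwarder.click
// With a bare -a, thread N is expected to run on core N,
// so `in` would run on core 2 and `out` on core 3.
```

Running the same file with -j 2 is what produces the "thread preference 2 out of range" warning, because indexes 2 and 3 do not exist in that instance.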

p4pe commented 3 years ago

OK, now I think that I get it.

Say I want to run FromDevice and ToDevice in two different threads, and assign these threads to different cores that are on different sockets.

If we assume that core 2 and core 4 are on different sockets, the configuration will be StaticThreadSched(in 2, out 4)

And with click -a -j 16 I will have what I want.

Thank you @tbarbette

gkatsikas commented 3 years ago

Your configuration should work even with -j 5. Note that pinning a FromDevice to socket 0 and a ToDevice to socket 1 implies inter-socket communication, which is costly in terms of performance. In your case it does not matter though, as you use vanilla Click, which can hardly stress QPI.

p4pe commented 3 years ago

I know @gkatsikas, this performance degradation is what I want to observe!

I have 4 different scenarios:

1) FromDevice and ToDevice running on the same core without hyperthreading
2) On the same core with hyperthreading
3) Different cores on the same socket
4) Different cores on different sockets

If I am right, scenario (4) will have the worst performance (more or less) due to the inter-socket communication.

gkatsikas commented 3 years ago

Yes, this is likely the case, although QPI effects may be obscured by some artefacts of your setup, such as memory copies from/to user space. You may also try kernel-based Click or fast user-space Click (with DPDK) to eliminate this overhead.

p4pe commented 3 years ago

Unfortunately I did not manage to install kernel-based Click (I think it is not compatible with newer Linux headers). The next step is to try Click with DPDK, but first I have to take a look at DPDK because I am a newbie.

Last question, just for confirmation. For my first scenario I just have

in::FromDevice(enp6s0f1); out::ToDevice(enp6s0f0); in -> Queue -> out;

without StaticThreadSched

and just run click forwarder.click

gkatsikas commented 3 years ago

Yes (provided that you have disabled HT)

tbarbette commented 3 years ago

I'd say to always pin them, even for case 1.

Also you have to consider that without DPDK you are actually not pinning most of the RX work. Packets will be received by the kernel on probably all cores (the default for the NIC is to use as many queues as cores) through the interrupt handlers, no matter how the application is pinned. They will go through the kernel stack on all those cores, which is heavy work, before the app reads the packets from a single given core. Therefore if you really want to test QPI with the kernel sockets, you'll need to consider the number of queues (ethtool -L) and IRQ affinity.

Just a thought: similarly, as your device is attached to a specific CPU, the packets will actually never be moved to the second core in the setup you present, just the Click metadata. You may want to "touch" the bytes on the second CPU. "CheckIPHeader -> SetTCPChecksum" (or SetUDPChecksum) should do the trick.
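One way this suggestion might look in a configuration, as a sketch only: it assumes UDP traffic (as generated by FastUDPSource earlier in the thread), Ethernet framing with a 14-byte header, and the interface names used above. The placement after the Queue is a design choice of this sketch, not something stated in the thread.

```
in  :: FromDevice(enp6s0f1);
out :: ToDevice(enp6s0f0);

// Elements on the pull path after the Queue are driven by `out`,
// so they execute in the thread that `out` is scheduled on.
in -> Queue
   -> CheckIPHeader(14)   // 14 = Ethernet header length; sets the IP header annotation
   -> SetUDPChecksum      // recomputes the checksum, reading the UDP payload
   -> out;

StaticThreadSched(in 0, out 1);
```

Because the checksum is computed over the packet data, the thread running `out` has to read the payload bytes rather than only the packet metadata, which is the effect the comment above is after.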

p4pe commented 3 years ago

Thank you all for your help guys.

I believe that I managed to install FastClick, and now I will try to run the same "experiment" and see the difference.

If I understood well, the only changes I have to make are to replace ToDevice and FromDevice with ToDPDKDevice and FromDPDKDevice, using the interfaces that are bound to DPDK, and then run Click with click --dpdk.

tbarbette commented 3 years ago

Mostly yes. You don't need a Queue also ;)
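A rough sketch of what the DPDK version of the forwarder might look like, with no Queue in between. The port indexes 0 and 1 and the file name are assumptions, and the EAL arguments in the launch comment depend on the machine and the build, so check them against your own installation.

```
// forwarder-dpdk.click (hypothetical file name)
// 0 and 1 are DPDK port indexes, assumed here to be the two
// interfaces that were bound to DPDK.
FromDPDKDevice(0) -> ToDPDKDevice(1);

// Launch (arguments before "--" are passed to DPDK and are an assumption):
//   click --dpdk -l 0-1 -- forwarder-dpdk.click
```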

Memtwo commented 3 years ago

Hi. I am studying a case similar to yours. In FastUDPSource, is SRCETH node1's Ethernet address, SRCIP node1's IP address, DSTETH node2's Ethernet address, and DSTIP node2's IP address?

p4pe commented 3 years ago

Hello, in my case SRCETH and SRCIP are the MAC and the IP of node1, but DSTETH and DSTIP are those of node3 (the sink).

My topology is source--->VNF--->sink

Memtwo commented 3 years ago

Thanks! But what script runs on your node2 (VNF)? A simple in -> Queue -> out? How is the packet sent from node1 to node2, and then from node2 to node3? Sorry, I am a newbie to Click and to these network problems.

p4pe commented 3 years ago

Yes, just FromDevice -> Queue -> ToDevice.

You have to enable promiscuous mode on the in and out interfaces so that all the traffic can pass.

Memtwo commented 3 years ago

But it seems my packets are directly sent to node3 by node1 and ignore node2.

p4pe commented 3 years ago

If you run tcpdump on the ingress port of node2, what do you see?

Memtwo commented 3 years ago

Oh, with tcpdump I can see the packet from node1 to node3, so I think it works. Thank you very much!

p4pe commented 3 years ago

You are welcome! You can also use the IPPrint element, for checking.

Memtwo commented 3 years ago

Sorry to trouble you after such a long time. I am still a newbie to Click and I suspect my setup doesn't work well. Here is my setup: Node1 (source) --> Node2 (forward) --> Node3 (sink). I want my packets to be transmitted hop by hop.

The three nodes' information is as follows:
Node1 ens33: 192.168.32.128 00:0c:29:92:68:92 ; ens38: 192.168.32.129 00:0c:29:92:68:9c
Node2 ens33: 192.168.32.130 00:0c:29:57:e6:e1 ; ens38: 192.168.32.131 00:0c:29:57:e6:eb
Node3 ens33: 192.168.32.132 00:0c:29:db:6e:56 ; ens38: 192.168.32.133 00:0c:29:db:6e:60

Node1: Source.click

FastUDPSource(800000, -1, 60, 00:0c:29:92:68:92, 192.168.32.128, 1234, 00:0c:29:db:6e:56, 192.168.32.132, 1234) ->IPPrint("Hello") ->ToDevice(ens33);

Node2: Forward.click

FromDevice(ens38, PROMISC true) -> Queue -> IPPrint -> ToDevice(ens33);

Node3: Sink.click

FromDevice(ens33, PROMISC true) -> c:: Counter -> Discard; Script(wait 10, print c.rate, loop);

I can see the result on node3. I use tcpdump on node2 and see the packet 192.168.32.128.1234 > 192.168.32.132.1234, but with the IPPrint element I see nothing. How can I make sure that the packets are sent from node1 to node2, and then from node2 to node3? Sorry again for bothering you.

p4pe commented 3 years ago

Hello, I think you have to rewrite the MAC addresses at every node that a packet arrives at.

Memtwo commented 3 years ago

Thanks! Do you mean I should set node1's DST to node2, then on node2 use the EtherRewrite element and set its DST to node3?
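One possible reading of that approach, sketched with the addresses listed above. It assumes the EtherRewrite(SRC, DST) element is available in the build, that Node1's ens33 reaches Node2's ens38, and that Node2's ens33 reaches Node3's ens33. MarkIPHeader(14) is added because IPPrint relies on the IP header annotation, which FromDevice does not set on its own. None of this was confirmed in the thread.

```
// ---- Source.click on Node1 (hypothetical change: DSTETH is Node2's ens38) ----
FastUDPSource(800000, -1, 60,
              00:0c:29:92:68:92, 192.168.32.128, 1234,
              00:0c:29:57:e6:eb, 192.168.32.132, 1234)
    -> ToDevice(ens33);

// ---- Forward.click on Node2 ----
FromDevice(ens38, PROMISC true)
    -> MarkIPHeader(14)                 // IP header starts after the 14-byte Ethernet header
    -> IPPrint("node2")                 // should now print one line per forwarded packet
    -> EtherRewrite(00:0c:29:57:e6:e1,  // new source: Node2 ens33
                    00:0c:29:db:6e:56)  // new destination: Node3 ens33
    -> Queue
    -> ToDevice(ens33);
```

The two parts are separate files for separate nodes. If IPPrint on Node2 prints packets and the Counter on Node3 still increases, the traffic is going hop by hop through Node2 rather than directly from Node1 to Node3.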