Open Adarsh97 opened 2 years ago
@magnus-karlsson @tohojo @netoptimizer @dmitris @davem330 any idea? Is this a bug in our framework?
What NICs and kernel version are you using?
OS version: RHEL 8.4. NIC: IXGBE.
OS: RHEL 8.4 10G driver : IXGBE 1G driver: i40e
Could you please try invoking sendto() after the completion queue gets stuck. What kernel version is RHEL 8.4?
Kernel version: 4.18.0-305. I am calling sendto(sockid, 0.., MSG_DONTWAIT, ..) just before checking the completion ring, as done in the xdpsock_user.c sample program in the Linux kernel.
4.18 is really old! A lot of things have happened since then. Could you try bleeding edge bpf-next and see if you can reproduce the issue there?
I have tried the same code on Ubuntu 20.04, which uses kernel version 5.4, and I am seeing the same issue there too.
Please note that I am not doing any polling. Polling will reduce throughput, right? So I don't want to use it. I can see a polling option in xdpsock_user.c in the Linux kernel samples.
Polling is optional, so no problem. But please try latest bpf-next. 5.4 is more than 2 years old.
The kernel version in RHEL is a complete fiction; we backport everything BPF... :)
Any chance this is a cache update issue? Have you ever encountered this issue? I have looked into the xdpsock_user example in bpf-next, and I do not see any difference from my current code.
I have not seen this before, but that does not mean it is not real ;-). The user app has not changed much since 5.4. The kernel together with the driver has changed a lot since 5.4 though. That is what I am interested in so please try out bpf-next. If you can reproduce it there, I can set up something similar on my end and debug it.
With the sendto() call the completion ring is getting updated, but packets are still not being sent. Is there any way to debug this part?
bpftrace is your best friend. You can find a good tutorial here:
https://www.brendangregg.com/blog/2019-01-01/learn-ebpf-tracing.html
You need to scroll down one or two pages to find bpftrace.
So will bpftrace also be helpful for debugging packet sending?
Another observation: the producer and consumer pointers of the completion ring keep increasing. Ideally they should wrap modulo the ring size, right?
The counters are ever increasing so we do not have to test for any wrap around. Improves performance.
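To illustrate the point about ever-increasing counters, here is a toy sketch of a ring whose producer and consumer are free-running 32-bit counters. It is not the kernel's data structure; `toy_ring` and its helpers are names invented for this example. It assumes the ring size is a power of two, so the size only enters through the mask applied when indexing a slot.

```c
#include <stdint.h>

/* Toy single-producer/single-consumer ring (illustrative only).
 * The size must be a power of two so that (counter & mask)
 * equals (counter % size). */
struct toy_ring {
	uint32_t producer; /* free-running, never reset */
	uint32_t consumer; /* free-running, never reset */
	uint32_t mask;     /* size - 1 */
	uint64_t *slots;   /* array with mask + 1 entries */
};

/* Number of occupied slots. Unsigned subtraction stays correct
 * even after the 32-bit counters overflow. */
static uint32_t toy_ring_used(const struct toy_ring *r)
{
	return r->producer - r->consumer;
}

static int toy_ring_push(struct toy_ring *r, uint64_t value)
{
	if (toy_ring_used(r) == r->mask + 1)
		return -1; /* ring full */
	r->slots[r->producer & r->mask] = value; /* mask only when indexing */
	r->producer++;
	return 0;
}

static int toy_ring_pop(struct toy_ring *r, uint64_t *value)
{
	if (toy_ring_used(r) == 0)
		return -1; /* ring empty */
	*value = r->slots[r->consumer & r->mask];
	r->consumer++;
	return 0;
}
```

With this layout no branch is needed to reset an index at the end of the array, which is the performance benefit mentioned above.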
bpftrace is useful for debugging. Sorry, but do not understand the "sending packets" comment in regards to bpftrace.
Please check the statistics with XDP_STATISTICS getsockopt to see that you do not have any errors that might explain why you are not seeing packets. Are you running in SKB mode or in zero-copy mode?
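A minimal sketch of the XDP_STATISTICS check suggested above, assuming `xsk_fd` is an existing AF_XDP socket descriptor (the function name `print_xsk_stats` is made up here). Only the three counters that have been in `struct xdp_statistics` since its introduction are printed; newer kernels expose additional fields.

```c
#include <linux/if_xdp.h>
#include <stdio.h>
#include <sys/socket.h>

#ifndef SOL_XDP
#define SOL_XDP 283 /* fallback for older libc headers */
#endif

/* Print the basic AF_XDP socket counters.
 * Returns 0 on success, -1 if getsockopt() fails. */
static int print_xsk_stats(int xsk_fd)
{
	struct xdp_statistics stats = {0};
	socklen_t optlen = sizeof(stats);

	if (getsockopt(xsk_fd, SOL_XDP, XDP_STATISTICS, &stats, &optlen) < 0) {
		perror("getsockopt(XDP_STATISTICS)");
		return -1;
	}

	printf("rx_dropped:       %llu\n", (unsigned long long)stats.rx_dropped);
	printf("rx_invalid_descs: %llu\n", (unsigned long long)stats.rx_invalid_descs);
	printf("tx_invalid_descs: %llu\n", (unsigned long long)stats.tx_invalid_descs);
	return 0;
}
```

A non-zero `tx_invalid_descs` would point at bad TX descriptors (for example an address or length outside the umem) rather than at the NIC or the link.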
I am running in zero copy mode.
OK. Just wondered because in skb mode there are cases in which the packet will be dropped by the driver so you would have to watch the return values of sendto(). Do you get any pause frames returned from the 1G NIC to the 10G one? Is autoneg off a good idea when you have different speeds?
Sometimes no packets are sent. When I add a sleep, packets are sent, and when I send a large number of packets, they are also sent.
The counters are ever increasing so we do not have to test for any wrap around. Improves performance.
bpftrace is useful for debugging. Sorry, but do not understand the "sending packets" comment in regards to bpftrace.
I was asking whether we can also debug the packet sending path with bpftrace. One question: the kernel fills the completion ring only when it has sent a packet, right? Or are there cases where the kernel advances its producer pointer in the completion ring even though it has not sent the packet? If the producer and consumer values are ever increasing, what is the significance of the size argument of the completion ring provided when creating the umem?
If I use a NIC queue (with both rx and tx) to create an AF_XDP socket with a umem, will simultaneous reception and sending of packets through the same socket create conflicts, such as memory being overwritten in the umem area? Do we have a mechanism to prevent this?
@magnus-karlsson @tohojo could you explain how the rings are handled so that simultaneous send and receive through the same socket does not overwrite memory in the umem area?
It is up to user-space to make sure that overwrites do not happen. You basically need a mempool to manage the umem, like the ones that exist in DPDK and VPP. For a simple one, take a look at samples/bpf/xsk_fwd.c in the Linux repo.
For an explanation on how AF_XDP rings work, take a look here: http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
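As a rough illustration of the mempool idea mentioned above, the sketch below keeps a stack of free umem frame addresses. It is not taken from xsk_fwd.c, DPDK, or VPP; the names, `NUM_FRAMES`, and `FRAME_SIZE` are assumptions picked for the example. The rule it encodes is that a frame is allocated before its address is placed on the fill or TX ring, and freed only after it comes back through the RX or completion ring, so Rx and Tx never share a frame at the same time.

```c
#include <stdint.h>

#define NUM_FRAMES    8          /* hypothetical umem size in frames */
#define FRAME_SIZE    2048       /* hypothetical frame size in bytes */
#define INVALID_FRAME UINT64_MAX /* returned when the pool is empty */

/* Stack of umem frame addresses that user space currently owns. */
struct frame_pool {
	uint64_t free_addrs[NUM_FRAMES];
	uint32_t num_free;
};

/* Initially every frame in the umem is free. */
static void frame_pool_init(struct frame_pool *p)
{
	for (uint32_t i = 0; i < NUM_FRAMES; i++)
		p->free_addrs[i] = (uint64_t)i * FRAME_SIZE;
	p->num_free = NUM_FRAMES;
}

/* Take a frame before handing its address to the fill or TX ring. */
static uint64_t frame_alloc(struct frame_pool *p)
{
	if (p->num_free == 0)
		return INVALID_FRAME;
	return p->free_addrs[--p->num_free];
}

/* Return a frame once it has come back via the RX ring (after the
 * packet was processed) or via the completion ring. */
static void frame_free(struct frame_pool *p, uint64_t addr)
{
	if (p->num_free < NUM_FRAMES)
		p->free_addrs[p->num_free++] = addr;
}
```

This sketch is single-threaded; a pool shared between threads would need locking or per-thread caches.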
As AF_XDP is executing in zero-copy mode, the NIC has already put the packet in a packet buffer in the umem area so the only thing the kernel has to do is fill in the Rx descriptor to tell the application where this new packet resides and the length of it.
How does the NIC decide the location in the umem where the packet is to be copied?
@magnus-karlsson When running the samples/bpf/xdpsock_user.c program, packets are not transmitted the first time it is loaded. From the second load onwards, it works fine.
The ip link show command output says the xdp program is jitted. What does this mean?
I think the problem has been solved. What I have observed is that loading XDP on the 10 Gbps system takes some time, while on the 1 Gbps system it does not take long. If I add a sleep of, say, 10 seconds on the 10 Gbps system, I am able to send and receive packets. I am not sure whether this delay in loading XDP on the interface is expected.
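One possible alternative to a fixed sleep, assuming (this is not confirmed in the thread) that the delay is the link coming back up after the driver reconfigures the interface, is to poll the carrier state in sysfs before sending. The helper names below are made up for this sketch; `/sys/class/net/<ifname>/carrier` is the standard sysfs attribute.

```c
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if the interface reports carrier, 0 if not,
 * -1 if the state cannot be read (e.g. unknown interface). */
static int link_carrier(const char *ifname)
{
	char path[128];
	FILE *f;
	int c;

	snprintf(path, sizeof(path), "/sys/class/net/%s/carrier", ifname);
	f = fopen(path, "r");
	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);

	if (c == '1')
		return 1;
	if (c == '0')
		return 0;
	return -1;
}

/* Poll every 100 ms until carrier is up or timeout_ms has elapsed.
 * Returns 0 when the link is up, -1 on timeout. */
static int wait_for_carrier(const char *ifname, int timeout_ms)
{
	for (int waited = 0; ; waited += 100) {
		if (link_carrier(ifname) == 1)
			return 0;
		if (waited >= timeout_ms)
			return -1;
		usleep(100 * 1000);
	}
}
```

Calling `wait_for_carrier(ifname, 10000)` after socket setup would bound the wait at roughly the 10 seconds used above while returning earlier if the link is already up.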
As AF_XDP is executing in zero-copy mode, the NIC has already put the packet in a packet buffer in the umem area so the only thing the kernel has to do is fill in the Rx descriptor to tell the application where this new packet resides and the length of it.
How does the NIC decide the location in the umem where the packet is to be copied?
Interesting question @jerinpauloseme. @magnus-karlsson @tohojo can I get an answer to the above question?
I think the problem has been solved. What I have observed is that loading XDP on the 10 Gbps system takes some time, while on the 1 Gbps system it does not take long. If I add a sleep of, say, 10 seconds on the 10 Gbps system, I am able to send and receive packets. I am not sure whether this delay in loading XDP on the interface is expected.
@magnus-karlsson @tohojo what is your opinion about this?
How does the NIC decide the location in the umem where the packet is to be copied?
The buffers you put in the fill ring decide this. Please see the paper I linked to previously.
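The role of the fill ring can be sketched with a small user-space model. This is a toy simulation, not the AF_XDP API: `toy_xsk`, `offer_frame`, and `toy_receive` are invented names, and the `memcpy` stands in for the NIC writing the packet into the buffer. It shows that the receive side only ever writes to an address that user space previously offered, and then reports that address and the length in an Rx descriptor.

```c
#include <stdint.h>
#include <string.h>

#define TOY_FRAME_SIZE 2048
#define TOY_NUM_FRAMES 4 /* power of two, so the modulo wraps cleanly */

struct rx_desc {
	uint64_t addr; /* offset of the packet inside the umem */
	uint32_t len;  /* packet length in bytes */
};

struct toy_xsk {
	uint8_t  umem[TOY_NUM_FRAMES * TOY_FRAME_SIZE];
	uint64_t fill[TOY_NUM_FRAMES]; /* addresses offered by user space */
	uint32_t fill_prod, fill_cons; /* free-running counters */
};

/* User space: offer a umem frame that may receive a packet. */
static int offer_frame(struct toy_xsk *x, uint64_t addr)
{
	if (addr > sizeof(x->umem) - TOY_FRAME_SIZE)
		return -1; /* frame would not fit inside the umem */
	if (x->fill_prod - x->fill_cons == TOY_NUM_FRAMES)
		return -1; /* fill ring full */
	x->fill[x->fill_prod++ % TOY_NUM_FRAMES] = addr;
	return 0;
}

/* Receive side: take the next offered address, place the packet
 * there, and describe it to the application. */
static int toy_receive(struct toy_xsk *x, const void *pkt, uint32_t len,
		       struct rx_desc *out)
{
	uint64_t addr;

	if (x->fill_prod == x->fill_cons || len > TOY_FRAME_SIZE)
		return -1; /* no buffer offered or packet too large: drop */
	addr = x->fill[x->fill_cons++ % TOY_NUM_FRAMES];
	memcpy(&x->umem[addr], pkt, len);
	out->addr = addr;
	out->len = len;
	return 0;
}
```

If nothing has been offered on the fill ring, the model drops the packet, which mirrors why an empty fill ring means no packets are received.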
I think the problem has been solved. What I have observed is that loading XDP on the 10 Gbps system takes some time, while on the 1 Gbps system it does not take long. If I add a sleep of, say, 10 seconds on the 10 Gbps system, I am able to send and receive packets. I am not sure whether this delay in loading XDP on the interface is expected.
Takes around 200 ms on my large server for the i40e driver. Do not know why it takes so long on your system.
I couldn't find much information from the paper you have shared. Do you have any other resources ?
Ok.
There is also Documentation/networking/af_xdp.rst in the Linux source code repo.
I have an interesting observation. I am trying to send packets through an AF_XDP socket. The code works well when sending from a 1 Gbps system to a 1 Gbps system, and from 10 Gbps to 10 Gbps. But packets are not sent from the 10 Gbps system to the 1 Gbps system, apart from occasional transmissions. When I check the completion ring, its producer pointer is not advancing, so it seems the kernel is not sending the packets. When I add a delay with a sleep call, the packets are sent. Is this some cache issue, or is the kernel not active all the time? I have tuned the parameters below:
Autoneg off rx off tx off
rx-usecs 0