xdp-project / xdp-tutorial

XDP tutorial

Advanced 03 AF_XDP Invalid Argument #78

Open eloydegen opened 5 years ago

eloydegen commented 5 years ago

I'm trying to get the Advanced03_AF_XDP running in Fedora, which runs kernel 5.3.0.rc6.

CONFIG_XDP_SOCKETS=y is correctly configured.

I'm running the following commands as root:

cd advanced03-AF_XDP
make
t setup --name veth-adv03

The last command results in 100% packet loss when running ping. The same happens for t ping, of course.

./af_xdp_user -d veth-adv03

This prints the following error: ERROR: Can't create umem "Invalid argument"

Any clue how I can fix this? I have also tried compiling it on the released VM, but that version did not include Advanced 3 and I'm not able to compile the new code I pulled. I would appreciate any pointer! :)

tohojo commented 5 years ago

Hmm, this seems different from the other permission errors we've seen. @chaudron, any idea what's up with this? :)

chaudron commented 5 years ago

I've not seen this before. I assume you use the libbpf from the tutorial; if so, can you try the one from your kernel/distribution?

Also, can you try to debug what is failing? The libbpf API has several failure points: xsk_page_aligned(), mmap(), setsockopt(), etc.
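
One of the failure points named above can be ruled out without libbpf at all. The following is a minimal sketch, assuming the alignment check simply tests whether the umem buffer address sits on a page boundary; `is_page_aligned` and `alloc_page_aligned` are hypothetical helper names, not libbpf functions.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: true if addr sits on a page boundary. */
bool is_page_aligned(const void *addr)
{
    uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);

    return ((uintptr_t)addr & (page_size - 1)) == 0;
}

/* Hypothetical helper: allocate a page-aligned buffer, NULL on failure.
 * The caller releases the buffer with free(). */
void *alloc_page_aligned(size_t size)
{
    void *buf = NULL;

    if (posix_memalign(&buf, (size_t)sysconf(_SC_PAGESIZE), size) != 0)
        return NULL;
    return buf;
}
```

If a buffer obtained this way passes the check, the alignment step is unlikely to be the source of the "Invalid argument" error, which leaves the mmap() and setsockopt() paths to inspect (for example with strace).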

eloydegen commented 5 years ago

Oh I should note that this is the beta version of Fedora, but I would argue this is better than the current stable release (kernel 5.0) combined with the mainline kernel.

I installed a new VM; ping now works, but the second error persists.

I have installed libbpf-devel and pointed the LIBBPF_DIR variable in the /advanced03-AF_XDP/Makefile to /usr/include/bpf, but then it can't build. It does build fine in the default setup.

Adding a printf statement at the top of the main function in af_xdp_user.c produces no output, so I'm not sure how to debug this further.

eloydegen commented 5 years ago

The libbpf-devel package in Fedora does not include all the files that are currently in the /libbpf/src folder; they come from the Linux kernel source. I pointed the Makefile variable to the folder in the Linux source and compiling works. Running t ping again results in 100% packet loss.

tohojo commented 5 years ago

> libbpf-devel package in Fedora does not include all the files that are currently in the /libbpf/src folder, they come from the Linux kernel source.

The libbpf-devel package is supposed to contain everything. If it doesn't, please file a bug (although I think there may be a new version of the libbpf package coming soon, so it may fix itself at that point).

> I pointed the Makefile variable to the folder in the Linux source and compiling works. Running t ping again results in 100% packet loss.

Are you seeing any output from the af_xdp_user command? You're not actually supposed to get any ping replies while running the initial example...

eloydegen commented 5 years ago

> You're not actually supposed to get any ping replies while running the initial example...

Oh. The first time I ran it on Ubuntu, the ping worked. Interesting.

The invalid argument is coming from an munmap syscall, but I'm still clueless what the actual problem is. I have attached the strace log.

tohojo commented 5 years ago

No, I think it's coming from the preceding mmap:

mmap(NULL, 8374384, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x180000000) = -1 EINVAL (Invalid argument)

The munmap is libbpf's attempt at cleaning up in the error path (which also fails for some reason).
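
The last argument of the failing mmap call can be decoded to see which AF_XDP region was being mapped. The sketch below assumes the page-offset constants from the UAPI header linux/if_xdp.h; they are redefined locally with a LOCAL_ prefix so the example builds without kernel headers, and `xdp_ring_name` is a hypothetical helper.

```c
#include <stdint.h>

/* Values as defined in <linux/if_xdp.h>, copied here under local names. */
#define LOCAL_XDP_PGOFF_RX_RING              0x0ULL
#define LOCAL_XDP_PGOFF_TX_RING              0x80000000ULL
#define LOCAL_XDP_UMEM_PGOFF_FILL_RING       0x100000000ULL
#define LOCAL_XDP_UMEM_PGOFF_COMPLETION_RING 0x180000000ULL

/* Hypothetical helper: name the region selected by an mmap() offset. */
const char *xdp_ring_name(uint64_t pgoff)
{
    switch (pgoff) {
    case LOCAL_XDP_PGOFF_RX_RING:
        return "rx ring";
    case LOCAL_XDP_PGOFF_TX_RING:
        return "tx ring";
    case LOCAL_XDP_UMEM_PGOFF_FILL_RING:
        return "umem fill ring";
    case LOCAL_XDP_UMEM_PGOFF_COMPLETION_RING:
        return "umem completion ring";
    default:
        return "unknown";
    }
}
```

Under that reading, the offset 0x180000000 in the strace line selects the umem completion ring, so the failing call would be a ring mapping rather than the packet buffer itself.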

Looking at the kernel code, I guess it's either failing this check:

if (size > (PAGE_SIZE << compound_order(qpg)))
    return -EINVAL;

or remap_pfn_range() returns -EINVAL. Which can also happen, I guess, but not sure if it is in this case.

Hmm, maybe try making the buffer smaller? Just decrease NUM_FRAMES at the top of af_xdp_user.c and recompile...
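
For reference, this is roughly how NUM_FRAMES translates into the size of the packet buffer. It is a sketch only: the 4096-byte frame size is an assumption (it is not stated in this thread), and `umem_buffer_size` is a hypothetical helper.

```c
#include <stddef.h>

/* Assumption: each frame is 4 KiB. Check FRAME_SIZE in af_xdp_user.c. */
#define ASSUMED_FRAME_SIZE ((size_t)4096)

/* Hypothetical helper: bytes needed for a buffer of num_frames frames. */
size_t umem_buffer_size(size_t num_frames)
{
    return num_frames * ASSUMED_FRAME_SIZE;
}
```

With these assumptions, 4096 frames need 16 MiB and 64 frames need 256 KiB. The length in the failing mmap call above, 8374384 bytes, is not a multiple of 4096, which may hint that the failing mapping is not the packet buffer itself.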

eloydegen commented 5 years ago

> No, I think it's coming from the preceding mmap:

Missed that one, it's the earliest error indeed.

> Hmm, maybe try making the buffer smaller? Just decrease NUM_FRAMES at the top of af_xdp_user.c and recompile...

I decreased it to 64 from the original 4096, still the same error.

I compiled it with the Linux mainline source as well with the library code in this repository, that doesn't make a difference.

tohojo commented 5 years ago

Hmm, right, that's odd. No idea what's failing now. I'll try to ping some of the upstream AF_XDP devs and point them here, let's see if they have any ideas...

eloydegen commented 5 years ago

Thanks! I just subscribed to the xdp-newbies and bpf Linux kernel mailing lists, so I hope you're sending it there.

magnus-karlsson commented 5 years ago

You are likely running a too-new libbpf on an older kernel. In 5.4-rcX, there is a new feature that changes the size of the offset struct. An old libbpf or app can run on any kernel, but a new libbpf cannot run on an old kernel. Is that something that should be supported? In the meantime, just use an older libbpf (from 5.3), or a newer kernel :-).
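
To illustrate why a struct size change breaks a newer library on an older kernel, here is a minimal sketch. It assumes the struct grew by one extra 64-bit field per ring (the thread only says the size changed); all struct and function names are illustrative stand-ins, and this is not the actual libbpf patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layouts: "v1" as an older kernel would fill it in,
 * "v2" as a newer library would expect it. */
struct ring_offset_v1 { uint64_t producer, consumer, desc; };
struct ring_offset_v2 { uint64_t producer, consumer, desc, flags; };

struct mmap_offsets_v1 { struct ring_offset_v1 rx, tx, fr, cr; };
struct mmap_offsets_v2 { struct ring_offset_v2 rx, tx, fr, cr; };

/* Copy one ring field by field; flags has no v1 counterpart. */
void ring_v1_to_v2(const struct ring_offset_v1 *in,
                   struct ring_offset_v2 *out)
{
    out->producer = in->producer;
    out->consumer = in->consumer;
    out->desc = in->desc;
    out->flags = 0; /* placeholder value chosen for this sketch */
}

/* Convert a whole v1 reply into the v2 layout. */
void mmap_offsets_v1_to_v2(const struct mmap_offsets_v1 *in,
                           struct mmap_offsets_v2 *out)
{
    ring_v1_to_v2(&in->rx, &out->rx);
    ring_v1_to_v2(&in->tx, &out->tx);
    ring_v1_to_v2(&in->fr, &out->fr);
    ring_v1_to_v2(&in->cr, &out->cr);
}
```

With these layouts the old reply is 96 bytes and the new one 128 bytes. A library that reads a 96-byte reply as if it were the 128-byte layout would pick up shifted offsets for every ring after the first, and could then pass a bogus length to mmap(). A library could instead compare the length reported by the kernel with both sizes and convert when it sees the smaller one.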

magnus-karlsson commented 5 years ago

Actually, this should be fixed in libbpf. Will submit a patch. Thanks for detecting this.

tohojo commented 5 years ago

> You are likely running a too new libbpf on an older kernel. In 5.4-rcX, there is a new feature that changes the size of the offset struct. An old libbpf or app can run on any kernel, but a new libbpf cannot run on an old kernel. Something that should be supported? In the meantime, just use an older libbpf (from 5.3), or a newer kernel :-).

Wait, isn't libbpf supposed to be backwards-compatible with older kernels as well?

tohojo commented 5 years ago

> Actually, this should be fixed in libbpf. Will submit a patch. Thanks for detecting this.

Great, thanks!

magnus-karlsson commented 5 years ago

> Wait, isn't libbpf supposed to be backwards-compatible with older kernels as well?

Do not know. I just thought about all the support tickets I would get if I do not fix this right now :-).

tohojo commented 5 years ago

> Do not know. I just thought about all the support tickets I would get if I do not fix this right now :-).

Hehe, right. Well, we're just going to keep reporting any compatibility issues to you so you also have to deal with those, then ;)

eloydegen commented 5 years ago

Has the patch been submitted already, so I can try to build it again? Or does it need more time?

magnus-karlsson commented 5 years ago

On Mon, Sep 23, 2019 at 10:30 AM Eloy notifications@github.com wrote:

> Has the patch been submitted already, so I can try to build it again? Or does it need more time?

It needs more time since I am travelling to Kernel Recipes this week. I will let you know as soon as it is finished.

/Magnus


eloydegen commented 5 years ago

Thanks for the quick response, I will await it.

magnus-karlsson commented 4 years ago

Eloy,

Could you please provide me with your full name and mail address? I would like to give you credit on the patch with a Reported-by tag as you found this issue.

eloydegen commented 4 years ago

Yes, that is Eloy Degen degeneloy@gmail.com

Thanks for the fix and attribution.

magnus-karlsson commented 4 years ago

I sent you a patch; it would be great if you could try it out. Note that samples/bpf does not build at the moment in bpf/master, so I applied the patch to an old need_wakeup development branch, then launched a standard Linux 5.3 kernel that does not have need_wakeup support. The sample/libbpf compiled with need_wakeup runs as expected on that kernel without the support.