snabbco / snabb

Snabb: Simple and fast packet networking
Apache License 2.0

IOMMU requirements are too strict #455

Open lukego opened 9 years ago

lukego commented 9 years ago

Snabb Switch needs to play nicer with the IOMMU.

The Linux kernel has three IOMMU modes available:

- off
- passthrough (pt)
- on

Currently Snabb Switch works with off and passthrough but not with on. This should be fixed so that we work with all three. This is especially important because different OS distributions ship different default IOMMU settings.
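Since the default differs between distributions, it could be useful for Snabb Switch to detect at start-up whether an IOMMU is active for a given device. Here is a minimal sketch, assuming the usual sysfs layout and ljsyscall's readlink wrapper; the helper name and the example PCI address are hypothetical:

```lua
-- Hypothetical sketch: return the IOMMU group number of a PCI device, or nil.
-- When the IOMMU is active, sysfs exposes an iommu_group link for the device;
-- when it is off, the link is absent.
local S = require("syscall")

local function iommu_group (pciaddress)
   -- pciaddress is e.g. "0000:01:00.0" (example address, not from the issue)
   local link = S.readlink("/sys/bus/pci/devices/"..pciaddress.."/iommu_group")
   if not link then return nil end            -- no link: no IOMMU for this device
   return tonumber(link:match("([^/]+)$"))    -- last path component is the group number
end
```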

One idea would be to write an extremely minimal Lua routine that uses ljsyscall to enable our DMA memory with VFIO when the IOMMU is enabled. If we map the physical addresses to the IOMMU then no other code changes should be needed. It would be sufficient to map all of the HugeTLB memory that we allocate into all of the PCI devices. This could potentially be quite simple and neat.
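To make that idea concrete, here is a rough sketch of what such a routine might look like. It is only a sketch under several assumptions: the ioctl request numbers are hand-derived from linux/vfio.h (_IO(';', 100 + n)) and should be verified against the kernel headers, ljsyscall is assumed to pass numeric ioctl requests and cdata arguments straight through to the kernel, and all devices in the IOMMU group would first have to be bound to vfio-pci for the group to be attachable. The map_dma name and its parameters are hypothetical.

```lua
-- Hypothetical sketch: map one chunk of our DMA memory into the IOMMU via VFIO.
-- Constants are hand-derived from linux/vfio.h and should be double-checked.
local S   = require("syscall")
local ffi = require("ffi")

local VFIO_TYPE1_IOMMU         = 1
local VFIO_GET_API_VERSION     = 0x3B64  -- _IO(';', 100)
local VFIO_CHECK_EXTENSION     = 0x3B65  -- _IO(';', 101)
local VFIO_SET_IOMMU           = 0x3B66  -- _IO(';', 102)
local VFIO_GROUP_SET_CONTAINER = 0x3B68  -- _IO(';', 104)
local VFIO_IOMMU_MAP_DMA       = 0x3B71  -- _IO(';', 113)

ffi.cdef[[
struct vfio_iommu_type1_dma_map {
   uint32_t argsz;
   uint32_t flags;   /* 1 = read, 2 = write */
   uint64_t vaddr;   /* process virtual address of the hugepage */
   uint64_t iova;    /* IO virtual address the device will use */
   uint64_t size;    /* multiple of the IOMMU page size */
};
]]

-- 'group' is the IOMMU group number of the PCI device (see the sysfs link above).
local function map_dma (group, vaddr, iova, size)
   local container = assert(S.open("/dev/vfio/vfio", "rdwr"))
   assert(S.ioctl(container, VFIO_GET_API_VERSION))
   assert(S.ioctl(container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU))
   local groupfd = assert(S.open("/dev/vfio/"..group, "rdwr"))
   -- Attach the group to the container, then select the Type1 IOMMU backend.
   local cfd = ffi.new("int[1]", container:getfd())
   assert(S.ioctl(groupfd, VFIO_GROUP_SET_CONTAINER, cfd))
   assert(S.ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU))
   -- Install the mapping: device accesses to 'iova' land in our hugepage.
   local map = ffi.new("struct vfio_iommu_type1_dma_map")
   map.argsz = ffi.sizeof(map)
   map.flags = 3                -- read + write
   map.vaddr, map.iova, map.size = vaddr, iova, size
   assert(S.ioctl(container, VFIO_IOMMU_MAP_DMA, map))
   return container, groupfd
end
```

The VFIO_IOMMU_MAP_DMA call would then be repeated for every HugeTLB chunk we allocate, and every PCI device we drive would join the container via its group, so that all devices see the same IOVA mappings.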

I have posted a question on Stack Overflow to fish for more ideas: http://stackoverflow.com/questions/29985296/linux-userspace-dma-with-iommu-on-and-without-vfio

And here is the commit that removed our previous full-scale implementation of VFIO: 773b60c65a13fe86a3446330343cc0befe44d580.

CC @justincormack @javierguerragiraldez for ideas.

lukego commented 9 years ago

Additional thoughts:

The kernel behavior seems to be changing over time. If I recall correctly, on Ubuntu 13.04 the IOMMU was disabled by default, and if it was enabled then Snabb Switch could not perform DMA (you could see a syslog message saying that it had been blocked). On Ubuntu 14.04 the IOMMU is enabled by default, and Snabb Switch does seem to work to a first approximation, but we have seen major performance disruption specifically when the IOMMU is enabled (at least with the Solarflare NIC, and I think also with Intel).

So it is not yet clear to me what the intended behavior of the kernel is on recent Ubuntu releases. Is everything supposed to Just Work even with the IOMMU enabled? Or is it a bug that it works at all when we are not using VFIO? This could be worth chasing up, and potentially even helping to fix in the kernel if the interface is supposed to support our current usage.

justincormack commented 9 years ago

This mail suggests that the performance issue only affects Sandy Bridge machines and that more recent ones should be fine. I am not sure what you are testing on:

http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/7409

lukego commented 9 years ago

Thank you for bringing this up, Justin!

I know that Sandy Bridge has IOMMU performance issues, but I had not considered that this could be the root cause of the problems we have seen. This needs to be checked: I will enable the IOMMU on our new Haswell server (interlaken) and see whether I can reproduce the performance problem there.

lukego commented 9 years ago

Ho ho ho...

Turns out that the IOMMU is already enabled on interlaken and apparently always has been. Significant optimization work has been done on this server, both by me and @alexandergall, and it has behaved very well throughout.

I suppose we can close this issue by updating our recommendation to disable the IOMMU so that it applies only to Sandy Bridge servers, and by posting a note asking people to report any problems with the IOMMU on other platforms?

lukego commented 9 years ago

@justincormack any idea what changed so that an enabled IOMMU blocks us on Ubuntu 13.04 (iirc) but not on 14.04?

justincormack commented 9 years ago

No, I can't find any specific thing that would explain it. There seem to be a lot of workarounds and fixes for specific hardware, but nothing general, so maybe it is machine/PCI-device specific?

It might be best to recommend a 3.16 or later kernel for the IOMMU as a placeholder.