google / kernel-sanitizers

Linux Kernel Sanitizers, fast bug-detectors for the Linux kernel
https://google.github.io/kernel-sanitizers/

Support ARM64 #123

Closed: melver closed this issue 4 years ago

melver commented 4 years ago

Add support for ARM64.

It appears that the static pool cannot be used on ARM64 trivially, because valid pages are not set up for statically allocated memory (at least virt_to_page() always returned an invalid page pointer). At some point we may want to investigate whether we can force it to give us valid pages.

I will try to find out more about what exactly happens with .bss memory on ARM64 for the commit message. If you know what's happening, please let me know.
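For reference, a minimal sketch of what I suspect is going on, assuming the cause is that on ARM64 the kernel image (and with it .bss) is mapped outside the linear map, so virt_to_page() on a static pool address is not directly meaningful. The pool array and helper below are hypothetical, not from the patch; virt_addr_valid() and lm_alias() are the existing kernel helpers for detecting and translating image addresses:

```c
/*
 * Sketch only (assumption): on ARM64, statically allocated memory lives in
 * the kernel image mapping rather than the linear map, so virt_to_page()
 * on its address does not yield a usable struct page. virt_addr_valid()
 * detects this case, and lm_alias() gives the linear-map alias of an image
 * address. Whether that alias is backed by valid struct pages is exactly
 * what still needs to be confirmed.
 */
#include <linux/mm.h>

/* Hypothetical statically allocated pool. */
static char static_pool[255 * PAGE_SIZE] __aligned(PAGE_SIZE);

static struct page *static_pool_page(unsigned long offset)
{
	void *addr = static_pool + offset;

	if (!virt_addr_valid(addr))
		addr = lm_alias(addr);	/* image address -> linear-map alias */

	return virt_to_page(addr);
}
```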

All tests pass in a VM (except that test_gfpzero is a bit flaky, because it does not always get the same object back)!

(I'll wait to push this until your API documentation changes are in the repo, after which I'll do a rebase, as there are some conflicts.)

melver commented 4 years ago

> I think having loads at malloc/free fast paths is basically a no-go for Android. We can look into allocating the pool at a static address, same as KASAN does.

It's only the free paths. Also, note that we call kfence_free() on a free slow path (per the stat(s, FREE_SLOWPATH) right above the kfence_free() call).

I can add a big TODO to ARM64's Kconfig so we remember to resolve this. I'd still like to merge this if nothing else is wrong with it, because it at least gives us something to play with. We can even benchmark it and see what the performance looks like on a real device.
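To make the slow-path point concrete, here is a rough sketch (in mm/slub.c context, not the actual patch) of where the hook sits, assuming kfence_free() returns true when the object came from the KFENCE pool:

```c
/*
 * Rough sketch (assumption, not the actual patch): the KFENCE hook is on
 * SLUB's free slow path, after the FREE_SLOWPATH statistic is accounted,
 * so the allocation/free fast paths carry no extra loads.
 */
static void __slab_free(struct kmem_cache *s, struct page *page,
			void *head, void *tail, int cnt, unsigned long addr)
{
	stat(s, FREE_SLOWPATH);

	if (kfence_free(head))
		return;		/* object belonged to the KFENCE pool */

	/* ... regular SLUB slow-path freeing continues here ... */
}
```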

ramosian-glider commented 4 years ago

Fair enough. Let me take another look at the patches.

melver commented 4 years ago

One more thing: in the old implementation I noticed that the pool pointer was likely on the same cache line as the spinlock (they were declared right next to each other), which is certainly unfavorable due to cache-line ping-pong. This version marks the pool pointer __read_mostly, which avoids that. So I do suggest re-benchmarking this version on a real ARM device if it turns out that a static pool somehow cannot be used due to inherent limitations of how pages are set up for .bss data (hopefully not).
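For context, a sketch of the layout difference meant here; the names (__kfence_pool, KFENCE_POOL_SIZE, kfence_lock) are how I would write it, not necessarily what the patch uses:

```c
/*
 * Sketch (names assumed): the pool base pointer is only read on the
 * free-path check, while the lock is written under contention. Marking
 * the pointer __read_mostly moves it into a separate section, so the
 * check no longer shares a cache line with the lock.
 */
#include <linux/cache.h>
#include <linux/mm.h>
#include <linux/spinlock.h>

#define KFENCE_POOL_SIZE	(2 * 255 * PAGE_SIZE)	/* illustrative size */

char *__kfence_pool __read_mostly;		/* read on every free-path check */
static DEFINE_RAW_SPINLOCK(kfence_lock);	/* frequently written; kept apart */

static inline bool is_kfence_address(const void *addr)
{
	/* One load of a read-mostly pointer; no lock involved. */
	return unlikely((unsigned long)((char *)addr - __kfence_pool) < KFENCE_POOL_SIZE);
}
```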