nitturis-linux / kedr

Automatically exported from code.google.com/p/kedr
0 stars 1 forks source link

detecting sk_buff leaks #4

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hello,

I'd like KEDR to track sk_buff alloc/free. I wrote some basic code to track 
__alloc_skb, skb_clone and kfree_skb. It's not perfect because it only tracks 
struct sk_buff allocation/free, not the data buffer it points to, but that 
doesn't matter for me here.

Now here's my problem:

The sk_buff is allocated by me on the send side and given to the network. It's 
basically free'd by the network driver after send (outside of the module I am 
debugging). On the receive side, the incoming sk_buff is allocated by the 
driver before giving it to me, and I take care of freeing it.

I worked around the send side problem by declaring that dev_queue_xmit() takes 
care of freeing my sk_buff. The freeing doesn't occur in dev_queue_xmit() for 
real, but that's where my module last sees its outgoing sk_buff, so that's fine.

On the receive side, it's harder. If the Linux receive model was my module 
calling a core/driver function to get the next incoming sk_buff, I would just 
tell KEDR that this function allocates the sk_buff. But the Linux model is the 
opposite: the core/driver calls a callback in my module with the incoming skb 
as a parameter.

So I basically need a way to tell KEDR "the skb passed to my module here has 
been allocated earlier, now please track it". How can I do that?

Brice Goglin

PS: Yeah, I'll share my code if you're interested

Original issue reported on code.google.com by brice.go...@gmail.com on 22 Feb 2012 at 2:53

GoogleCodeExporter commented 9 years ago
A tool to track allocation/deallocation of sk_buff structures is a very useful 
one indeed. I think many users could benefit from it, so if you share the code, 
it would be great.

As for the problem, KEDR itself currently has no way to recognize that an 
object was allocated/freed by another module or by the kernel proper. We also 
faced this problem in our projects.

Two ideas:

1. Would it be OK for your tool if KEDR could track several kernel modules as a 
single entity? That is, if it could replace the calls in more than one module 
with the calls to your functions? This way, you could track 
allocation/deallocation of sk_buff structures both in the module you actually 
debug and in the underlying network driver as if they were a single module.

This feature is already in our TODO list but it may take some time to implement 
it. 

2. A colleague of mine and the co-author if KEDR, Andrey Tsyvarev has developed 
a system to track callback operations in the kernel modules, KEDR-COI:
http://code.google.com/p/kedr-callback-operations-interception/
So far, we have used it to track file operations, inode operations and the 
like. But it also provides the means to prepare the components to track other 
types of callbacks. 

The idea of that system is to intercept the calls to the functions that have 
registered the objects with callbacks of interest and to replace the callbacks 
in these objects.

Using KEDR-COI and KEDR at the same time, you could probably intercept the 
callback in the module under analysis that receives that newly created sk_buff 
instance and track that instance thereafter.

KEDR-COI is still experimental but it can be useful. 
The documentation for KEDR-COI is available in its package, see 
sources/doc/html/.

Original comment by euspec...@gmail.com on 23 Feb 2012 at 5:45

GoogleCodeExporter commented 9 years ago
KEDR-COI looks interesting indeed. I'll see if I can take some time to test it. 
In my case, you would have to intercept dev_add_pack() and dev_remove_pack() 
(these are where ethernet-type-specific receive handler callbacks are 
added/removed).

I am not fully convinced by idea (1). Three reasons:
* I thought that KEDR required the target module to be loaded after KEDR. My 
network drivers are usually loaded at boot time because I need them to ssh on 
the machine before I use KEDR.
* I wonder if some corner cases in the receive stack could have the core kernel 
play with sk_buff between drivers and my module.
* My network drivers also receive normal traffic such as IP. So I would need to 
either track the tcp/ip modules with KEDR, or have a way to filter TCP/IP 
sk_buff allocations out of the KEDR output.

In short, I prefer idea (2) :)

My patch is attached. If you need a copyright header, it's:
  Copyright © 2012 inria. All rights reserved.

Original comment by brice.go...@gmail.com on 23 Feb 2012 at 9:08

Attachments:

GoogleCodeExporter commented 9 years ago
Got your point concerning the first idea. Yes, analyzing several modules as a 
group seems to do little good in case of your task. And you are right, KEDR 
needs to be loaded before the target module and it cannot analyze the kernel 
proper, only the modules.

Thank you for the patch. Now I see that you have actually patched LeakCheck 
rather than created a separate plugin for KEDR as I thought before. It is all 
quite different and probably makes it harder to use KEDR-COI than if sk_buff 
operations were tracked by a separate plugin.

For some time now I have been planning to redesign LeakCheck to some extent. 
May be it would help here too. The idea is to separate its core (analysis 
engine) from the components that collect alloc/free events and to allow 
creating custom components of this kind. To put it simple:

- LeakCheck core could export API functions like 
kedr_lc_alloc_event(const void *block, size_t size, const void *caller_address, 
<...>) 
and
kedr_lc_free_event(const void *block, const void *caller_address, <...>)

Similar to what already exists in leak_check/mbi_ops.* but is not exported yet.

- Different plugins (probably, separate kernel modules) could intercept 
different groups of alloc/free routines and call kedr_lc_*_event() when 
appropriate.

- The core would analyze the data and report the results the same way as 
LeakCheck does it now.

In our work, we would benefit from this redesign too. Among other things, it 
would made much easier to use LeakCheck with KEDR-COI, which we need when we 
analyze file system modules. In addition, I do not like the current monolythic 
structure of LeakCheck. The Linux kernel ABI is ever-changing and we need to 
make sure our system is usable on all the versions and variants of the kernel 
we aim to support. A modular LeakCheck would make things easier to some extent. 
I would also like to give the user the means to choose what she would like to 
check with LeakCheck, e.g. slab allocs, page allocs, sk_buff operations like 
your tool does, drm allocs, or may be all of this at the same time, etc.

If the above is implemented, a tool to track sk_buff operations could be a 
separate plugin to KEDR, a separate kernel module. That module would use KEDR 
core API to intercept the calls to the necessary exported functions (same as it 
is now). A custom interceptor created with the help of KEDR-COI would help 
tracking the appropriate callbacks. Your plugin would call kedr_lc_*_event() 
when appropriate and LeakCheck would analyze the collected data.

Both Andrey and I are currently working on another system that I hope would 
make it into KEDR framework in the future 
(http://code.google.com/p/kernel-strider/). 
But I have some time now, so I will try to implement that LeakCheck redesign as 
I have described. 

Ideas, comments and suggestions are always welcome :-)

Original comment by euspec...@gmail.com on 24 Feb 2012 at 8:19

GoogleCodeExporter commented 9 years ago
I have revisited LeakCheck as I described above. Among other things, it now 
provides the API that other kernel modules can use to tell LeakCheck something 
like "this object has been allocated" or "this object is about to be 
deallocated". It should also be easier now to add support for tracking custom 
functions.

I have written a brief description of the API and of the examples: 
http://code.google.com/p/kedr/wiki/LeakCheck_API

If you would like to use these features, you could build KEDR from the most 
recent revision available here in the repository. It should be stable enough, 
at least, it passes our tests on a dozen of systems with different variants of 
Linux and versions of the kernel ;-)

The examples ("Mempool_ops" and "Annotations") I have also prepared could serve 
as a good starting point for your task. 

Original comment by euspec...@gmail.com on 2 Mar 2012 at 1:00

GoogleCodeExporter commented 9 years ago
Shall we call this fixed? Are there any issues here that need to be resolved?

Original comment by euspec...@gmail.com on 30 Aug 2012 at 7:47

GoogleCodeExporter commented 9 years ago
OK, marking as fixed. If anyone is having issues related to this one, feel free 
to open another bug here.

Original comment by euspec...@gmail.com on 27 Sep 2012 at 2:10