3.7.1: memory leak: getname()

guyharris commented 11 years ago

Converted from SourceForge issue 604464, submitted by nobody

hi,

while observing millions of mostly broken udp, tcp and icmp payloads inside mostly broken ipv4 packets, tcpdump-3.7.1 kept allocating more and more memory until it had occupied a huge 244MB memory space.

my analysis is that there is some memory forgotten to free in case of broken packets.

libpcap version 0.7.

regards, -tom

guyharris commented 11 years ago

Submitted by guy_harris

Logged In: YES user_id=541179

What happens if you run with the "-S" flag?

The IP dissector allocates no memory except for, when running on a platform that requires strict alignment, a one-time allocation of a buffer for packets whose IP header isn't aligned on a 4-byte boundary.

The TCP dissector, however, allocates, when printing relative sequence numbers, a data structure for each TCP connection it sees (to keep track of the starting sequence numbers); bogus traffic could cause it to see lots and lots of bogus connections.

"-S" causes absolute sequence numbers to be displayed; the TCP dissector, when "-S" is specified, doesn't allocate those per-connection data structures.

guyharris commented 11 years ago

Submitted by nobody

Logged In: NO

hi,

i was running it like tcpdump -nvvvei eth0 and now i'm running it like tcpdump -nvvvSei eth0. it still seems to grow even with the -S option.

i also tried a simple modification that limits the total number of per-connection tcp data structures to 10. but it still grows.

i bet there is no possibility for recursion that would eat the memory?

-tom

guyharris commented 11 years ago

Submitted by nobody

Logged In: NO

hi,

i'm beginning to feel that this is not really tcpdump's problem but more likely the problem is in some of the libraries.

also it might be that the key is not 'broken packets' but 'the number of packets'; there are zillions of them being observed so memory consuption problems come to light more easily than under normal circumstances.

libpcap is version 0.7.1 to be more precise.

-tom

guyharris commented 11 years ago

Submitted by tom_soderlund

Logged In: YES user_id=606285

hi,

observing now the HEAD version of both tcpdump and libpcap.

i think i found it. the key was not only the number of packets, but that the packets were from random addresses: this causes the cache of intoa() return values to grow address by address until it contains the string representations of all possible ip-addresses.

i verified this by making getname() in addrtoname.c return straightly the result of intoa(addr) without caching it first.

maybe the cache should be limited in size somehow?

sorry for my rather misleading information and assumptions in the original report.

-tom

workerthread commented 10 years ago

TCPDUMP Denial-of-Service vulnerability

Hello, while investigating some performance issues with tcpdump in a high-traffic environment, we discovered a non-optimal implementation of the “getname” function in file “addrtoname.c”, which not only leads to poor performance, but also makes tcpdump vulnerable to a DoS type attack.

Before proceeding, please note that we are not regular contributors of the tcpdump project, so we are not experts about the rest of the codebase, and before submitting the patch and notify the vulnerability we would like to have your feedback.

From what we understood by analyzing the code of “addrtoname.c”, the program, in order to cache the IP Address – Hostname resolution, maintains an hashtable (hnametable), which holds lists of type “hnamemem”.

Each “hnamemem” element holds:

The IP address in unsigned int format
The char pointer to the string containing the hostname
A pointer to the subsequent hnamemem element in the list

The function “getname” in “addrtoname.c” is responsible for populating and maintaining such data structure, however we noticed that new elements are added to the “cache” even if the hostname resolution is turned off (by using the “-n” flag); in that case, the hostname field of hnamemem is populated with the string representation of the IP address (xxx.xxx.xxx.xxx).

This leads to a performance degradation and increasing memory occupation in case tcpdump is deployed in an environment where it “sees” many different IP addresses (i.e.: on a LAN with few hundreds or thousands of hosts, it is difficult to notice any performance degradation or anomalous memory usage).

In addition, since this function is called each time an IP has to be printed (i.e.: it is triggered also in “non-verbose” mode), an attacker could easily exploit for a DoS-type attack it by sending packets that sweep the entire IP address space. Please note that since this function is used almost everywhere, the attacker is not limited to crafting source and/or destination IP, but he can play also with addresses embedded inside the packet (i.e.: RADIUS packets).

In this way, he would obtain the following results:

Continuously increase the memory occupied by the tcpdump process, possibly causing it to crash, or fill the entire server memory (assuming a size of 24 bytes for each entry in the cache, sweeping the entire IPv4 address space would lead to approx. 96 GB memory size).
Increase the tcpdump CPU usage; since each time an IP address has to be printed, the program has to lookup inside a very large data structure (we observed that even with a data structure of about 40-50 MB the impact is huge), this would cause tcpdump to start dropping packets, even with moderate traffic.

We also did a quick check on the “getname6” function, and it also seems to be vulnerable (and in that case, with IPv6, only the sky is the limit for possible memory occupation :) ).

We have also successfully developed a shell script (quick&dirty) that exploits the vulnerability: it simply sweeps the IPv4 address space by sending “arping” requests, causing tcpdump to steadily increase the memory usage, and thus CPU usage too.

What we suggest in order to fix this vulnerability is: 1) Avoid instantiating and using such data structure at all, if the “-n” flag is specified. The only advantage we see in using the cache even if the hostname resolution is turned off, is avoiding the conversion of the IP address into printable format, but the experimental results show that this becomes a penalty when the cache becomes large. We believe the cache should serve just to avoid the costly DNS reverse lookups. 2) The remediation listed above, would leave tcpdump vulnerable in case the hostname lookup is turned on; in order to fix also this situation, we suggest to populate the cache only when an hostname has been successfully resolved. In this way, an attacker would not have a direct control over the cache size.

We propose the following patch, that according to our tests resolves the issue (obviously, it has been developed in a quick&dirty way, so we may have missed something, and it has to be polished before submitting it to the repository).

We look forward to receiving your feedback.

Antonio Trapanese Mattia Dossena

+++ addrtoname.c.patched        2014-05-16 15:31:19.000000000 +0200
@@ -224,18 +224,12 @@
 const char *
 getname(const u_char *ap)
 {
-       register struct hostent *hp;
+
        u_int32_t addr;
-       static struct hnamemem *p;              /* static for longjmp() */
+

        memcpy(&addr, ap, sizeof(addr));
-       p = &hnametable[addr & (HASHNAMESIZE-1)];
-       for (; p->nxt; p = p->nxt) {
-               if (p->addr == addr)
-                       return (p->name);
-       }
-       p->addr = addr;
-       p->nxt = newhnamemem();
+

        /*
         * Print names unless:
@@ -246,8 +240,21 @@
         */
        if (!nflag &&
            (addr & f_netmask) == f_localnet) {
+
+       register struct hostent *hp;
+       static struct hnamemem *p;              /* static for longjmp() */
+
+       p = &hnametable[addr & (HASHNAMESIZE-1)];
+       for (; p->nxt; p = p->nxt) {
+               if (p->addr == addr)
+                       return (p->name);
+       }
+
+
                hp = gethostbyaddr((char *)&addr, 4, AF_INET);
                if (hp) {
+                       p->addr = addr;
+                       p->nxt = newhnamemem();
                        char *dotp;

                        p->name = strdup(hp->h_name);
@@ -260,8 +267,8 @@
                        return (p->name);
                }
        }
-       p->name = strdup(intoa(addr));
-       return (p->name);
+
+       return (intoa(addr));
 }

infrastation commented 6 years ago

I acknowledge this long-standing issue, it needs to be fixed eventually (whether in the suggested way or not).

the-tcpdump-group / tcpdump

3.7.1: memory leak: getname() #13