tcpdump -v -w foo.out fails to capture on Solaris

normjacobs commented 10 years ago

root@snappy:/builds/jacobs/tcpdump/components/tcpdump# tcpdump -v -w /tmp/dump.out
tcpdump: listening on net0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C0 packets captured
15 packets received by filter
0 packets dropped by kernel
root@snappy:/builds/jacobs/tcpdump/components/tcpdump#

This should have captured all 15 packets received, but it didn't. Further investigation seems to indicate that this is because of the following: tcpdump sets are read() timeout for libpcap to 1 second using pcap_set_timeout(1000) tcpdump sets an alarm timer to 1 second (alarm(1)) to updated the displayed number of packets captured every second when -v and -w are used. in the bpf module, bpf_read() uses cv_timedwait_sig() to wait for the configured timeout before it returns from read(). This is always interruppted by the alarm causing read() to return EINTR. Dropping the bpf read timeout down to 950 seems to provide enough time for it to work.

--- tcpdump-4.5.1/tcpdump.c Mon Apr  7 10:34:02 2014
+++ tcpdump-4.5.1/tcpdump.c.orig    Thu Nov  7 17:22:54 2013
@@ -1286,7 +1283,7 @@
                error("%s: Can't set monitor mode: %s",
                    device, pcap_statustostr(status));
        }
-       status = pcap_set_timeout(pd, 1000);
+       status = pcap_set_timeout(pd, 950);
        if (status != 0)
            error("%s: pcap_set_timeout failed: %s",
                device, pcap_statustostr(status));

root@snappy:/builds/jacobs/tcpdump/components/tcpdump# build/amd64/tcpdump -v -w /tmp/dump.out
tcpdump: listening on net0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C16 packets captured
16 packets received by filter
0 packets dropped by kernel
root@snappy:/builds/jacobs/tcpdump/components/tcpdump#

guyharris commented 10 years ago

in the bpf module, bpf_read() uses cv_timedwait_sig() to wait for the configured timeout before it returns from read(). This is always interruppted by the alarm causing read() to return EINTR.

That's not unique to Solaris, other than the names of the kernel routines used in the BPF code.

I'm not sure why this is only showing up on Solaris, but what happens if you change setsignal() to mark SIGALRM as a "restart after the interrupt" signal?

diff --git a/setsignal.c b/setsignal.c
index 1bcc032..ba53a07 100644
--- a/setsignal.c
+++ b/setsignal.c
@@ -73,7 +73,7 @@ RETSIGTYPE

    memset(&new, 0, sizeof(new));
    new.sa_handler = func;
-   if (sig == SIGCHLD)
+   if (sig == SIGCHLD || sig == SIGALRM)
        new.sa_flags = SA_RESTART;
    if (sigaction(sig, &new, &old) < 0)
        return (SIG_ERR);

and revert the timeout change?

normjacobs commented 10 years ago

On 04/07/2014 11:57 AM, Guy Harris wrote:

in the bpf module, bpf_read() uses cv_timedwait_sig() to wait for
the configured timeout before it returns from read(). This is
always interruppted by the alarm causing read() to return EINTR.
That's not unique to Solaris, other than the names of the kernel routines used in the BPF code.

I'm not sure why this is only showing up on Solaris, but what happens if you change |setsignal()| to mark SIGALRM as a "restart after the interrupt" signal?

This works.

 -Norm

|diff --git a/setsignal.c b/setsignal.c index 1bcc032..ba53a07 100644 --- a/setsignal.c +++ b/setsignal.c @@ -73,7 +73,7 @@ RETSIGTYPE
 memset(&new, 0, sizeof(new));
 new.sa_handler = func;
if (sig == SIGCHLD)

if (sig == SIGCHLD sig == SIGALRM) new.sa_flags = SA_RESTART; if (sigaction(sig, &new, &old) < 0) return (SIG_ERR);

— Reply to this email directly or view it on GitHub https://github.com/the-tcpdump-group/libpcap/issues/1258.

normjacobs commented 10 years ago

I take that back. I had left in the shorter timeout. It's still broken with setsignal() change.

 -Norm

On 04/07/2014 06:56 PM, Norm Jacobs wrote:

On 04/07/2014 11:57 AM, Guy Harris wrote:
in the bpf module, bpf_read() uses cv_timedwait_sig() to wait for
the configured timeout before it returns from read(). This is
always interruppted by the alarm causing read() to return EINTR.
That's not unique to Solaris, other than the names of the kernel routines used in the BPF code.

I'm not sure why this is only showing up on Solaris, but what happens if you change |setsignal()| to mark SIGALRM as a "restart after the interrupt" signal?

This works.
-Norm
|diff --git a/setsignal.c b/setsignal.c index 1bcc032..ba53a07 100644 --- a/setsignal.c +++ b/setsignal.c @@ -73,7 +73,7 @@ RETSIGTYPE
 memset(&new, 0, sizeof(new));
 new.sa_handler = func;
if (sig == SIGCHLD)

if (sig == SIGCHLD sig == SIGALRM) new.sa_flags = SA_RESTART; if (sigaction(sig, &new, &old) < 0) return (SIG_ERR);

— Reply to this email directly or view it on GitHub https://github.com/the-tcpdump-group/libpcap/issues/1258.

guyharris commented 10 years ago

It's still broken with setsignal() change.

My goodness, Sun^WOracle certainly seem to have failed big time here; I'm not having that problem on, for example, OS X, and if it were happening on *BSD, we would probably have heard about it. If Larry hadn't decided to kill OpenSolaris, I could perhaps have looked at the code to see what the problem is.

The Single UNIX Specification page for sigaction() says:

SA_RESTART This flag affects the behavior of interruptible functions; that is, those specified to fail with errno set to [EINTR]. If set, and a function specified as interruptible is interrupted by this signal, the function shall restart and shall not fail with [EINTR] unless otherwise specified. If an interruptible function which uses a timeout is restarted, the duration of the timeout following the restart is set to an unspecified value that does not exceed the original timeout value. If the flag is not set, interruptible functions interrupted by this signal shall fail with errno set to [EINTR].

Implementing that behavior, for character special files such as the /dev/bpf files, is a function of the driver on BSD-flavored systems. For example, the bpfread() loop in OS X Mavericks includes

            error = BPF_SLEEP(d, PRINET|PCATCH, "bpf",
                              d->bd_rtout);

...

            if (error == EINTR || error == ERESTART) {
                    if (d->bd_slen) {
                            /*
                             * Sometimes we may be interrupted often and
                             * the sleep above will not timeout.
                             * Regardless, we should rotate the buffers
                             * if there's any new data pending and
                             * return it.
                             */
                            ROTATE_BUFFERS(d);
                            break;
                    }
                    lck_mtx_unlock(bpf_mlock);
                    return (error);
            }

In FreeBSD, the equivalent code is

            error = msleep(d, &d->bd_lock, PRINET|PCATCH,
                 "bpf", d->bd_rtout);
            if (error == EINTR || error == ERESTART) {
                    BPFD_UNLOCK(d);
                    return (error);
            }

and in NetBSD is

            error = tsleep(d, PRINET|PCATCH, "bpf",
                            d->bd_rtout);
            if (error == EINTR || error == ERESTART) {
                    splx(s);
                    KERNEL_UNLOCK_ONE(NULL);
                    return (error);
            }

and in OpenBSD is

            if (d->bd_rtout == -1) {
                    /* User requested non-blocking I/O */
                    error = EWOULDBLOCK;
            } else {
                    if ((d->bd_rdStart + d->bd_rtout) < ticks) {
                            error = tsleep((caddr_t)d, PRINET|PCATCH, "bpf",
                                d->bd_rtout);
                    } else
                            error = EWOULDBLOCK;
            }
            if (error == EINTR || error == ERESTART) {
                    D_PUT(d);
                    splx(s);
                    return (error);  
            }

and in DragonFly BSD is

            error = tsleep(d, PCATCH, "bpf", d->bd_rtout);
            if (error == EINTR || error == ERESTART) {
                    lwkt_reltoken(&bpf_token);
                    return(error);
            }

which might not handle this case; I'll have to try it, but I think there would have been significant complaints if it were broken on *BSD.

infrastation commented 10 years ago

@normjacobs, could you post the following additional information for completeness?

OS version and patchlevel (or equivalent)
the tcpdump release that fails and if the bug reproduces on the current git master build of tcpdump

Thank you.

guyharris commented 10 years ago

See also the-tcpdump-group/tcpdump#155.

infrastation commented 4 years ago

Given lack of feedback in the past 6 years, is there any good reason to keep this open?

mcr commented 4 years ago

reopen if still an issue with recent version.

infrastation commented 4 years ago

@guyharris, is there anything useful to take from this report? Would it make sense to leave a note in a man page or source code comment?

guyharris commented 4 years ago

For what it's worth, I built the tip of the master branch of libpcap, and then built the tip of the master branch of tcpdump with that libpcap, and tested it with -v and -w, and it seemed to work on "SunOS 5.11", with a "version" of 11.3, which I guess means Solaris 11.3.

However, if I

change the SIGALRM handler not to print anything;
change the interval timer to 1 microsecond;

no packets get captured.

So there is a problem - and I'm not sure it's Solaris-only. It does not happen on macOS Catalina; the Catalina code is

    if (error == EINTR || error == ERESTART) {
        if (d->bd_hbuf != NULL) {
            /*
             * Because we msleep, the hold buffer might
             * be filled when we wake up.  Avoid rotating
             * in this case.
             */
            break;
        }
        if (d->bd_slen != 0) {
            /*
             * Sometimes we may be interrupted often and
             * the sleep above will not timeout.
             * Regardless, we should rotate the buffers
             * if there's any new data pending and
             * return it.
             */
            ROTATE_BUFFERS(d);
            break;
        }
        bpf_release_d(d);
        lck_mtx_unlock(bpf_mlock);
        if (error == ERESTART) {
            printf("%s: %llx ERESTART to EINTR\n",
                __func__, (uint64_t)VM_KERNEL_ADDRPERM(d));
            error = EINTR;
        }
        return error;
    }

and the "Sometimes we may be interrupted often and..." code is probably what makes it work - if there's data in the store buffer, and there's no data in the hold buffer, if a signal happened during the read(), the buffers get rotated, so the store buffer becomes the hold buffer, and the next read will return that data immediately, so the data doesn't get stuck in the store buffer until the store buffer fills up.

On FreeBSD 12, however, no packets get captured in that case; FreeBSD does not have the "rotate the buffers if the store buffer is non-empty and the hold buffer is empty" code that macOS has. (If I were still working there, I could see whether this is the scenario that caused Apple to add that code.)

So there is an issue here, at least with systems using BPF and not having that code.

infrastation commented 4 years ago

So there is an old small problem. A solution does not have to be implemented immediately. I would be OK if this discussion boiled down to a comment with detailed directions in the source code for any future volunteers.

Would making the periodic update a separate POSIX thread be an OK solution? The periodic update code would be conditional, so if something like HAVE_WORKING_PTHREADS is not defined, the optional feature will not compile and will not get in the way of essential code flow.

Would the following text in the man page be OK to document the problem for the time being?

          When writing to a file with the -w option and at the  same  time
          not  reading  from  a  file with the -r option, report, once per
          second, the number of packets captured.  On Solaris and possibly
          other platforms this periodic update currently can cause loss of
          packets on their way from the kernel BPF to tcpdump.

mcr commented 4 years ago

So there is an old small problem. A solution does not have to be implemented immediately. I would be OK if this discussion boiled down to a comment with detailed directions in the source code for any future volunteers.

Would making the periodic update a separate POSIX thread be an OK solution? The periodic update code would be conditional, so if something like HAVE_WORKING_PTHREADS is not defined, the optional feature will not compile and will not get in the way of essential code flow.

I'm okay with doing this in tcpdump. I hate having a library that uses threads, but I don't think we need to do that.

Would the following text in the man page be OK to document the problem for the time being?

so, we'd document the issue. I would go further and suggest that it not be used on Solaris.

the-tcpdump-group / libpcap

tcpdump -v -w foo.out fails to capture on Solaris #1258