Open guyharris opened 5 years ago
He did use a larger buffer - 16KiB. That was probably on Linux; the default st_blksize
on Linux appears to be the page size, so it was probably 4KiB. If a 16KiB fwrite()
bypasses the buffer and just writes directly to the file, as a result of being a multiple of the buffer size, then that'd be a 4x reduction in the number of write()
calls and thus a 4x reduction in the number of system calls, which might make a significant difference.
A bigger buffer would result in an even greater reduction, although I don't know whether a buffer that's close to or bigger than the L1 data cache (32K on most of Intel's x86-64 processors) would be a problem.
As for an automatically-allocated buffer:
setvbuf()
documentation, which presumably is describing the glibc version, "If the argument buf
is NULL
, only the mode is affected; a new buffer will be allocated on the next read or write operation.". The GNU libc documentation for setvbuf()
, however, just says "If you specify a null pointer as the buf argument, then setvbuf
allocates a buffer itself using malloc
. This buffer will be freed when you close the stream. ... Otherwise, buf should be a character array that can hold at least size characters.", which doesn't explicitly indicate that the buffer allocated by setvbuf()
will have size size.setvbuf()
has been extended to support setting the buffer size without explicitly setting the buffer.setvbuf()
source indicates that it - and probably the rest of the standard I/O library - is BSD-derived, so it probably implements the same "NULL argument plus non-zero size plus not unbuffered means a buffer of the specified size will be automatically allocated" behavior, and that's what the code appears to do.setvbuf()
documentation says "If the size
argument is not zero but buf
is NULL
, a buffer of the given size will be allocated immediately, and released on close.".setvbuf()
documentation say the same thing (not surprising, given that they probably all started with the 4.4-Lite libc, and that behavior probably dates back a long time for all those OSes).setvbuf()
documentation says "The array pointed to by buffer is used as the buffer, unless it is NULL, in which case setvbuf uses an automatically allocated buffer of length size/2 * 2 bytes.".However:
setvbuf()
documentation just says "If buf
is not the null pointer, the array it points to will be used for buffering, instead of an automatically allocated buffer. The size
argument specifies the size of the buffer to be used. If input/output is unbuffered, buf
and size
are ignored.", which could be read as suggesting that if buf
is the null pointer, and the stream is buffered, it automatically allocates a buffer of size
bytes, but they don't unequivocally state that.setvbuf()
documentation says "If the Buffer parameter is not a null character pointer, the array it points to is used for buffering. The Size parameter specifies the size of the array to be used as a buffer, but all of the Size parameter’s bytes are not necessarily used for the buffer area.", which could again be read as suggesting that a null Buffer argument, non-zero Size argument, and Mode argument other than _IONBF
means that a buffer of the specified size will automatically be allocated, but they don't say it explicitly;setvbuf()
documentation says "If buf is not the NULL pointer, the array it points to is used for buffering instead of an automatically allocated buffer (from malloc()). size specifies the size of the buffer to be used. size specifies the size of the buffer to be used.", which again suggests that you can set the size of an automatically-allocated buffer but doesn't explicitly state it;setvbuf()
documentation says "If the buffer
parameter is not a null pointer, the array that the parameter points to is used for buffering instead of a buffer that is automatically allocated. The size
parameter specifies the size of the buffer to be used." - you know the drill;setvbuf()
documentation says "If buf is the NULL pointer, or size is less than eight, space will be allocated to accommodate a buffer. This buffer will be of length BUFSIZ.", which means that 1) the good news is that a NULL buf means it'll automatically allocate a buffer but 2) the bad news is that, no matter what the value of size is, the buffer will be BUFSIZ bytes.The System V Release 2 setvbuf()
documentation says "If buf is not the NULL pointer, the array it points to will be used for buffering, instead of an automatically allocated buffer. Size specifies the size of the buffer to be used.", so that appears to be where the second group of UN*Xes - all SV-based, with a ton of BSD and vendor stuff added on top (Solaris being based on SVR4 which was a huge merge of SunOS 4.x into the SV code base, giving not only BSD stuff but Sun stuff) - got their man page text, so I guess some tomatoes should be thrown at AT&T for not making it clearer.
So, just for the lulz, let's see if we can find the SV source.
Here on archive.org there's a page that claims to have SV source from several releases; the tarball for SVR2 for NS32xxx (SysVr2.0-32000) has a version of setvbuf()
with what appears to be the AT&T copyright notice from my days at Sun, and it...
...will, if the mode argument isn't _IONBF
:
BUFSIZ
;BUFSIZ
, or a buffer hasn't already been allocated, allocate a buffer of the specified size plus 8 bytes.So I rather suspect that the `setvbuf() in any UN*X with an SV-derived stdio library - which probably means "all the traditional commercial UN*Xes that still survive, plus Tru64 and even IRIX" - will, if called early enough (i.e., before you do anything that needs a buffer), and passed a null buffer pointer and a non-zero buffer size, allocate a buffer of that size.
Any UN*X with a 4.4 BSD/4.4-Lite BSD-derived stdio library - which means the *BSDs, macOS/iOS/tvOS/watchOS, and, apparently, Android - will do so as well.
Unless the Linux documentation is lying, GNU libc's stdio library will do that.
The Windows libc will do that as well.
And, from a look at the MUSL source, it ignores the size argument if the buf argument is null.
So my inclination is to do the setvbuf()
call; if it doesn't increase the buffer size, you lose, I guess, and should go yell at the supplier of your libc to make it work. At least the dumping APIs will still work, with the default buffer size, even if that means more system calls than you'd like.
(Given that a pcap_dumper_t *
is just a FILE *
, meaning we don't have some data structure that we control, we have no mechanism for getting at the buffer at close time to free it, so we have to have the automatically-allocated buffer.)
Something like this? --- a/sf-pcap.c +++ b/sf-pcap.c @@ -771,7 +771,11 @@ pcap_setup_dump(pcap_t p, int linktype, FILE f, const char *fname) else setvbuf(f, NULL, _IONBF, 0);
With GNU libc (glibc), setvbuf(fp, NULL, _IOFBF, size)
will ignore size
if buf
is NULL, see
libio/iosetvbuf.c:
int _IO_setvbuf(FILE* fp, char* buf, int mode, size_t size) {
int result;
CHECK_FILE(fp, EOF);
_IO_acquire_lock(fp);
switch (mode) {
case _IOFBF:
fp->_flags &= ~(_IO_LINE_BUF | _IO_UNBUFFERED);
if (buf == NULL) {
if (fp->_IO_buf_base == NULL) {
/* ... */
if (_IO_DOALLOCATE(fp) < 0) {
result = EOF;
goto unlock_return;
}
fp->_flags &= ~_IO_LINE_BUF;
}
result = 0;
goto unlock_return;
}
break;
// ...
}
result = _IO_SETBUF(fp, buf, size) == NULL ? EOF : 0;
unlock_return:
_IO_release_lock(fp);
return result;
}
If buf
is NULL, then a buffer is allocated via _IO_DOALLOCATE
. That is a macro that will probably end up calling _IO_file_doallocate
from libio/filedoalloc.c which uses BUFSIZE
or (if positive and smaller), st.st_blksize
. So this is probably not what you want, and an explicit buffer has to be given.
Yeah, the Linux man page isn't as explicit as I'd like, but "If the argument *buf*
is NULL
, only the mode is affected" does suggest that the size isn't affected.
If an explicit buffer is given, we don't have any good way to free it in pcap_dump_close()
, given that:
pcap_dumper_t *
is just another name for a FILE *
, so we don't have any place to save a pointer to the buffer;fflush()
on the file being written, and does so by doing the fflush()
on the pcap_dumper_t *
cast to a FILE *
if pcap_dump_flush()
isn't available, so it's risky to change what a pcap_dumper_t
is;So the choices we have are:
pcap_dumper_t *
that points to an opaque structure that includes the FILE *
and a void *
pointing to the buffer.I'm inclined to go for the second option.
Guy Harris notifications@github.com wrote:
1 a pcap_dumper_t is just another name for a FILE , so we don't have any place to save a pointer to the buffer; 2 there's code out there that needs to, for example, do an fflush() on the file being written, and does so by doing the fflush() on the pcap_dumper_t * cast to a FILE
I guess this was a bad thing. Such code would fail to recompile, but of course it's an ABI change.
> * if pcap_dump_flush() isn't available, so it's risky to change what a
> pcap_dumper_t is; 3 there's no generally-available API to get a pointer
> to the buffer.
:-(
> So the choices we have are:
> 1 don't do this on Linux;
> 2 develop new APIs for dumping, with a handle
> type different from pcap_dumper_t * that points to an opaque structure
> that includes the FILE * and a void * pointing to the buffer.
> I'm inclined to go for the second option.
Me too. We are going to need it for pcapng.
-- ] Never tell me the odds! | ipv6 mesh networks [ ] Michael Richardson, Sandelman Software Works | IoT architect [ ] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [
We are going to need it for pcapng.
Yes. I wouldn't want to require pcapng to get the bigger stdio buffer, and I'm not sure whether we should have a single new API for writing both pcap and pcapng files, but either way we should have support for writing pcap files, not just pcapng files, with a bigger stdio buffer.
pcap_dumper_t
is already an opaque structure (already since its initial introduction in 1999), the fact that it is type-casted from FILE *
is an internal implementation detail. Are there any issues with converting this to an actual structure?
pcap_dumper_t
is already an opaque structure (already since its initial introduction in 1999), the fact that it is type-casted fromFILE *
is an internal implementation detail. Are there any issues with converting this to an actual structure?
To quote an earlier comment:
there's code out there that needs to, for example, do an
fflush()
on the file being written, and does so by doing thefflush()
on thepcap_dumper_t *
cast to aFILE *
ifpcap_dump_flush()
isn't available, so it's risky to change what apcap_dumper_t
is;
I'm unclear if there is still work to do here.
I think not, and the lack of activity during 5 months backs me up. Can it be closed?
We are going to do the setvbuf() in case it helps. Ticket remains open for now.
A blog post from Napatech says:
That's not strictly true; what
fwrite()
does is, in essence:so each call to
fwrite()
does not result in a single system call ("write buffer" is assumed to turn into a system call -write()
on UN*X,_write()
->WriteFile()
->NtWriteFile()
on modern Windows).However, he may have used a larger buffer in the buffering wrapped around
fwrite()
than the standard I/O library was using. On Linux with glibc, and on BSD-flavored UN*Xes including macOS, the buffer size for a regular file will be thest_blksize
value supplied by the file system for anfstat()
(unless the file system fails to supply it and it's zero, in which case it's 8KiB with glibc and 1KiB on BSD-flavored UN*Xes). Other UN*Xes may also usest_blksize
. On Windows, it's 4KiB.st_blksize
, in 4.2BSD, was the FFS block size of the file system, typically 8KiB. It appears to be 4MiB on APFS and 4KiB on HFS+ on High Sierra.So what we might want to do is:
The Single UNIX Specification doesn't guarantee that passing a non-zero buffer size and a null buffer pointer will cause
setvbuf()
to allocate a buffer of that size internally; the C99 spec says only thatemphasis on "may". Therefore, we allocate it manually.
This is a bit tricky as it would require us to free the buffer on close; the C99 spec says of
fclose()
thatwhich seems to indicate that such a buffer won't be freed automatically on close as it was manually allocated outside the standard I/O library.