pynetwork / pypcap

pypcap - python libpcap module, forked from code.google.com/p/pypcap
Other

Support for nanosecond timestamps #83

Closed bmerry closed 5 years ago

bmerry commented 5 years ago

To address #76. It still needs unit tests and documentation, but opening for discussion (and anyone who comes across that issue and wants to test it). I've not used Cython much, so feel free to suggest better ways to get the job done.

To get the nanosecond timestamps, you need to do two things:

  1. Pass precision=pcap.PCAP_TSTAMP_PRECISION_NANO, which is then forwarded to pcap itself. Question: should we just make this the default where supported? Unlike in the underlying pcap library, this shouldn't cause any backwards compatibility issues.
  2. Pass timestamps_in_ns=True to get timestamps as integer nanosecond counts instead of floats. Python float doesn't have enough precision to represent current times since the epoch with nanosecond precision (it's more like hundreds of nanoseconds). A 64-bit signed int is good for a few hundred years. It could be upgraded to 64-bit unsigned, but I'm not sure if there are any performance issues moving those between Cython and Python.
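The precision claim in step 2 is easy to check in plain Python (the timestamp below is an arbitrary example, not one from the PR):

```python
# A hypothetical epoch timestamp with full nanosecond digits.
t_ns = 1_546_300_800_123_456_789

# Round-trip it through a float number of seconds, the way the
# legacy pypcap interface represents timestamps.
t_float = t_ns / 1e9
recovered = round(t_float * 1e9)

# The difference is nonzero: a 53-bit double mantissa can only hold
# ~2019 epoch seconds to a few hundred nanoseconds of resolution.
print(recovered - t_ns)
```

This is the loss that timestamps_in_ns=True avoids by keeping the value as an integer end to end.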
guyharris commented 5 years ago

This adds support for reading nanosecond-resolution pcap files, and pcapng files with higher-than-microsecond resolution, and getting seconds/nanoseconds time stamp values.

Given that, at least in the master branch, pypcap also supports pcap_create() and pcap_activate(), it could, with versions of libpcap that have pcap_set_tstamp_precision(), support live captures with higher-than-microsecond resolution if the underlying capture mechanism supports it; if the mechanism doesn't support it, pcap_set_tstamp_precision() reports an error.

bmerry commented 5 years ago

@guyharris I'm not entirely clear what your second paragraph is saying: are you just describing what I've already implemented, or suggesting a change?

guyharris commented 5 years ago

@guyharris I'm not entirely clear what your second paragraph is saying: are you just describing what I've already implemented, or suggesting a change?

Sorry, I didn't notice that this change also added support for nanosecond-resolution live capture.

bmerry commented 5 years ago

Any thoughts on whether nanosecond support should just be enabled by default with a fallback to microsecond if not available? I don't know if there are any downsides (e.g. performance implications) in nanosecond live capture.

guyharris commented 5 years ago

I don't know if there are any downsides (e.g. performance implications) in nanosecond live capture.

I suspect the performance implications are minimal.

I think the compatibility implications could be significant - software that always expects to get seconds-and-microseconds will be confused if it gets seconds-and-nanoseconds; that (plus the fact that not all capture mechanisms support nanosecond resolution) is why libpcap requires that a program opt into nanosecond-resolution time stamps.

bmerry commented 5 years ago

I've modified the PR so that it will ask pcap for nanosecond precision automatically, but the user still needs to pass timestamps_in_ns to get timestamps in units of nanoseconds. That should be backwards compatible, because pypcap converts the C structure into a floating-point number of seconds.
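The backward-compatible default can be sketched in a few lines (the function name is illustrative, not the PR's actual API): even when libpcap hands back a nanosecond-resolution header, the caller still receives the same float-seconds type as before.

```python
# Sketch of the default conversion described above: nanosecond header
# fields are folded into a float number of seconds, the type existing
# pypcap callers have always received.
def to_seconds(tv_sec, tv_nsec):
    return tv_sec + tv_nsec / 1e9

print(to_seconds(1_546_300_800, 123_456_789))
```

Only callers that opt in via timestamps_in_ns ever see the new integer representation, which is why the change is backwards compatible.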

hellais commented 5 years ago

Thanks for putting this together!

This seems reasonable, given that full backward compatibility is preserved.

There are also unit tests that validate that in the default mode the timestamp is returned in microseconds (as it used to be).

@guyharris do you think there could be any other issue to be aware of from doing the conversion from nanoseconds to microseconds in our library code?

@bmerry is there a reason to enable nanosecond precision automatically and do the scaling in our library vs just not enabling nanosecond precision unless timestamps_in_ns=True?

bmerry commented 5 years ago

Requesting nanosecond precision still gives about a 5x improvement in resolution even when the result is returned as a float, and I don't think it adds much complexity.

guyharris commented 5 years ago

@guyharris do you think there could be any other issue to be aware of from doing the conversion from nano-seconds to microseconds in our library code?

The pypcap code currently converts seconds-since-the-Epoch/microseconds into a floating-point seconds-and-fractions-of-a-second-since-the-Epoch; that's a value that has the same type regardless of whether the file you're reading, or the capture you're doing, gives seconds/microseconds or seconds/nanoseconds time stamps; the only difference is that the fractional part of the floating-point number has higher precision.

This change just:

  1. by default, converts seconds-since-the-Epoch/nanoseconds into the same type of floating-point number;
  2. if ctx.timestamp_in_ns is set to true, converts both seconds-since-the-Epoch/microseconds and seconds-since-the-Epoch/nanoseconds into a nanoseconds-since-the-Epoch value.

Neither of those converts nanoseconds to microseconds - the second of them does convert microseconds to nanoseconds (hdr.ts.tv_usec * ctx.scale_ns).
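The two integer paths above can be modelled in plain Python (the scale_ns name mirrors the PR's ctx.scale_ns field; the rest is an illustrative sketch, not the PR's code):

```python
# Model of the integer-nanosecond conversion: scale_ns is 1000 when the
# header's fractional field carries microseconds and 1 when it already
# carries nanoseconds, so both header formats end up in the same units.
def to_nanoseconds(tv_sec, tv_frac, scale_ns):
    return tv_sec * 1_000_000_000 + tv_frac * scale_ns

usec_case = to_nanoseconds(1_546_300_800, 123_456, 1000)   # usec header
nsec_case = to_nanoseconds(1_546_300_800, 123_456_789, 1)  # nsec header
print(usec_case, nsec_case)
```

Because this is all integer arithmetic, there is no rounding at any point: a microsecond header just ends with three trailing zeros in the nanosecond field.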

The only issue I can see is "are there enough bits?"

Given that we don't want to get bitten by a Y2.038K bug, we definitely want at least 32 bits' worth of seconds; given nanosecond resolution, we need at least 30 bits for the nanosecond fraction (10^9 < 2^30), so at least 62 bits in total.

At least as I read the Python documentation, that means a Python 2 "long integer" (i.e., a bignum) or a Python 3 integer (they're all bignums, right?) for an integral number of nanoseconds.

And, for a floating-point value, it means Python needs to be using something equivalent to C's long double to store floating-point numbers, and that needs to have at least a 62-bit mantissa. Unless I've miscalculated, a 64-bit mantissa (which is what an x86 "double extended precision" float gives you) gives you 584 years' worth of nanoseconds, giving us a couple of centuries to figure out how to avoid Y2.262K problems ("584 years" means "292 years before 1970 and 292 years after 1970"). Unfortunately, if Python's real numbers don't give you that....
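The back-of-envelope ranges above are easy to verify (this is just arithmetic, not anything from the PR):

```python
# How many years of nanosecond timestamps fit in 63 and 64 bits?
NS_PER_YEAR = 365.25 * 24 * 3600 * 1e9  # Julian year in nanoseconds

years_signed = 2**63 / NS_PER_YEAR  # signed 64-bit int: one direction
years_total = 2**64 / NS_PER_YEAR   # full 64-bit range

print(years_signed)  # roughly 292 years either side of 1970
print(years_total)   # roughly 584-585 years in total
```

A signed 64-bit nanosecond count therefore runs out in the year 2262, which is where the Y2.262K figure comes from.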

bmerry commented 5 years ago

I think the switchover point from int to long in Python 2 might be platform specific, but on my machine it seems to be at 2^63, which as you say gives us until 2262 (and even then, we don't need to break the interface; we just need to change the implementation to construct long integers). No doubt by then we'll need picosecond timestamps (we've already got networks where packets can take less than a nanosecond).
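The "change the implementation, not the interface" point rests on Python integers being arbitrary precision; a quick demonstration (nothing here is pypcap-specific):

```python
# Python 3 ints (and Python 2 longs) are bignums: arithmetic past the
# signed-64-bit boundary stays exact, so integer nanosecond timestamps
# keep working after 2262 with no interface change.
t = 2**63 + 123  # a nanosecond count beyond the signed 64-bit range
print(t + 1 - t)  # exact: prints 1, no overflow or precision loss
```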

I think a 64-bit mantissa actually gives twice the range because it's separate from the sign bit, so would be good to about 2554. However, Python has no builtin support for x86 80-bit floats, so that's not a realistic option.
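CPython's float is a C double, which can be confirmed from the interpreter (53 is the IEEE-754 double mantissa width, including the implicit leading bit):

```python
import sys

# CPython floats are C doubles, not x86 80-bit long doubles, so the
# mantissa is 53 bits - well short of the ~62 bits needed to hold a
# current epoch time to nanosecond precision.
print(sys.float_info.mant_dig)  # 53 on IEEE-754 platforms
```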

guyharris commented 5 years ago

I think a 64-bit mantissa actually gives twice the range because it's separate from the sign bit, so would be good to about 2554. However, Python has no builtin support for x86 80-bit floats, so that's not a realistic option.

I.e., Python doesn't use long double anywhere? If so, that's going to be a problem with time stamps represented as floating-point values in units of seconds if the time stamp has nanosecond precision.

guyharris commented 5 years ago

No doubt by then we'll need picosecond timestamps (we've already got networks where packets can take less than a nanosecond).

...which means that pcapng format may have to change as well, as its time stamps are 64-bit integers in units of either 2^-n or 10^-n seconds; that will work for a few centuries with nanosecond time stamps but doesn't work with picosecond time stamps.
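The picosecond limitation is worth quantifying (plain arithmetic, not from the PR): a 64-bit counter at 10^-12 s resolution covers less than a year.

```python
# Total span of a 64-bit timestamp at picosecond (1e-12 s) resolution.
range_seconds = 2**64 * 1e-12
range_days = range_seconds / 86400
print(range_days)  # roughly 213 days - far too short for epoch times
```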

bmerry commented 5 years ago

I.e., Python doesn't use long double anywhere? If so, that's going to be a problem with time stamps represented as floating-point values in units of seconds if the time stamp has nanosecond precision.

Which is exactly why I introduced the integer-nanoseconds representation in this PR.

bmerry commented 5 years ago

Happy New Year! Is there anything specific you'd like me to change to get this merged?

hellais commented 5 years ago

@guyharris @bmerry thanks for thinking about some potential issues with this and Y2.038K bugs!

My understanding is that since we are using 64-bit ints we should be good to go.

I am going to regenerate pcap.c and merge this into master. Thanks for working on this @bmerry !