kbandla / dpkt

fast, simple packet creation / parsing, with definitions for the basic TCP/IP protocols
Other
1.08k stars 270 forks source link

Use int precision multiplier instead of float #639

Closed andrew-bolin closed 1 year ago

andrew-bolin commented 1 year ago

This code change is very straightforward (1e6 -> 1_000_000), but I'll explain the backstory.

I have a software utility that writes to either pcap or pcapng - I'm sure many people do, the API is the same which makes it very easy :D

Except! We have nanosecond resolution timestamps. So we're using nano=True for pcap (not supported in pcapng, that's fine).

It turns out that floats do not have sufficient precision to encode a UNIX timestamp with small nanosecond differences, so we use Decimal for our timestamp. This is all well and good for pcap files (which multiply the timestamp by an integer), but no good for pcap, which multiplies the timestamp by 1e6 (which is a float).

So, in my project, I could do something like if pcapng, convert timestamp to float.

But it's a lot cleaner if dpkt could replace the constant 1e6 with 1_000_000. Then I can simply pass a Decimal into either Writer type :sunglasses:

I also converted it from being a magic number in 4 places to a variable, this might help if someone in the future feels like expanding the pcapng Writer to support nanoseconds (if the file format permits?)... :wink:

obormot commented 1 year ago

Build fails due to Python 2.7 not understanding the 1_000_000 convention. dpkt will drop python2 support any day now but for now replacing 1_000_000 with 1000000 will help the build pass

coveralls commented 1 year ago

Coverage Status

Coverage increased (+4.0e-05%) to 99.817% when pulling db84eee2b233f807053259be849ec55f83016062 on andrew-bolin:master into 40f7b5665d173a886fa1d44e30e8d4547d561816 on kbandla:master.

obormot commented 1 year ago

This PR changes the Writer class adding self._precision_multiplier = 1000000, replacing the hardcoded value of 1e6; however the existing Reader class already has self._divisor = float(1e6) so at a minimum this change would introduce inconsistency between the Reader and the Writer, which we should avoid.

obormot commented 1 year ago

Also https://github.com/kbandla/dpkt/issues/462 could be related here?

andrew-bolin commented 1 year ago

Fair point!

My change makes the two Writers more consistent, but I didn't think to look at the Reader.

Next time I'm in front of a computer I'll swap an integer constant into the Reader as well.

Regarding issue #462 - indirectly related, yes. If one were to return Decimal timestamps as suggested there, it would reasonably be expected to accept them as inputs also, which is the goal of this change 🙂

On Fri, 16 Sep 2022, 15:39 Oscar, @.***> wrote:

This PR changes the Writer class adding self._precision_multiplier = 1000000, replacing the hardcoded value of 1e6; however the existing Reader class already has self._divisor = float(1e6) so at a minimum this change would introduce inconsistency between the Reader and the Writer, which we should avoid.

— Reply to this email directly, view it on GitHub https://github.com/kbandla/dpkt/pull/639#issuecomment-1248939256, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE3DXRJJLRHIZHE3TTTWANLV6QBZVANCNFSM6AAAAAAQN6ND6Q . You are receiving this because you authored the thread.Message ID: @.***>

andrew-bolin commented 1 year ago

oh, Python2 treats int / int as an integer division, unlike Python3 :cry:

how far away is that plan to drop 2.x ?

andrew-bolin commented 1 year ago

@obormot - anything else I can do to help this proceed?

obormot commented 1 year ago

@obormot - anything else I can do to help this proceed?

TBH I'd like to spend some more time reviewing this change, not code-wise but conceptually. I'd like to understand (1) if this could potentially be a breaking change, (2) what's the deal with timestamp precision in pcaps, should they be ints or floats, or Decimal as some other PRs suggested -- essentially what is the "right way" to do it.

obormot commented 1 year ago

oh, Python2 treats int / int as an integer division, unlike Python3 😢

Ah, this must be the reason we used floats in the original code (written for py2 at the time). I've reviewed this change once more and it looks good. Merging into master for now, let's monitor and see if anyone complains :) In general though there are discrepancies between pcap and pcapng handling (lack of nano=True option in pcapng, use of Decimal and 1E6 in pcap) which we should address.