msantos / epcap

Erlang packet capture interface using pcap
http://listincomprehension.com/2009/12/erlang-packet-sniffer-using-ei-and.html
BSD 3-Clause "New" or "Revised" License
178 stars 56 forks source link

Timeout 0 (default) can block on Linux #29

Closed erlbeck closed 3 years ago

erlbeck commented 3 years ago

While picking up a project that worked well 2 years ago, I now get timeout issues when trying to use it with Linux 5.4.0 and libpcap 1.9.1 on Ubuntu 20.04.

The epcap test suite also fails with timeouts for most test cases.

The (default) {timeout, 0} on Linux conflicts with the description in pcap_set_timeout(3) which explicitly states that the behaviour for to_ms <= 0 is undefined, while pcap(3) tells that using "zero value ... on platforms that support a packet buffer timeout, will cause a read to wait forever ... with no timeout".

I have added #28 which solves that issue by providing an option to enable the immediate mode and by using that in the test cases. It was my intention to not change the default setting even if it leads to undefined behaviour on some platforms.

msantos commented 3 years ago

@erlbeck thank you for catching this! https://github.com/msantos/epcap/pull/28 looks great.

pcap_set_immediate_mode(3PCAP) says:

On Linux, with previous releases of libpcap, capture devices are always in immediate mode; however, in > 1.5.0 and later, they are, by default, not in immediate mode, so if pcap_set_immediate_mode() is available, it should be used.

On other platforms, capture devices are always in immediate mode.

From pcap_set_timeout(3PCAP):

We recommend always setting the timeout to a non-zero value unless immediate mode is set, in which case the timeout has no effect.

To summarize:

So it seems the default behaviour of linux changed post-1.5.0 and immediate mode should always be enabled. For ecap:

What do you think?

erlbeck commented 3 years ago

Thank you for pointing out, that the immediate mode is active by default on other platforms. This will cause #28 to change the default behaviour there, so I'd like to fix that.

I think it's a bit unfortunate that we have to different options that are not orthogonal at all.

Having to use [{immediate, false}, {timeout, 42}] to make sure the timeout is actually used will probably call for trouble, since [{timeout, 42}] would look fine at first sight. At least this would be platform independent.

In principle I see two (incompatible) approaches

  1. Being close to the libpcap API
  2. Being consistent with Erlang expectations

ad 1) Being close to the libpcap API, similar like it is with the current PR (except for the default behaviour). That would also mean to only call the pcapset* functions if the corresponding option is given by the user and to extend the -I CLI option to take an argument. This way the user gets the same experience like with using the local libpcap function directly.

ad 2) Merge this into a single option (e.g. 'timeout') with a proper platform independent definition of values which extends the specification given in pcap_set_timeout, like having additional values 'infinite' and 'immediate' using the latter as default. The value 0 for timeouts poses a problem here: In Erlang I'd expect this to indicate a 0ms timeout, like everywhere else in OTP. libpcap users on the other hand might rather expect it to refer to 'infinite', given that they have read the spec thoroughly.

As Erlang programmer I'd like to have a platform independent behaviour even in case I won't provide any of the above options. And I'd like to have value ranges that are consistent within the Erlang world. So I'd go with variant 2), where 0 (and perhaps additionally 'immediate') would set immediate mode (the other values would unset it) and 'infinity' would be mapped to MAXINT.

The issue with this might be that some users have relied on timeout 0 behaving like described in pcap(3).

erlbeck commented 3 years ago

@msantos I have updated #28 according to your proposal above, so the test cases will not need the 'immediate' option and non-Linux applications using epcap will not suffer from regression.

msantos commented 3 years ago

@erlbeck thanks, I merged your patch. I agree, option 2 sounds good:

{timeout, immediate | infinity | int32_t()}

If you feel like providing a patch for this, please do! Otherwise, thanks so much for working on this, really appreciate it and I'll look at pushing a commit with the proposed {timeout,immediate} in the next few days.

erlbeck commented 3 years ago

@msantos Thanks for merging the patch.

I can do the timeout patch in the next few days.