meox opened 1 year ago

I don't understand why I got sporadic `{error, timeout}`. Is this a known issue?
No, I haven't seen any reports of timeouts. It sounds as if a `gen_server:call` may be timing out.

Checking the code, sends should not time out. For the `send/2` code path, depending on the `active` mode, either:

- `{active, false}`: the data is written directly to the non-blocking file descriptor using `write(2)`
- `{active, true | integer()}`: the data is written to a port holding the fd using `erlang:port_command/2`

The only failure mode of `port_command/2` is a `badarg` exception, so the send itself does not time out.
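As a minimal sketch of the two paths (simplified; `FD` and `Port` stand in for the state held by the tuncer gen_server):

```erlang
%% Passive mode: write directly to the non-blocking device fd.
send_passive(FD, Data) ->
    tuncer:write(FD, Data).

%% Active mode: the fd is owned by an Erlang port; write through it.
%% erlang:port_command/2 returns true or exits with badarg.
send_active(Port, Data) ->
    true = erlang:port_command(Port, Data),
    ok.
```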
Is it possible another `gen_server:call` is timing out? When a lot of data is being written to the port in `active` mode, the scheduler can suspend the caller to apply backpressure. This could cause a call upstream to time out.

The busy check can be bypassed using the `force` option to `port_command/3`. Apparently the Erlang distribution drivers set this flag, so it makes sense for tunctl to do so too:
```diff
diff --git a/src/tuncer.erl b/src/tuncer.erl
index 6bda60e..7dbe7b0 100644
--- a/src/tuncer.erl
+++ b/src/tuncer.erl
@@ -744,7 +744,7 @@ handle_call({send, Data}, _From, #state{port = false, fd = FD} = State) ->
     {reply, Reply, State};
 handle_call({send, Data}, _From, #state{port = Port} = State) ->
     Reply =
-        try erlang:port_command(Port, Data) of
+        try erlang:port_command(Port, Data, [force]) of
             true ->
                 ok
         catch
```
Yes, I'm using `active` mode!

I tried, but I'm receiving `notsup`:
Thanks for testing. Sorry, there must be a problem with the tuncer tests: they passed for me with this change. I'll look into it.
Another option is bypassing both the gen_server and the port for writes: continue using `active` mode for reads, but switch to writing directly to the device fd:

```erlang
FD = tuncer:getfd(Ref), % calls into the gen_server
tuncer:write(FD, Data).
```
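A hedged sketch of combining the two (the `{tuntap, _, _}` message shape for active-mode reads is an assumption, and `loop/2` is a hypothetical helper):

```erlang
%% Reads arrive as messages from the port owner; writes bypass the
%% gen_server by going straight to the raw device fd.
loop(Ref, FD) ->
    receive
        {tuntap, Ref, Packet} ->
            ok = tuncer:write(FD, Packet), % e.g., echo the packet back
            loop(Ref, FD)
    end.
```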
Just to be more precise: I'm using it from a Common Test suite.
`open_port/2` has an option to disable the busy limits. The docs list the fd driver as supported.

(Diff updated to include `busy_limits_msgq`.)
```diff
diff --git a/src/tuncer.erl b/src/tuncer.erl
index 6bda60e..b44e13a 100644
--- a/src/tuncer.erl
+++ b/src/tuncer.erl
@@ -845,7 +845,7 @@ code_change(_OldVsn, State, _Extra) ->
 %%% Internal functions
 %%--------------------------------------------------------------------
 set_mode(active, FD) ->
-    open_port({fd, FD, FD}, [stream, binary]);
+    open_port({fd, FD, FD}, [stream, binary, {busy_limits_port, disabled}, {busy_limits_msgq, disabled}]);
 set_mode(passive, Port) ->
     try erlang:port_close(Port) of
         true ->
```
About the tuncer tests and `notsup`: the tests are using `{active, false}` only. I'll add some tests covering `active` mode.

Note: even with `active` mode enabled, I wasn't able to reproduce the `notsup` from `port_command/3` with `[force]`, but using `{busy_limits_port, disabled}` looks like a better fix.
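Something like the following rough sketch (the option list and dummy frame are assumptions, not the final test):

```erlang
%% Hedged sketch of an active-mode send test.
active_send(_Config) ->
    {ok, Ref} = tuncer:create(<<>>, [tap, no_pi, {active, true}]),
    ok = tuncer:send(Ref, <<0:(64 * 8)>>), % 64-byte dummy frame
    ok = tuncer:destroy(Ref).
```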
It's not clear to me if there is some case where it's not safe to disable those limits.
Good point. My theory is that tuntap devices have the same issues as writing a distribution driver: there can be a sort of priority inversion, where the application is waiting because the scheduler is applying backpressure to the process performing the I/O. As with the distribution drivers, the busy limits should be disabled, either per write using the `force` option to `port_command/3`, or when the port is opened using `open_port({fd, FD, FD}, [stream, binary, {busy_limits_port, disabled}, {busy_limits_msgq, disabled}])`.
My guess is that the `{error, timeout}` you see is caused by, e.g., a `gen_server:call/2` timing out somewhere before the call to send. The default timeout is 5 seconds, which is significant latency for a network. Disabling the busy limits might reduce the latency, but note that other processes in the chain, like the tuncer gen_server, may also be suspended.
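To make that failure mode concrete, a hypothetical wrapper (not tuncer's code) showing where the 5-second default comes from and how it could surface as `{error, timeout}`:

```erlang
%% gen_server:call/2 is call/3 with a 5000 ms default timeout. If the
%% server is suspended by busy-port backpressure for longer than that,
%% the caller exits; a wrapper like this turns the exit into a value.
call(Server, Request) ->
    try
        gen_server:call(Server, Request, 5000)
    catch
        exit:{timeout, _} -> {error, timeout}
    end.
```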
Without code to reproduce or a stack trace, that's just speculation. So I'll plan to do the following:

- add tests covering `active` mode
- check whether the `{error, timeout}` is reproducible
- allow `open_port` options to be passed in directly to `create/2` (see the sketch below)
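A purely hypothetical sketch of that last item (the `port_options` key is an assumption; the actual API hasn't been written):

```erlang
%% Hypothetical: forwarding caller-supplied open_port/2 options
%% through create/2 so the busy limits can be disabled per device.
{ok, Ref} = tuncer:create(<<"tun0">>,
    [tun, no_pi, {active, true},
     {port_options, [{busy_limits_port, disabled},
                     {busy_limits_msgq, disabled}]}]).
```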
Good! Thanks a lot for your support.