msantos / procket

Erlang interface to low level socket operations
http://blog.listincomprehension.com/search/label/procket
BSD 3-Clause "New" or "Revised" License
283 stars 80 forks source link

No messages received via port after closing and opening the fd #44

Closed jonnystorm closed 6 years ago

jonnystorm commented 6 years ago

I'm actually using procket with Elixir, so I apologize for the way atoms are represented here.

The problem I observe is this: when I open an fd with procket, then close that fd and open a new one, any port I open against the new fd will send me a single message and no more. Meanwhile, the socket receive queue fills up until I begin calling procket.recv(fd, 2048). Nothing in ss or lsof output suggests that the resulting socket or fd remain open after issuing procket.close(fd).

iex> {:ok, fd} = :procket.open 0, [progname: '/usr/local/bin/procket', interface: erl_if, progname: progname, family: :packet, type: :raw, protocol: 0x0300]
{:ok, 14}
iex> port = :erlang.open_port({:fd, fd, fd}, [:binary])
#Port<0.12>
iex> flush()
{#Port<0.12>,
 {:data,
  <<255, 255, 255, 255, 255, 255, 6, 184, 31, 86, 188, 243, 115, 115, 18, 17, 0,
    0, 0, 64, 154, 111, 45, 57, 26, 166, 181, 101, 95, 78, 244, 62, 193, 207,
    90, 120, 3, 73, 14, 42, 36, 33, 31, 182, 105, 211, ...>>}}
{#Port<0.12>,
 {:data,
  <<255, 255, 255, 255, 255, 255, 6, 184, 31, 86, 188, 243, 115, 115, 18, 17, 0,
    0, 0, 64, 154, 111, 45, 57, 26, 166, 181, 101, 95, 78, 244, 62, 193, 207,
    90, 120, 3, 73, 14, 42, 36, 33, 31, 182, 105, 211, ...>>}}
{#Port<0.12>,
 {:data,
  <<255, 255, 255, 255, 255, 255, 6, 184, 31, 86, 188, 243, 115, 115, 18, 17, 0,
    0, 0, 64, 154, 111, 45, 57, 26, 166, 181, 101, 95, 78, 244, 62, 193, 207,
    90, 120, 3, 73, 14, 42, 36, 33, 31, 182, 105, 211, ...>>}}
...
iex> :erlang.port_close port
true
iex> :procket.close fd
:ok
iex> {:ok, fd} = :procket.open 0, [progname: '/usr/local/bin/procket', interface: erl_if, progname: progname, family: :packet, type: :raw, protocol: 0x0300]
{:ok, 14}
iex> port = :erlang.open_port({:fd, fd, fd}, [:binary])
#Port<0.16>
iex> flush()
{#Port<0.16>,
 {:data,
  <<255, 255, 255, 255, 255, 255, 6, 184, 31, 86, 188, 243, 115, 115, 18, 17, 0,
    0, 0, 64, 36, 146, 11, 157, 23, 73, 225, 148, 224, 74, 222, 186, 120, 211,
    61, 213, 43, 78, 252, 173, 177, 139, 246, 239, 174, 51, ...>>}}
:ok
iex> flush()
:ok
iex> :procket.recv(fd, 2048)
{:ok,
 <<255, 255, 255, 255, 255, 255, 6, 184, 31, 86, 188, 243, 115, 115, 18, 17, 0,
   0, 0, 64, 77, 180, 29, 141, 181, 75, 97, 69, 77, 102, 85, 211, 49, 67, 130,
   18, 236, 90, 154, 117, 95, 60, 253, 169, 184, 45, 77, 195, ...>>}

Meanwhile, the socket receive queue fills up:

[16:29:49] jstorm[0]@alarbus:~$ ss -0 -lnp
Netid   State     Recv-Q    Send-Q        Local Address:Port          Peer Address:Port

p_raw   UNCONN    194320    0                         *:*                         *
 users:(("beam.smp",pid=22228,fd=14))
p_raw   UNCONN    0         0                         *:wlp1s0                    *

p_raw   UNCONN    0         0                         *:wlp1s0                    *

p_dgr   UNCONN    0         0                   [35085]:wlp1s0                    *

p_dgr   UNCONN    0         0                   [34958]:wlp1s0                    *

p_dgr   UNCONN    0         0                       [0]:*                         *

To restore normal behavior, I have to close and reopen my interactive Elixir shell session before I can receive messages from a new port.

If this isn't a problem in procket, I apologize for the frivolous issue, and I hope you might provide some suggestions on how to determine the cause.

Thank you for procket!

msantos commented 6 years ago

On Tue, Aug 21, 2018 at 02:52:55PM -0700, Jonathan Storm wrote:

I'm actually using procket with Elixir, so I apologize for the way atoms are represented here.

No problem, I like Elixir too!

The problem I observe is this: when I open an fd with procket, then close that fd and open a new one, any port I open against the new fd will send me a single message and no more. Meanwhile, the socket receive queue fills up until I begin calling procket.recv(fd, 2048). Nothing in ss or lsof output suggests that the resulting socket or fd remain open after issuing procket.close(fd).

iex> {:ok, fd} = :procket.open 0, [progname: '/usr/local/bin/procket', interface: erl_if, progname: progname, family: :packet, type: :raw, protocol: 0x0300]
{:ok, 14}
iex> port = :erlang.open_port({:fd, fd, fd}, [:binary])
#Port<0.12>
iex> flush()
{#Port<0.12>,
 {:data,
  <<255, 255, 255, 255, 255, 255, 6, 184, 31, 86, 188, 243, 115, 115, 18, 17, 0,
    0, 0, 64, 154, 111, 45, 57, 26, 166, 181, 101, 95, 78, 244, 62, 193, 207,
    90, 120, 3, 73, 14, 42, 36, 33, 31, 182, 105, 211, ...>>}}
{#Port<0.12>,
 {:data,
  <<255, 255, 255, 255, 255, 255, 6, 184, 31, 86, 188, 243, 115, 115, 18, 17, 0,
    0, 0, 64, 154, 111, 45, 57, 26, 166, 181, 101, 95, 78, 244, 62, 193, 207,
    90, 120, 3, 73, 14, 42, 36, 33, 31, 182, 105, 211, ...>>}}
{#Port<0.12>,
 {:data,
  <<255, 255, 255, 255, 255, 255, 6, 184, 31, 86, 188, 243, 115, 115, 18, 17, 0,
    0, 0, 64, 154, 111, 45, 57, 26, 166, 181, 101, 95, 78, 244, 62, 193, 207,
    90, 120, 3, 73, 14, 42, 36, 33, 31, 182, 105, 211, ...>>}}
...
iex> :erlang.port_close port
true
iex> :procket.close fd
:ok
iex> {:ok, fd} = :procket.open 0, [progname: '/usr/local/bin/procket', interface: erl_if, progname: progname, family: :packet, type: :raw, protocol: 0x0300]
{:ok, 14}
iex> port = :erlang.open_port({:fd, fd, fd}, [:binary])
#Port<0.16>
iex> flush()
{#Port<0.16>,
 {:data,
  <<255, 255, 255, 255, 255, 255, 6, 184, 31, 86, 188, 243, 115, 115, 18, 17, 0,
    0, 0, 64, 36, 146, 11, 157, 23, 73, 225, 148, 224, 74, 222, 186, 120, 211,
    61, 213, 43, 78, 252, 173, 177, 139, 246, 239, 174, 51, ...>>}}
:ok
iex> flush()
:ok
iex> :procket.recv(fd, 2048)
{:ok,
 <<255, 255, 255, 255, 255, 255, 6, 184, 31, 86, 188, 243, 115, 115, 18, 17, 0,
   0, 0, 64, 77, 180, 29, 141, 181, 75, 97, 69, 77, 102, 85, 211, 49, 67, 130,
   18, 236, 90, 154, 117, 95, 60, 253, 169, 184, 45, 77, 195, ...>>}

Meanwhile, the socket receive queue fills up:

[16:29:49] jstorm[0]@alarbus:~$ ss -0 -lnp
Netid   State     Recv-Q    Send-Q        Local Address:Port          Peer Address:Port

p_raw   UNCONN    194320    0                         *:*                         *
 users:(("beam.smp",pid=22228,fd=14))
p_raw   UNCONN    0         0                         *:wlp1s0                    *

p_raw   UNCONN    0         0                         *:wlp1s0                    *

p_dgr   UNCONN    0         0                   [35085]:wlp1s0                    *

p_dgr   UNCONN    0         0                   [34958]:wlp1s0                    *

p_dgr   UNCONN    0         0                       [0]:*                         *

To restore normal behavior, I have to close and reopen my interactive Elixir shell session before I can receive messages from a new port.

If this isn't a problem in procket, I apologize for the frivolous issue, and I hope you might provide some suggestions on how to determine the cause.

Thank you for the detailed bug report!

I couldn't reproduce this behaviour but I was testing on erlang 20. I'll try testing on erlang 21 later.

If you're using erlang 21, one possibility is that this is a new erlang bug.

In both cases you were given the same file descriptor number. Maybe erlang is not properly removing the fd from the pollset.

Some things to test:

{:ok, fd} = :procket.open 0, [progname: '/usr/local/bin/procket', interface: erl_if, progname: progname, family: :packet, type: :raw, protocol: 0x0300]
:procket.close fd
port = :erlang.open_port({:fd, fd, fd}, [:binary])
{:ok, fd} = :procket.open 0, [progname: '/usr/local/bin/procket', interface: erl_if, progname: progname, family: :packet, type: :raw, protocol: 0x0300]
# {:ok, 14}

port = :erlang.open_port({:fd, fd, fd}, [:binary])
:procket.close fd

{:ok, fd1} = :procket.open 0, [progname: '/usr/local/bin/procket', interface: erl_if, progname: progname, family: :packet, type: :raw, protocol: 0x0300]
# {:ok, 14}

{:ok, fd} = :procket.open 0, [progname: '/usr/local/bin/procket', interface: erl_if, progname: progname, family: :packet, type: :raw, protocol: 0x0300]
# {:ok, 15}

:procket.close fd1

port = :erlang.open_port({:fd, fd, fd}, [:binary])
jonnystorm commented 6 years ago

Thanks for the suggestions! I did fail to mention that the order in which I close the fd and port does not appear to matter, and that the problem does, indeed, follow the fd, as I can create a second fd (15, say) that works without problems.

Sorry for the missing version information! I know better than that. :)

[14:16:17] jstorm[0]@alarbus:genraw-elixir$ iex -S mix
Erlang/OTP 21 [erts-10.0.3] [source] [64-bit] [smp:6:6] [ds:6:6:10] [async-threads:1]

Interactive Elixir (1.7.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)>

A bug in the VM would certainly make sense; there clearly has to be some leftover state there. Perhaps more careful scrutiny of observer data could provide some insight. If nothing else, I'll just test under OTP 20 and try compiling master for comparison.

I'll let you know how it goes.

msantos commented 6 years ago

Found the bug! https://bugs.erlang.org/browse/ERL-692

Fixed by this commit: https://github.com/erlang/otp/pull/1918/commits/1d1f29eb614ce75012f9136d185f4dbcccc8650b

jonnystorm commented 6 years ago

Outstanding! Also, I just confirmed this is fixed in OTP master:

[16:55:21] jstorm[0]@alarbus:genraw-elixir$ iex -S mix
Erlang/OTP 22 [DEVELOPMENT] [erts-10.0.5] [source-9a95f6f1cd] [64-bit] [smp:6:6] [ds:6:6:10] [async-threads:1]

Interactive Elixir (1.7.2) - press Ctrl+C to exit (type h() ENTER for help)

Thanks for your help!