knxd / knxd

GNU General Public License v2.0
527 stars 202 forks source link

No recycling of expired connection #222

Closed oblaise closed 7 years ago

oblaise commented 7 years ago

Dear All,

In tunnel mode, each time a client connect to knxd (CONNECTION_REQ) knxd tries to find a free knx address in its pool. Even when a client is disconnected properly (DISCONNECT_REQUEST) it doesn't free the closed connection meaning that after a few connection the pool is consumed and there is no more connection available. Normally the address should be freed when a client disconnect properly or when the connection timeout.

I was able to reproduce this bug using ETS in tunnel mode toward knxd.

knxd --version knxd 0.12.8:c9ea107

Kind regards, --ob

tru7 commented 7 years ago

I think this behaviour was reported occasionally since client address pool was introduced in 0.11. However I could never reproduce this in my environment where knxd talks via tunnel to an IP Interface neither when knxd tasks to another knxd over a tunnel. So I think it may depend on the interface for the bus. What kind of interface are you using to connect to the knx bus? Can you reproduce this by using knxtool groupswrite command instead of ETS?

smurfix commented 7 years ago

@tru7 groupswrite et al. uses a different mechanism of talking to knxd, so that's not conclusive.

A better way of reproducing this would be to run another knxd with ipt:, however, that way doesn't yet send a DISCONNECTREQUEST. (To be added in 0.14.)

I doubt that this would depend on the backend(s), they're different subsystems, but stranger bugs have happened.

oblaise commented 7 years ago

I'm also in an environment where knxd talks via tunnel to an IP interface. To reproduce this setting it's easy: use ETS -> (tunnel) -> knxd -> (tunnel) -> KNX IP interface and use group monitor in ETS using the knxd interface.

I'm starting knxd with: /usr/bin/knxd -t 0xffc -f 9 -e 0.0.1 -E 0.0.2:2 -u /tmp/eib -b ipt:<ip interface> -D -T -R -S

Then I'm starting a 'Group Monitor' window and start monitoring. After stopping and starting it again 2 times it fails with: knxd: W00000059: Allocate: no more free addresses! knxd: Layer 8 [ 5:mcast:knxd 56.200] Tunnel CONNECTION_REQ with 0.0.0 knxd: Layer 8 [ 5:mcast:knxd 56.200] Tunnel CONNECTION_REQ, ignored, no free addresses knxd: Layer 8 [ 5:mcast:knxd 56.200] CONNECTION_REQ: no free channel

smurfix commented 7 years ago

I cannot reproduce this with the current stable version. Please tell me what "knxd -V" tells you. If your version definitely is v0.12.12 (git revision 0cb873b… – please verify that git rev-parse --short HEAD shows this) then please post a full log of a successful "knxtool" execution. Mine looks like this, (knxtool groupswrite local: 1/1/9 0):

knxd: Layer 8 [ 2:/tmp/kt        90.021] New Connection
knxd: Layer 8 [ 2:/tmp/kt        90.021] ClientConnection Init
knxd: Layer 8 [ 3:layer3         90.021] unknown addr 9.9.11
knxd: Layer 3 [31:/tmp/kt        90.021] Allocate 9.9.11
knxd: Layer 7 [32:/tmp/kt:9.9.11 90.021] OpenGroup
knxd: Layer 4 [32:/tmp/kt:9.9.11 90.021] OpenGroup 1/1/9 WO
knxd: Layer 3 [ 3:layer3         90.022] registerLayer2 32:/tmp/kt:9.9.11
knxd: Layer 3 [ 3:layer3         90.022] registerLayer2 32:/tmp/kt:9.9.11 = 1
knxd: Layer 7 [32:/tmp/kt:9.9.11 90.022] OpenGroup complete
knxd: Layer 7 [31:/tmp/kt        90.022] recv Group(002): 00 80
knxd: Layer 4 [32:/tmp/kt:9.9.11 90.022] Recv Group T_DATA_XXX_REQ A_GroupValue_Write (small) 00 
knxd: Layer 9 [32:/tmp/kt:9.9.11 90.022] Queue L_Data low from 0.0.0 to 1/1/9 hops: 06 T_DATA_XXX_REQ A_GroupValue_Write (small) 00 
knxd: Layer 8 [32:/tmp/kt:9.9.11 90.022] found addr 9.9.11
knxd: Layer 3 [ 3:layer3         90.022] Route L_Data low from 9.9.11 to 1/1/9 hops: 05 T_DATA_XXX_REQ A_GroupValue_Write (small) 00 
knxd: Layer 2 [ 4:ipt:b2.raspi   90.022] Send L_Data low from 9.9.11 to 1/1/9 hops: 05 T_DATA_XXX_REQ A_GroupValue_Write (small) 00 
knxd: Layer 8 [31:/tmp/kt        90.022] ClientConnection 9.9.11 closing
knxd: Layer 3 [ 3:layer3         90.022] Release 9.9.11
knxd: Layer 7 [31:/tmp/kt        90.022] CloseGroup
knxd: Layer 3 [32:/tmp/kt:9.9.11 90.022] deregisterLayer2 32
knxd: Layer 3 [32:/tmp/kt:9.9.11 90.022] deregisterLayer2 32 OK
knxd: Layer 4 [32:/tmp/kt:9.9.11 90.022] CloseGroup
knxd: Layer 1 [ 4:ipt:b2.raspi   90.022] SendTunnel(015): 04 01 0D 00 11 00 BC D0 99 0B 09 09 01 00 80
knxd: Layer 1 [ 4:ipt:b2.raspi   90.022] Send(015): 04 01 0D 00 11 00 BC D0 99 0B 09 09 01 00 80
knxd: Layer 1 [ 4:ipt:b2.raspi   90.023] Send(004): 04 01 C5 00
knxd: Layer 1 [ 4:ipt:b2.raspi   90.252] Heartbeat
knxd: Layer 1 [ 4:ipt:b2.raspi   90.252] Send(010): 01 00 08 01 0A 6B 00 05 0E 58

Please don't forget quoting code … otherwise anything in <> is interpreted as HTML and won't be shown.

oblaise commented 7 years ago

Just recompiled with latest version (was 0.12.8, now its 0.12.12) -> issue remains. I can drop you the logs of doing a knxtool execution but I don't have any issue with knxtool execution. My logs is similar to yours and I can execute it as many time as I want.

Issue happens when using multiple connections using IP tunnels (from i.e. ETS). The KNX address used for the tunnel connection isn't released when the IP tunnel client send the 'DISCONNECT_REQUEST' request.

I dig into the code and I observed that ConnState::stop() is called when a tunnel client send a disconnect request (eibnetserver.cpp, line 584): (*i)->stop();

ConnState::stop is defined in eibnetserver.cpp, line 356:

void ConnState::stop()
{
  TRACEPRINTF (t, 8, this, "Stop Conn %d", channel);
  if (type == CT_BUSMONITOR)
    l3->deregisterVBusmonitor(this);
  timeout.stop();
  sendtimeout.stop();
  send_trigger.stop();
  retries = 0;
  parent->drop_connection (std::static_pointer_cast<ConnState>(shared_from_this()));
  Layer2::stop();
  if (remoteAddr && l3)
    {
      l3->release_client_addr(remoteAddr);
      remoteAddr = 0;
    }
}

When this function calls the Layer2::stop(); the l3 field is reset and thus the last 4 lines never execute. But these lines release the remoteAddr that was associated with the tunnel connection.

I changed the connection by moving the Layer2::stop(); to the end of the method and all seems OK now:

void ConnState::stop()
{
  TRACEPRINTF (t, 8, this, "Stop Conn %d", channel);
  if (type == CT_BUSMONITOR)
    l3->deregisterVBusmonitor(this);
  timeout.stop();
  sendtimeout.stop();
  send_trigger.stop();
  retries = 0;
  TRACEPRINTF (t, 8, this, "---> l3 (this): %d (%d)", l3,this);
  parent->drop_connection (std::static_pointer_cast<ConnState>(shared_from_this()));
  if (remoteAddr && l3)
    {
      l3->release_client_addr(remoteAddr);
      remoteAddr = 0;
    }
  Layer2::stop();
}

I was able to use the Tunnel connection from ETS multiple times as well as from another IP tunnel client without issue.

Hoping this can help.

smurfix commented 7 years ago

Thank you.