vrk-kpa / xroad-joint-development

Unmaintained repository. Development moved to: https://github.com/nordic-institute/X-Road-development

BUG: TCP (ESTABLISHED) connections pile up on 5500 in fault situations #131

Closed: JyrgenSuvalov closed this issue 7 years ago

JyrgenSuvalov commented 7 years ago

Affected components:
- Security Server

Affected documentation: -

Estimated delivery: -

External reference:
- https://jira.ria.ee/browse/XTE-307

Problem: When the producer server cannot reach the adapter server, it generates fault responses, and the resulting connections on port 5500 on the clientproxy are left hanging. As a result, antidos activates. If antidos is disabled, "too many open files" errors appear on other connections (e.g. postgres, adapter servers). Memory consumption is also affected.

This happens because the serverproxy does not include an HTTP "Connection: close" header in some fault situations, so the clientproxy leaves the connection hanging.
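For illustration (a sketch, not captured traffic), a fault response from the serverproxy would need to carry the header for the clientproxy to tear the connection down:

```http
HTTP/1.1 500 Internal Server Error
Content-Type: text/xml; charset=UTF-8
Connection: close
```

Without the final `Connection: close` line (the buggy fault path), the clientproxy keeps the TCP connection in ESTABLISHED state indefinitely.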

Acceptance criteria

olli-lindgren commented 7 years ago

If I understand this correctly (can't access the JIRA linked), the problem is the connection between client and server proxy.

Note that for connection pooling to work, the header should not be included at all. A configuration option was added to toggle this on/off. Pooling is off by default in the base package, but has been on in the Finnish package since 6.9.0.

More on it here: https://confluence.csc.fi/display/PVAYLADEV/Getting+the+connection+pool+to+reuse+existing+connections+between+security+servers. If you can't access it, I can send you a PDF of it, just let me know where to put it.

In addition, an option was added for the serverproxy to close idle connections after a while. The Finnish default is 120 seconds; when implementing it, I left the global default at 0.

https://github.com/ria-ee/X-Road/blob/develop/doc/Manuals/ug-syspar_x-road_v6_system_parameters.md#security-server-system-parameters

```ini
# default
server-connector-max-idle-time=0

# Finnish default
server-connector-max-idle-time=120000
```
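Reading the parameter could be sketched like this (an illustrative helper, not X-Road code; `IdleTimeoutPolicy` and its methods are hypothetical, only the parameter name and defaults come from the manual linked above):

```java
import java.util.Properties;

// Illustrative sketch (not X-Road code): interpret the
// server-connector-max-idle-time parameter. 0 means the serverproxy
// never closes idle connections; a positive value is a timeout in ms.
final class IdleTimeoutPolicy {

    static long maxIdleTimeMillis(Properties props) {
        // Global default is 0; the Finnish package ships 120000 (2 min).
        return Long.parseLong(
                props.getProperty("server-connector-max-idle-time", "0"));
    }

    static boolean closesIdleConnections(long idleMillis) {
        return idleMillis > 0;
    }
}
```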

Can this problem be fixed just by changing the global default to non-zero?

If you choose to change how the Connection: close header is added, please make sure it respects the system property SystemProperties.isEnableClientProxyPooledConnectionReuse(): if pool-enable-connection-reuse=true, the close header should not be added.
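The interplay described above could be sketched as follows (an illustrative helper, not the actual proxy code; the class and method names are hypothetical, only the property name is from this thread):

```java
// Illustrative sketch: decide the Connection response header based on
// the pooling option (pool-enable-connection-reuse). Not X-Road code.
final class ConnectionHeaderPolicy {

    /**
     * Returns the value for the HTTP Connection response header,
     * or null if no header should be sent.
     */
    static String connectionHeader(boolean pooledConnectionReuseEnabled) {
        if (pooledConnectionReuseEnabled) {
            // Pooling on: the connection must stay open for reuse,
            // so no "Connection: close" header is added.
            return null;
        }
        // Pooling off: always send "close" -- including on fault
        // responses, which is the path the bug misses.
        return "close";
    }
}
```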

JyrgenSuvalov commented 7 years ago

As I understand it, when pooling is off, the HTTP Connection: close header should still be included. So this should still be fixed, and indeed in accordance with your changes and the pooling option.

olli-lindgren commented 7 years ago

Yes, the intention is that it works as before (i.e. sends the header) when pooling is off. The bug should absolutely be fixed, since there are likely clients that want connections closed immediately.

Note that sending the header while leaving server-connector-max-idle-time at the default 0 leaves the responsibility of closing the connection entirely to the ClientProxy (which might never do so, for whatever reason). If that is acceptable for you, leave it as is. We opted for a finite (non-zero) value in the Finnish package, as described above; you may want to do the same. We currently see no scenario where the ServerProxy should wait forever for connections that the ClientProxy does not close.

VitaliStupin commented 7 years ago

Released in version 6.12.0