open62541 / open62541

Open source implementation of OPC UA (OPC Unified Architecture) aka IEC 62541 licensed under Mozilla Public License v2.0
http://open62541.org
Mozilla Public License 2.0
2.59k stars 1.24k forks source link

Large file descriptors cause crashes on Linux due to usage of FD_SET/select() #6188

Open ravinrabbid opened 10 months ago

ravinrabbid commented 10 months ago

Description

open62541 as of version 1.3.8 uses select() to wait on sockets which is limited to file descriptors <1024 on Linux. In larger scale applications where file descriptors are likely to grow larger, this leads to undefined behavior and crashes.

Background Information / Reproduction Steps

To reproduce, open a lot of file descriptors (>1024) before opening a client connection with open62541. You might need to increase the limit for open files on your system first. The application will cause a segmentation fault upon connection.

This is because the behavior for FD_SET is undefined for fds >= FD_SETSIZE on Linux, which in practice will lead to stack corruption.

Excerpt of the core dump

#0  0x00007fbc5f4a154c __pthread_kill_implementation (libc.so.6 + 0xa154c)
#1  0x00007fbc5f454d06 raise (libc.so.6 + 0x54d06)
#2  0x00007fbc5f4287f3 abort (libc.so.6 + 0x287f3)
#3  0x00007fbc5f429130 __libc_message.cold (libc.so.6 + 0x29130)
#4  0x00007fbc5f55d3bb __fortify_fail (libc.so.6 + 0x15d3bb)
#5  0x00007fbc5f55be06 __chk_fail (libc.so.6 + 0x15be06)
#6  0x00007fbc5f55d30b __fdelt_warn (libc.so.6 + 0x15d30b)
#7  0x00007fbc5c527988 UA_ClientConnectionTCP_poll (libopen62541.so.1 + 0xde988)
#8  0x00007fbc5c4e6466 connectIterate (libopen62541.so.1 + 0x9d466)
#9  0x00007fbc5c4e483f UA_Client_run_iterate (libopen62541.so.1 + 0x9b83f)
#10 0x00007fbc5c4e8727 connectSync (libopen62541.so.1 + 0x9f727)
#11 0x00007fbc5c4ea162 UA_Client_getEndpoints (libopen62541.so.1 + 0xa1162)
[...]

Used CMake options:

cmake -DUA_ENABLE_ENCRYPTION=ON -DUA_ENABLE_ENCRYPTION_OPENSSL=ON -DUA_ENABLE_UNIT_TESTS_MEMCHECK=ON -DBUILD_SHARED_LIBS=ON

Checklist

Please provide the following information:

V-Strassheim commented 10 months ago

Hi, is it possible for you to switch to 1.4(rc) branch? Epoll is already implemented there for the event loop, that no longer has the limitations on Linux based systems. This would also bring further advantages in terms of processing speed.

Otherwise it would be possible to switch from select to poll. However, this is associated with some effort and I'm not sure whether the limitation of fd set size is kind of an exceptional case and is therefore more likely to be addressed with a lower priority.

ravinrabbid commented 10 months ago

Thanks for your reply! We've indeed already patched the parts we need for our use-case to use poll() instead.

Good to know that this is already addressed for 1.4, it still might be worth looking into at least checking the file descriptor against FD_SETSIZE for 1.3 and provide an error. While there are probably not that many people running into this, problems caused by this can be very unspecific and hard to debug.

jpfr commented 10 months ago

We can integrate a patch to 1.3 if It is provided. And small enough that we can exclude new bugs being introduced.

The change in 1.4 has been quit extensive under the hood. The effort from the maintainers goes to the “new reality” that already supports epoll.