ucphhpc / docker-migrid

Containerized MiG

WebDAVS crash with latest cheroot release #57

Open jonasbardino opened 5 months ago

jonasbardino commented 5 months ago

As AU reported (thanks @Bjarke42), the WebDAVS service started crashing with errors deep in the cheroot web server code when deployed from a recent Docker build. We were able to reproduce the same error by running e.g. the testssl.sh tool (https://testssl.sh) against a fresh build.
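For anyone without testssl.sh at hand, here is a rough sketch of the two kinds of misbehaving clients that show up in the logs: one that hangs up before the handshake and one that offers a restricted cipher list. It is not what testssl.sh does internally. The `probe` helper, its parameters, and the host, port and cipher string in the usage part are hypothetical placeholders, and it is not confirmed that these probes alone trigger the crash.

```python
import socket
import ssl


def probe(host, port, ciphers=None, hang_up=False, timeout=5.0):
    """Attempt one TLS handshake and return a short outcome string.

    hang_up=True closes the TCP connection before any TLS bytes are sent.
    ciphers is an OpenSSL cipher string; it must name at least one cipher
    known to the local OpenSSL build, otherwise set_ciphers() raises.
    """
    context = ssl.create_default_context()
    # Reproduction only: skip certificate checks for test deployments.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    if ciphers:
        # set_ciphers() only restricts TLS 1.2 and older suites, so cap
        # the protocol version to make the restriction effective.
        context.maximum_version = ssl.TLSVersion.TLSv1_2
        context.set_ciphers(ciphers)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            if hang_up:
                return "closed before handshake"
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return "handshake ok: %s" % tls.version()
    except (ssl.SSLError, OSError) as exc:
        return "failed: %s" % type(exc).__name__


if __name__ == "__main__":
    # Hypothetical target: adjust to the WebDAVS address of a test container.
    target = ("localhost", 4443)
    print(probe(*target, hang_up=True))
    print(probe(*target, ciphers="AES128-SHA"))
```

Running it in a loop while tailing the WebDAVS log should at least show whether the `SSL/TLS wrap ... failed` lines appear for these clients.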

The relevant logs from the CentOS 7 container crash look like this:

2024-04-24 12:02:37,952 grid_webdavs:run:1608 INFO Starting SessionExpire thread: #140454816675584
2024-04-24 12:03:21,022 grid_webdavs:wrap:444 ERROR SSL/TLS wrap of ('192.168.0.2', 54800) failed unexpectedly: EOF occurred in violation of protocol (_ssl.c:877)
2024-04-24 12:03:21,054 grid_webdavs:wrap:429 WARNING SSL/TLS wrap of ('192.168.0.2', 54802) failed: [SSL: NO_SHARED_CIPHER] no shared cipher (_ssl.c:877)
2024-04-24 12:03:21,054 grid_webdavs:wrap:442 DEBUG SSL/TLS got invalid request: [SSL: NO_SHARED_CIPHER] no shared cipher (_ssl.c:877)
2024-04-24 12:03:21,054 grid_webdavs:wrap:444 ERROR SSL/TLS wrap of ('192.168.0.2', 54802) failed unexpectedly: [SSL: NO_SHARED_CIPHER] no shared cipher (_ssl.c:877)
2024-04-24 12:03:21,154 grid_webdavs:run:2009 ERROR server thread failed: 'NoneType' object has no attribute '_decref_socketios'
2024-04-24 12:03:21,154 grid_webdavs:stop:1622 INFO Stopping SessionExpire Thread: #140454816675584
2024-04-24 12:03:21,996 grid_webdavs:<module>:2140 ERROR exiting on unexpected exception: 'NoneType' object has no attribute '_decref_socketios'
2024-04-24 12:03:21,997 grid_webdavs:<module>:2141 INFO Traceback (most recent call last):
  File "/home/mig/mig/server/grid_webdavs.py", line 2134, in <module>
    run(configuration)
  File "/home/mig/mig/server/grid_webdavs.py", line 2001, in run
    server.start()
  File "/usr/local/lib/python3.6/site-packages/cheroot/server.py", line 1845, in start
    self.serve()
  File "/usr/local/lib/python3.6/site-packages/cheroot/server.py", line 1833, in serve
    raise self.interrupt
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/site-packages/cheroot/workers/threadpool.py", line 119, in run
    self._process_connections_until_interrupted()
  File "/usr/local/lib/python3.6/site-packages/cheroot/workers/threadpool.py", line 216, in _process_connections_until_interrupted
    conn.close()
  File "/usr/local/lib/python3.6/site-packages/cheroot/server.py", line 1361, in close
    self.rfile.close()
  File "/usr/lib64/python3.6/_pyio.py", line 778, in close
    self.raw.close()
  File "/usr/lib64/python3.6/socket.py", line 656, in close
    self._sock._decref_socketios()
AttributeError: 'NoneType' object has no attribute '_decref_socketios'
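The last frames of the traceback can be reproduced in isolation, which may help when reading it. The sketch below shows one way to get the same message: closing a `socket.SocketIO` whose underlying socket is `None`. That this is what actually happens inside cheroot or migrid after the failed SSL/TLS wrap is an assumption, not something established here, and `close_reader_without_socket` is a name made up for the illustration.

```python
import socket


def close_reader_without_socket():
    """Close a SocketIO that wraps None and return the error text, if any."""
    # The SocketIO constructor stores the socket without inspecting it,
    # so passing None only fails later, when close() tries to use it.
    raw = socket.SocketIO(None, "rb")
    try:
        raw.close()
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(close_reader_without_socket())
```

If the message matches, the interesting question is where the reader ends up with no socket, not the `close()` call itself.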

The crash was very similar on Rocky 8 and on Rocky 9 with the much more recent Python 3.9.

We investigated and traced it back to something related to the recent cheroot 10.0.1 release (https://pypi.org/project/cheroot/#history).

For now we have forced Dockerfile.* to use an earlier cheroot version to work around the problem. We still need to look further into the code and the cause to decide whether it is a cheroot or a migrid bug.
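To check whether a given container actually picked up the workaround, a small version check along these lines might help. It is only a sketch: treating 10.0.1 as the first suspect release is an assumption based on the report above, the helper names are made up, and the installed-version lookup needs Python 3.8 or later for `importlib.metadata`.

```python
import re

# Assumption: 10.0.1 is the first affected release. The report above only
# ties the crash to "something related to" that release.
FIRST_SUSPECT = (10, 0, 1)


def version_tuple(version):
    """Turn '10.0.1' or '10.0.1.post1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_suspect(version):
    """Return True if the version is at or after the first suspect release.

    Pre-release suffixes such as 'rc1' are ignored in this comparison.
    """
    return version_tuple(version) >= FIRST_SUSPECT


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("cheroot")
    except PackageNotFoundError:
        print("cheroot is not installed")
    else:
        state = "suspect" if is_suspect(installed) else "predates 10.0.1"
        print("cheroot %s: %s" % (installed, state))
```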