saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0
14.19k stars 5.48k forks source link

Race condition: Inconsistent output when having multiple websocket clients on /events endpoint #35798

Open JPT580 opened 8 years ago

JPT580 commented 8 years ago

Description of Issue/Question

When more than one websocket connection to the /events API endpoint is established, these websocket connections receive different output. According to my understanding of this endpoint, an occurring event is supposed to be delivered to all websocket connections, but instead only one of the active connections actually receives the event data.

After digging through the code i came across a part in salt.netapi.rest_tornado.saltnado_websockets which explains it all. When a message is received through the websocket connection from the client (which is supposed to say 'websocket client ready'), the on_message method is invoked as a @tornado.gen.coroutine. Within that method, this happens:

while True:
    try:
        event = yield self.application.event_listener.get_event(self)
        self.write_message(u'data: {0}\n\n'.format(json.dumps(event)))

Obviously, fetching the event like this will consume it (that event is gone for everybody else then). Any other coroutine from a different websocket connection will not be able to access that same event in order to output it.

Approach to fix it

Instead of this, i would recommend to set up one coroutine that is responsible for fetching the events from the salt event bus. For each event fetched, this routine should iterate over all currently open websocket connections and feed the event data into each of them using a simple method call (or using a pipe?). This way, one could get rid of that while True loop and as a bonus have consistent event output across all open websocket connections.

Setup

Since the code looks the same on the master branch, i strongly assume installing salt and setting it up for rest_tornado will do.

Steps to Reproduce Issue

Then simply connect to websocket clients to the /events endpoint of salt-api and fire any event onto the bus.

Versions Report for affected salt-master that runs the salt-api

root@salt:~# salt --versions-report
Salt Version:
           Salt: 2016.3.2

Dependency Versions:
           cffi: 0.8.6
       cherrypy: 3.5.0
       dateutil: 2.2
          gitdb: 0.5.4
      gitpython: 0.3.2 RC1
          ioflo: Not Installed
         Jinja2: 2.7.3
        libgit2: 0.24.0
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.2
   mysql-python: 1.2.3
      pycparser: 2.10
       pycrypto: 2.6.1
         pygit2: 0.24.0
         Python: 2.7.9 (default, Mar  1 2015, 12:57:24)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.4.0
           RAET: Not Installed
          smmap: 0.8.2
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5

System Versions:
           dist: debian 8.5 
        machine: x86_64
        release: 3.16.0-4-amd64
         system: Linux
        version: debian 8.5 
Ch3LL commented 8 years ago

@JPT580 looks like we need to clean up this race condition so someone is able to accurately query that /events endpoint with multiple connections. Thanks for the heads up.

fechnert commented 8 years ago

This also affects the /all_events websocket endpoint.

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

JPT580 commented 6 years ago

No, sorry. The issue still remains. The affected code still does the same thing. See https://github.com/saltstack/salt/blob/develop/salt/netapi/rest_tornado/saltnado_websockets.py#L354

stale[bot] commented 6 years ago

Thank you for updating this issue. It is no longer marked as stale.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.

JPT580 commented 4 years ago

No, sorry. The issue still remains. The affected code still does the same thing. See https://github.com/saltstack/salt/blob/develop/salt/netapi/rest_tornado/saltnado_websockets.py#L354

stale[bot] commented 4 years ago

Thank you for updating this issue. It is no longer marked as stale.

twangboy commented 1 year ago

Looks like this is still an issue