permitio / opal

Policy and data administration, distribution, and real-time updates on top of Policy Agents (OPA, Cedar, ...)
https://opal.ac
Apache License 2.0
5.13k stars 178 forks source link

Redis support (Logging of lost connections + random webhook triggers) #286

Open michivi opened 2 years ago

michivi commented 2 years ago

Describe the bug This is basically to report some weird behaviors when using OPAL with Redis:

To Reproduce We had this issue on a testing cloud environment, but we can also reproduce it with Docker compose. The following is an abbreviated version of the official example that we use to test OPAL locally. Every environment variables are provided by the running script. I'm not disclosing their actual values as they're sensitive, but I'll happily discuss them if required (nothing really extraordinary though 🙂).

version: "3.8"
services:
  broadcast_channel:
    image: redis:6.2.7-alpine
    ports:
      - "6379:6379"
  opal_server:
    image: permitio/opal-server:0.1.21
    environment:
      - OPAL_BROADCAST_URI=redis://broadcast_channel:6379
      - UVICORN_NUM_WORKERS=4
      - OPAL_POLICY_REPO_URL=${OPAL_POLICY_REPO_URL}
      - OPAL_POLICY_REPO_SSH_KEY=${OPAL_POLICY_REPO_SSH_KEY}
      - OPAL_POLICY_REPO_MAIN_BRANCH=${OPAL_POLICY_REPO_MAIN_BRANCH}
      - OPAL_POLICY_REPO_POLLING_INTERVAL=0
      - OPAL_AUTH_PUBLIC_KEY=${OPAL_AUTH_PUBLIC_KEY}
      - OPAL_AUTH_PRIVATE_KEY=${OPAL_AUTH_PRIVATE_KEY}
      - OPAL_AUTH_MASTER_TOKEN=${OPAL_AUTH_MASTER_TOKEN}
      - OPAL_DATA_CONFIG_SOURCES=$OPAL_DATA_CONFIG_SOURCES{}
      - OPAL_LOG_FORMAT_INCLUDE_PID=true
      - OPAL_AUTH_JWT_AUDIENCE=https://api.opal.ac/v1/
      - OPAL_AUTH_JWT_ISSUER=https://opal.ac/
    ports:
      - "7002:7002"
    depends_on:
      - broadcast_channel
  opal_client:
    image: permitio/opal-client:0.1.21
    environment:
      - OPAL_SERVER_URL=http://opal_server:7002
      - OPAL_CLIENT_TOKEN=${OPAL_CLIENT_TOKEN}
      - OPAL_POLICY_SUBSCRIPTION_DIRS=${OPAL_POLICY_SUBSCRIPTION_DIRS}
      - OPAL_LOG_FORMAT_INCLUDE_PID=true
      - OPAL_INLINE_OPA_LOG_FORMAT=http
      - OPAL_AUTH_JWT_AUDIENCE=https://api.opal.ac/v1/
      - OPAL_AUTH_JWT_ISSUER=https://opal.ac/
    ports:
      - "7000:7000"
      - "8181:8181"
    depends_on:
      - opal_server
    command: sh -c "./wait-for.sh opal_server:7002 --timeout=20 -- ./start.sh"

Note here that UVICORN_NUM_WORKERS is set to 4. Running this, we'll have the following logs (I'll remove uninteresting lines but can provide them if required):

opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [1] [INFO] Starting gunicorn 20.1.0
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [1] [INFO] Listening at: http://0.0.0.0:7002 (1)
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [8] [INFO] Booting worker with pid: 8
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [9] [INFO] Booting worker with pid: 9
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [10] [INFO] Booting worker with pid: 10
opal-opal_server-1        | [2022-08-16 17:36:51 +0000] [11] [INFO] Booting worker with pid: 11
...
opal-opal_server-1        | 2022-08-16T17:37:20.346738+0000 | 10 | opal_server.policy.webhook.api          | INFO  | triggered webhook on repo: git://github.com/XXX/XXX.git
opal-opal_server-1        | 2022-08-16T17:37:20.401177+0000 | 10 | fastapi_websocket_pubsub.event_notifier | INFO  | New subscription {'id': '154d3e84d04944478a70dd9ed83a03fb', 'subscriber_id': 'cbfd39a4958b4dcb82bf0845134a10af', 'topic': '__EventNotifier_ALL_TOPICS__', 'callback': <bound method EventBroadcaster.__broadcast_notifications__ of <fastapi_websocket_pubsub.event_broadcaster.EventBroadcaster object at 0x7f56e1b33370>>, 'notifier_id': None}
opal-opal_server-1        | 2022-08-16T17:37:20.401529+0000 | 10 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook (ALL_TOPICS), subscription_id=154d3e84d04944478a70dd9ed83a03fb, subscriber_id=cbfd39a4958b4dcb82bf0845134a10af
opal-opal_server-1        | 2022-08-16T17:37:20.402571+0000 | 10 | fastapi_websocket_pubsub.event_broadc...| INFO  | Broadcasting incoming event: {'topic': 'webhook', 'notifier_id': 'cbfd39a4958b4dcb82bf0845134a10af'}
opal-opal_server-1        | 2022-08-16T17:37:20.402995+0000 | 10 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:37:20.406235+0000 | 10 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:37:20.406913+0000 | 10 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:37:20.411733+0000 | 10 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:37:20.413964+0000 | 10 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:37:20.415721+0000 | 9 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['webhook'], 'src': 'cbfd39a4958b4dcb82bf0845134a10af'}
opal-opal_server-1        | 2022-08-16T17:37:20.416836+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['webhook'], 'src': 'cbfd39a4958b4dcb82bf0845134a10af'}
opal-opal_server-1        | 2022-08-16T17:37:20.416765+0000 | 10 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:37:20.419045+0000 | 10 | fastapi_websocket_pubsub.event_notifier | INFO  | Removing Subscription of topic='__EventNotifier_ALL_TOPICS__' for subscriber=cbfd39a4958b4dcb82bf0845134a10af
opal-opal_server-1        | 2022-08-16T17:37:20.419894+0000 | 10 | uvicorn.protocols.http.httptools_impl   | INFO  | 172.18.0.1:56092 - "POST /webhook HTTP/1.1" 200
...
opal-opal_server-1        | 2022-08-16T17:37:32.989993+0000 | 11 | opal_server.policy.webhook.api          | INFO  | triggered webhook on repo: git://github.com/XXX/XXX.git
opal-opal_server-1        | 2022-08-16T17:37:32.990901+0000 | 11 | fastapi_websocket_pubsub.event_notifier | INFO  | New subscription {'id': 'cf9ec256f97746f6938b04692bb0c0b0', 'subscriber_id': '7fa214dba1424f2ca691fecf8f6d13d2', 'topic': '__EventNotifier_ALL_TOPICS__', 'callback': <bound method EventBroadcaster.__broadcast_notifications__ of <fastapi_websocket_pubsub.event_broadcaster.EventBroadcaster object at 0x7f56e1b27460>>, 'notifier_id': None}
opal-opal_server-1        | 2022-08-16T17:37:32.991249+0000 | 11 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook, subscription_id=351f02710fca494b9e83b101c37f301e, subscriber_id=56313d9105614800aa54690a28b8436d
opal-opal_server-1        | 2022-08-16T17:37:32.991507+0000 | 11 | opal_server.policy.watcher.factory      | INFO  | webhook listener triggered
opal-opal_server-1        | 2022-08-16T17:37:32.991952+0000 | 11 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook (ALL_TOPICS), subscription_id=cf9ec256f97746f6938b04692bb0c0b0, subscriber_id=7fa214dba1424f2ca691fecf8f6d13d2
opal-opal_server-1        | 2022-08-16T17:37:32.992189+0000 | 11 | fastapi_websocket_pubsub.event_broadc...| INFO  | Broadcasting incoming event: {'topic': 'webhook', 'notifier_id': '7fa214dba1424f2ca691fecf8f6d13d2'}
opal-opal_server-1        | 2022-08-16T17:37:32.992703+0000 | 11 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:37:32.994039+0000 | 11 | opal_common.sources.git_policy_source   | INFO  | Pulling changes from remote: 'origin'
opal-opal_server-1        | 2022-08-16T17:37:34.175476+0000 | 11 | opal_common.sources.git_policy_source   | INFO  | No new commits: HEAD is at 'XXX'
opal-opal_server-1        | 2022-08-16T17:37:34.177091+0000 | 11 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:37:34.177797+0000 | 11 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:37:34.179344+0000 | 11 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:37:34.180968+0000 | 11 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:37:34.181405+0000 | 9 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['webhook'], 'src': '7fa214dba1424f2ca691fecf8f6d13d2'}
opal-opal_server-1        | 2022-08-16T17:37:34.181499+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['webhook'], 'src': '7fa214dba1424f2ca691fecf8f6d13d2'}
opal-opal_server-1        | 2022-08-16T17:37:34.182022+0000 | 11 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:37:34.183110+0000 | 11 | fastapi_websocket_pubsub.event_notifier | INFO  | Removing Subscription of topic='__EventNotifier_ALL_TOPICS__' for subscriber=7fa214dba1424f2ca691fecf8f6d13d2
opal-opal_server-1        | 2022-08-16T17:37:34.184010+0000 | 11 | uvicorn.protocols.http.httptools_impl   | INFO  | 172.18.0.1:56164 - "POST /webhook HTTP/1.1" 200
...

We can see several things here:

As suggested in the Slack thread, setting UVICORN_NUM_WORKERS to 1 seems to resolve the second point (the webhook listener is always triggered afterward). But the Redis connection is still reported as lost.

Just as an experiment, I tried publishing some messages in the Redis instance (hello, still with a single worker) and ended up with the following logs:

opal-opal_server-1        | 2022-08-16T17:41:33.355978+0000 | 8 | opal_server.policy.webhook.api          | INFO  | triggered webhook on repo: git://github.com/XXX/XXX.git
opal-opal_server-1        | 2022-08-16T17:41:33.357930+0000 | 8 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook, subscription_id=238cb9d922044d7bb61894811f428f54, subscriber_id=a9a7c881851342a3b6d3ca84a4a8a86d
opal-opal_server-1        | 2022-08-16T17:41:33.358152+0000 | 8 | opal_server.policy.watcher.factory      | INFO  | webhook listener triggered
opal-opal_server-1        | 2022-08-16T17:41:33.358666+0000 | 8 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook (ALL_TOPICS), subscription_id=0eca665b7b41433ba953c4cfc4eff9f6, subscriber_id=63db1ff04fcd4dc2a1e1e3a7292a73e7
opal-opal_server-1        | 2022-08-16T17:41:33.359060+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Broadcasting incoming event: {'topic': 'webhook', 'notifier_id': '63db1ff04fcd4dc2a1e1e3a7292a73e7'}
opal-opal_server-1        | 2022-08-16T17:41:33.359783+0000 | 8 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:41:33.361329+0000 | 8 | opal_common.sources.git_policy_source   | INFO  | Pulling changes from remote: 'origin'
opal-opal_server-1        | 2022-08-16T17:41:34.543173+0000 | 8 | opal_common.sources.git_policy_source   | INFO  | No new commits: HEAD is at 'XXXX'
opal-opal_server-1        | 2022-08-16T17:41:34.545514+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:41:34.546808+0000 | 8 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:41:34.548838+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:41:34.551641+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:41:34.552786+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:41:34.554434+0000 | 8 | uvicorn.protocols.http.httptools_impl   | INFO  | 172.18.0.1:56330 - "POST /webhook HTTP/1.1" 200
opal-opal_server-1        | 2022-08-16T17:42:51.285310+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['hello'], 'src': '5bc185a3afdd4bd3a1caedc203f414ce'}
opal-opal_server-1        | 2022-08-16T17:42:52.746649+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['hello'], 'src': 'ce5dcb29523b42528c89031c09078e94'}
opal-opal_server-1        | 2022-08-16T17:43:00.656252+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Handling incoming broadcast event: {'topics': ['hello'], 'src': 'f2035dc737f24563ad01af799bc07a53'}
opal-opal_server-1        | 2022-08-16T17:43:02.443805+0000 | 8 | opal_server.policy.webhook.api          | INFO  | triggered webhook on repo: git://github.com/XXX/XXX.git
opal-opal_server-1        | 2022-08-16T17:43:02.444774+0000 | 8 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook, subscription_id=238cb9d922044d7bb61894811f428f54, subscriber_id=a9a7c881851342a3b6d3ca84a4a8a86d
opal-opal_server-1        | 2022-08-16T17:43:02.444923+0000 | 8 | opal_server.policy.watcher.factory      | INFO  | webhook listener triggered
opal-opal_server-1        | 2022-08-16T17:43:02.445497+0000 | 8 | fastapi_websocket_pubsub.event_notifier | INFO  | calling subscription callbacks: topic=webhook (ALL_TOPICS), subscription_id=0eca665b7b41433ba953c4cfc4eff9f6, subscriber_id=63db1ff04fcd4dc2a1e1e3a7292a73e7
opal-opal_server-1        | 2022-08-16T17:43:02.446015+0000 | 8 | fastapi_websocket_pubsub.event_broadc...| INFO  | Broadcasting incoming event: {'topic': 'webhook', 'notifier_id': '63db1ff04fcd4dc2a1e1e3a7292a73e7'}
opal-opal_server-1        | 2022-08-16T17:43:02.446760+0000 | 8 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:43:02.447727+0000 | 8 | opal_common.sources.git_policy_source   | INFO  | Pulling changes from remote: 'origin'
opal-opal_server-1        | 2022-08-16T17:43:03.727878+0000 | 8 | opal_common.sources.git_policy_source   | INFO  | No new commits: HEAD is at 'XXXX'
opal-opal_server-1        | 2022-08-16T17:43:03.729302+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:43:03.729719+0000 | 8 | asyncio_redis.connection                | INFO  | Connecting to redis
opal-opal_server-1        | 2022-08-16T17:43:03.730882+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection made
opal-opal_server-1        | 2022-08-16T17:43:03.732462+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:43:03.732879+0000 | 8 | asyncio_redis.protocol                  | INFO  | Redis connection lost
opal-opal_server-1        | 2022-08-16T17:43:03.734165+0000 | 8 | uvicorn.protocols.http.httptools_impl   | INFO  | 172.18.0.1:56350 - "POST /webhook HTTP/1.1" 200

So it seems that the Redis connection is actually still alive, and OPAL is still listening to it. It only considers the connection lost when it receives a webhook notification. Though it does so after reading the incoming notification so at the very least the webhook notification is taken into account.

Expected behavior Is Redis officially supported by OPAL? I've seen few references to it. It is supported by the broadcaster package and the example docker-compose.yaml file suggests that any backend supported by broadcaster is supported by OPAL.

If Redis is officially supported, then I guess the expected behavior would be:

OPAL version

Have I missed something that might explain all of this? When testing the fastapi_websocket_pubsub, I didn't have Redis connection issue / message.

Thanks in advance for your help 🙂

asafc commented 2 years ago

Hey @michivi thanks for reporting this bug.

OPAL uses and supports Redis via the python broadcaster package. We are looking to move away from or fork broadcaster since it's no longer actively maintained (that we can tell).

@roekatz can you please take a look? we can fix that in our own fork of broadcaster.