permitio / opal

Policy and data administration, distribution, and real-time updates on top of Policy Agents (OPA, Cedar, ...)
https://opal.ac
Apache License 2.0

OPAL Server stops polling policy updates when utilizing AWS ElastiCache Redis #685

Open · ojecborec opened this issue 1 week ago

ojecborec commented 1 week ago

First, I configured OPAL Server to use a local Redis instance (running as a Docker container):

OPAL_BROADCAST_URI=redis://broadcast-channel:6379
OPAL_POLICY_REPO_POLLING_INTERVAL=5

and confirmed that it works:

Pulling changes from remote: 'origin'
No new commits: HEAD is at 'e0ba61f248af601f8628cd5154419b0aee104a84'
...
Pulling changes from remote: 'origin'
No new commits: HEAD is at 'e0ba61f248af601f8628cd5154419b0aee104a84'

When the policy is updated and the changes are pushed to the Git repository, OPAL Server handles the update correctly. So far, so good.

Pulling changes from remote: 'origin'
Found new commits: old HEAD was 'e0ba61f248af601f8628cd5154419b0aee104a84', new HEAD is '0a923fe31895f36da833b8fbaf007090548205d9'
Notifying other side: subscription={'id': 'e80bbdd079414c248d70d4ae9e83051c', ...
Broadcasting incoming event: {'topic': 'policy:.', 'notifier_id': 'c09758ab8daf422a825d77d7a284ddf4'}
Connecting to redis
Redis connection made
Redis connection lost

Then I configured OPAL Server to use an Amazon ElastiCache Redis instance:

OPAL_BROADCAST_URI=rediss://:secret@foo...cache.amazonaws.com:6379
OPAL_POLICY_REPO_POLLING_INTERVAL=5

and again confirmed that it works:

Pulling changes from remote: 'origin'
No new commits: HEAD is at '0a923fe31895f36da833b8fbaf007090548205d9'
...
Pulling changes from remote: 'origin'
No new commits: HEAD is at '0a923fe31895f36da833b8fbaf007090548205d9'

When the policy is updated and the changes are pushed to the Git repository, OPAL Server picks up the update as well:

Pulling changes from remote: 'origin'
Found new commits: old HEAD was '0a923fe31895f36da833b8fbaf007090548205d9', new HEAD is '8e350fe985d9f2e3180ec1053f2d5ff305fd4616'
Notifying other side: subscription={'id': 'e33a92b2feb949999783da42787d462f', ...
Broadcasting incoming event: {'topic': 'policy:.', 'notifier_id': '56416267245b487d98ba6237441a321b'}
Connecting to redis
Redis connection made 

but after that, no more "Pulling changes from remote: 'origin'" messages appear, i.e. polling stops. The only difference I can see in the logs is that with the local Redis a "Redis connection lost" message is logged, whereas with Amazon ElastiCache that message never appears. I've tried Redis versions 6 and 7, but I don't think the version is the cause. Is there anything I can do to make this work?

ojecborec commented 1 week ago

Going down the rabbit hole, I found that the await self._endpoint.publish(topics=topics, data=data) call in the ServerSideTopicPublisher._publish_impl method never returns.
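To confirm that a specific await is the one hanging, without blocking the process forever, one option is to bound it with a timeout. The following is a minimal, stdlib-only sketch of that idea; publish_with_timeout, hanging_publish and ok_publish are hypothetical names, and the two stubs merely simulate a publish that never completes and one that completes promptly. They are not OPAL's or the broadcaster library's API.

```python
import asyncio


async def publish_with_timeout(publish, timeout: float, **kwargs) -> bool:
    """Await publish(**kwargs) but give up after `timeout` seconds.

    Returns True if the call completed and False if it timed out.
    `publish` is any coroutine function, e.g. a hypothetical stand-in
    for self._endpoint.publish.
    """
    try:
        await asyncio.wait_for(publish(**kwargs), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # wait_for has already cancelled the stuck awaitable at this point.
        return False


async def hanging_publish(topics, data):
    # Simulates the reported behaviour: this awaitable never completes.
    await asyncio.Event().wait()


async def ok_publish(topics, data):
    # Simulates a healthy publish that completes promptly.
    await asyncio.sleep(0)


async def main():
    ok = await publish_with_timeout(ok_publish, 1.0, topics=["policy:."], data=None)
    hung = await publish_with_timeout(hanging_publish, 0.5, topics=["policy:."], data=None)
    print(ok, hung)  # True False


if __name__ == "__main__":
    asyncio.run(main())
```

A timeout only surfaces the hang (for example, so it can be logged); it does not fix whatever the underlying Redis connection is waiting on.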

orweis commented 1 week ago

Hi, thanks for reporting. I'm not sure what the issue is, but it might be solvable with a different configuration of Redis or of the Redis client. I'd recommend experimenting with the broadcaster library directly to find out why and when this fails.

ojecborec commented 1 week ago

I've tried. It works with the example app. However, with OPAL Server the self._endpoint.publish(topics=topics, data=data) call never returns (not even for keep-alive messages).

ojecborec commented 1 week ago

The connection uses TLS and is password-protected, i.e.:

rediss://:password@whatever.cache.amazonaws.com:6379
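The two broadcast URIs in this thread differ in more than the host: by the usual Redis URI convention, the "rediss" scheme means the client connects over TLS, and ":password@" supplies a password with an empty username. The sketch below, using only urllib.parse, makes those differences explicit; describe_broadcast_uri is a hypothetical helper that only parses the string and does not connect to anything.

```python
from urllib.parse import urlsplit


def describe_broadcast_uri(uri: str) -> dict:
    """Split a Redis broadcast URI into the parts that affect connecting.

    Assumes the common convention that 'rediss' means Redis over TLS.
    An empty user before ':' means only a password is supplied.
    """
    parts = urlsplit(uri)
    return {
        "tls": parts.scheme == "rediss",
        "username": parts.username or None,  # '' is normalised to None
        "password_set": parts.password is not None,
        "host": parts.hostname,
        "port": parts.port,
    }


if __name__ == "__main__":
    for uri in (
        "redis://broadcast-channel:6379",                          # local Docker setup
        "rediss://:password@whatever.cache.amazonaws.com:6379",   # ElastiCache setup
    ):
        info = describe_broadcast_uri(uri)
        print(info["host"], "TLS" if info["tls"] else "plain", info["password_set"])
```

In other words, the working local setup exercised neither TLS nor authentication, so either of those is a plausible difference to isolate, though nothing in this thread confirms which one matters.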

orweis commented 1 week ago

Very odd, especially as that code seems to just queue another task.
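Why "just queueing a task" makes the hang surprising can be shown with a generic asyncio illustration (not OPAL's actual code): if a publish coroutine only schedules the real send with asyncio.create_task, the caller's await completes even when the scheduled send itself never finishes. The names stuck_send and publish below are hypothetical.

```python
import asyncio


async def stuck_send(message: str) -> None:
    # Stand-in for a network write that never completes.
    await asyncio.Event().wait()


async def publish(message: str, pending: set) -> asyncio.Task:
    """Schedule stuck_send in the background and return immediately.

    Keeping a reference in `pending` prevents the task from being
    garbage-collected while it is still running.
    """
    task = asyncio.create_task(stuck_send(message))
    pending.add(task)
    task.add_done_callback(pending.discard)
    return task


async def main() -> bool:
    pending: set = set()
    # If publish() awaited the send inline, this would raise TimeoutError.
    task = await asyncio.wait_for(publish("keep-alive", pending), timeout=1.0)
    await asyncio.sleep(0.05)  # let the background send start and block
    still_running = not task.done()
    task.cancel()  # clean up the simulated hung send
    try:
        await task
    except asyncio.CancelledError:
        pass
    return still_running


if __name__ == "__main__":
    print(asyncio.run(main()))  # True: publish returned while the send was stuck
```

So a publish call that never returns suggests the caller is waiting on something inline rather than on a queued task; this sketch does not identify what that is.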

Since I don't have any pointers for you, what I can suggest is using a different pub/sub backbone, e.g. Postgres LISTEN/NOTIFY instead of ElastiCache Redis.

ojecborec commented 1 week ago

RDS Postgres seems to be working.