permitio / opal

Policy and data administration, distribution, and real-time updates on top of Policy Agents (OPA, Cedar, ...)
https://opal.ac
Apache License 2.0
4.14k stars 157 forks source link

ERROR | Critical: OPA health-check policy is enabled, but cannot find policy #161

Closed hongbo-miao closed 2 years ago

hongbo-miao commented 2 years ago

I followed How to use data update callbacks and OPA healthcheck policy and succeed running the docker-compose demo.

Now I am trying it in a Kubernetes cluster. My opal-server has set up JWT, and I generated one and added to opal-client. They work well and can communicate with each other.

First here is the opal-client log when it works well:

[2021-10-18 23:26:16 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2021-10-18 23:26:16 +0000] [1] [INFO] Listening at: http://0.0.0.0:7000 (1)
[2021-10-18 23:26:16 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2021-10-18 23:26:16 +0000] [9] [INFO] Booting worker with pid: 9
2021-10-18T23:26:17.493894+0000 | opal_common.fetcher.fetcher_register    | INFO  | Loading FetcherProvider 'FastApiRpcFetchProvider' found at: <class 'opal_common.fetcher.providers.fastapi_rpc_fetch_provider.FastApiRpcFetchProvider'>
2021-10-18T23:26:17.494433+0000 | opal_common.fetcher.fetcher_register    | INFO  | Loading FetcherProvider 'HttpFetchProvider' found at: <class 'opal_common.fetcher.providers.http_fetch_provider.HttpFetchProvider'>
2021-10-18T23:26:17.546170+0000 | opal_common.fetcher.fetcher_register    | INFO  | Loading FetcherProvider 'PostgresFetchProvider' found at: <class 'opal_fetcher_postgres.provider.PostgresFetchProvider'>
2021-10-18T23:26:17.546534+0000 | opal_common.fetcher.fetcher_register    | INFO  | Fetcher Register loaded
2021-10-18T23:26:17.548360+0000 | opal_client.callbacks.register          | INFO  | Callbacks register loaded
2021-10-18T23:26:17.548847+0000 | opal_client.client                      | INFO  | API authentication disabled (public encryption key was not provided)
2021-10-18T23:26:17.633390+0000 | uvicorn.server                          | INFO  | Started server process [9]
2021-10-18T23:26:17.633652+0000 | uvicorn.lifespan.on                     | INFO  | Waiting for application startup.
2021-10-18T23:26:17.634044+0000 | opal_client.opa.runner                  | INFO  | Launching opa runner
2021-10-18T23:26:17.634298+0000 | uvicorn.lifespan.on                     | INFO  | Application startup complete.
2021-10-18T23:26:17.637493+0000 | opal_client.opa.runner                  | INFO  | Running OPA inline: opa run --server --addr=:8181 --authentication=off --authorization=off --config-file=/data/opa/config.yaml --log-level=info
2021-10-18T23:26:17.661563+0000 | opal_client.opa.logger                  | INFO  | Initializing server. {'addrs': [':8181'], 'diagnostic-addrs': [], 'time': '2021-10-18T23:26:17Z'}
2021-10-18T23:26:17.663061+0000 | opal_client.opa.logger                  | INFO  | Starting decision logger. {'plugin': 'decision_logs', 'time': '2021-10-18T23:26:17Z'}
2021-10-18T23:26:18.646088+0000 | opal_client.opa.runner                  | INFO  | Running OPA initial start callbacks
2021-10-18T23:26:18.646692+0000 | opal_client.policy.updater              | INFO  | Launching policy updater
2021-10-18T23:26:18.647141+0000 | opal_client.data.updater                | INFO  | Launching data updater
2021-10-18T23:26:18.647495+0000 | opal_client.policy.updater              | INFO  | Subscribing to topics: ['policy:.']
2021-10-18T23:26:18.647911+0000 | opal_client.data.updater                | INFO  | Subscribing to topics: ['policy_data']
2021-10-18T23:26:18.648326+0000 | fastapi_websocket_pubsub.pub_sub_client | INFO  | Trying to connect to Pub/Sub server - ws://opal-server-service.hm-opa:7002/ws
2021-10-18T23:26:18.649096+0000 | fastapi_websocket_rpc.websocket_rpc_c...| INFO  | Trying server - ws://opal-server-service.hm-opa:7002/ws
2021-10-18T23:26:18.652890+0000 | fastapi_websocket_pubsub.pub_sub_client | INFO  | Trying to connect to Pub/Sub server - ws://opal-server-service.hm-opa:7002/ws
2021-10-18T23:26:18.653460+0000 | fastapi_websocket_rpc.websocket_rpc_c...| INFO  | Trying server - ws://opal-server-service.hm-opa:7002/ws
2021-10-18T23:26:18.747103+0000 | opal_client.data.updater                | INFO  | Connected to server
2021-10-18T23:26:18.747249+0000 | opal_client.data.updater                | INFO  | Performing data configuration, reason: Initial load
2021-10-18T23:26:18.747568+0000 | opal_client.data.updater                | INFO  | Getting data-sources configuration from 'http://opal-server-service.hm-opa:7002/data/config'
2021-10-18T23:26:18.754995+0000 | opal_client.data.updater                | INFO  | Triggering data update with id: 6f46820d41454c3886b22780f150de3a
2021-10-18T23:26:18.755274+0000 | opal_client.data.updater                | INFO  | Fetching policy data
2021-10-18T23:26:18.756081+0000 | opal_client.data.fetcher                | INFO  | Fetching data from url: postgresql://admin@opa-db-service.hm-opa:40072/opa_db
2021-10-18T23:26:18.778370+0000 | fastapi_websocket_pubsub.pub_sub_client | INFO  | Connected to PubSub server ws://opal-server-service.hm-opa:7002/ws
2021-10-18T23:26:18.779592+0000 | opal_client.policy.updater              | INFO  | Connected to server
2021-10-18T23:26:18.779807+0000 | opal_client.policy.updater              | INFO  | Refetching policy code (full bundle)
2021-10-18T23:26:18.799877+0000 | opal_client.data.updater                | INFO  | Saving fetched data to policy-store: source url='postgresql://admin@opa-db-service.hm-opa:40072/opa_db', destination path='/roles'
2021-10-18T23:26:18.803579+0000 | opal_client.opa.logger                  | INFO  | Received request.    {'client_addr': '[::1]:45030', 'req_id': 1, 'req_method': 'PUT', 'req_path': '/v1/data/roles', 'time': '2021-10-18T23:26:18Z'}
2021-10-18T23:26:18.803980+0000 | opal_client.opa.logger                  | INFO  | Sent response.       {'client_addr': '[::1]:45030', 'req_id': 1, 'req_method': 'PUT', 'req_path': '/v1/data/roles', 'resp_bytes': 0, 'resp_duration': 0.411255, 'resp_status': 204, 'time': '2021-10-18T23:26:18Z'}
2021-10-18T23:26:18.815959+0000 | opal_client.policy.updater              | INFO  | got policy bundle, commit hash: '910d43e7e4da262d838c562ce3ae9910600fe445'
2021-10-18T23:26:18.821297+0000 | opal_client.opa.logger                  | INFO  | Received request.    {'client_addr': '[::1]:45034', 'req_id': 2, 'req_method': 'GET', 'req_path': '/v1/policies', 'time': '2021-10-18T23:26:18Z'}
2021-10-18T23:26:18.822307+0000 | opal_client.opa.logger                  | INFO  | Sent response.       {'client_addr': '[::1]:45034', 'req_id': 2, 'req_method': 'GET', 'req_path': '/v1/policies', 'resp_bytes': 13, 'resp_duration': 0.459031, 'resp_status': 200, 'time': '2021-10-18T23:26:18Z'}
2021-10-18T23:26:18.828864+0000 | opal_client.opa.logger                  | INFO  | Received request.    {'client_addr': '[::1]:45036', 'req_id': 3, 'req_method': 'PUT', 'req_path': '/v1/policies/rbac.rego', 'time': '2021-10-18T23:26:18Z'}
2021-10-18T23:26:18.831522+0000 | opal_client.opa.logger                  | INFO  | Sent response.       {'client_addr': '[::1]:45036', 'req_id': 3, 'req_method': 'PUT', 'req_path': '/v1/policies/rbac.rego', 'resp_bytes': 2, 'resp_duration': 2.207848, 'resp_status': 200, 'time': '2021-10-18T23:26:18Z'}
2021-10-18T23:26:18.832607+0000 | fastapi_websocket_pubsub.pub_sub_client | INFO  | Connected to PubSub server ws://opal-server-service.hm-opa:7002/ws

Now I tried to set up health check.

When I added

- OPAL_SHOULD_REPORT_ON_DATA_UPDATES=True
# we configure a callback to go to a default location in the OPAL server, but you can configure
# a callback to any url you'd like. Each callback is either the url alone, or a tuple of
# (url, HttpFetcherConfig).
# We show here both ways to configure the same endpoint, one of them demonstrate how to
# add extra HTTP headers (the header shown is ignored, only here for example).
- OPAL_DEFAULT_UPDATE_CALLBACKS={"callbacks":["http://opal_server:7002/data/callback_report"]}
# - OPAL_DEFAULT_UPDATE_CALLBACKS={"callbacks":[("http://opal_server:7002/data/callback_report",{"headers":{"X-My-Token":"token"}})]}
# OPAL can load a special policy into OPA that acts as a healthcheck policy (Not directly related to the callback feature).
# This policy defines two opa rules you can query:
# ready rule (POST http://localhost:8181/data/system/opal/ready): signals that OPA is ready to accept authorization queries.
# healthy rule (POST http://localhost:8181/data/system/opal/ready): signals that the last policy and data updates succeeded.
- OPAL_OPA_HEALTH_CHECK_POLICY_ENABLED=True

In my case, it is

OPAL_SHOULD_REPORT_ON_DATA_UPDATES: "True"
OPAL_DEFAULT_UPDATE_CALLBACKS: '{"callbacks":["http://opal-server-service.hm-opa:7002/data/callback_report"]}'
OPAL_OPA_HEALTH_CHECK_POLICY_ENABLED: "True"

However, I got this error in my opal-client log:

[2021-10-18 23:24:07 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2021-10-18 23:24:07 +0000] [1] [INFO] Listening at: http://0.0.0.0:7000 (1)
[2021-10-18 23:24:07 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2021-10-18 23:24:07 +0000] [9] [INFO] Booting worker with pid: 9
2021-10-18T23:24:07.965891+0000 | opal_common.fetcher.fetcher_register    | INFO  | Loading FetcherProvider 'FastApiRpcFetchProvider' found at: <class 'opal_common.fetcher.providers.fastapi_rpc_fetch_provider.FastApiRpcFetchProvider'>
2021-10-18T23:24:07.966694+0000 | opal_common.fetcher.fetcher_register    | INFO  | Loading FetcherProvider 'HttpFetchProvider' found at: <class 'opal_common.fetcher.providers.http_fetch_provider.HttpFetchProvider'>
2021-10-18T23:24:08.012902+0000 | opal_common.fetcher.fetcher_register    | INFO  | Loading FetcherProvider 'PostgresFetchProvider' found at: <class 'opal_fetcher_postgres.provider.PostgresFetchProvider'>
2021-10-18T23:24:08.013219+0000 | opal_common.fetcher.fetcher_register    | INFO  | Fetcher Register loaded
2021-10-18T23:24:08.015348+0000 | opal_client.callbacks.register          | INFO  | Callbacks register loaded
2021-10-18T23:24:08.015979+0000 | opal_client.client                      | INFO  | API authentication disabled (public encryption key was not provided)
2021-10-18T23:24:08.087297+0000 | uvicorn.server                          | INFO  | Started server process [9]
2021-10-18T23:24:08.088012+0000 | uvicorn.lifespan.on                     | INFO  | Waiting for application startup.
2021-10-18T23:24:08.088728+0000 | opal_client.opa.runner                  | INFO  | Launching opa runner
2021-10-18T23:24:08.089187+0000 | uvicorn.lifespan.on                     | INFO  | Application startup complete.
2021-10-18T23:24:08.091001+0000 | opal_client.opa.runner                  | INFO  | Running OPA inline: opa run --server --addr=:8181 --authentication=off --authorization=off --config-file=/data/opa/config.yaml --log-level=info
2021-10-18T23:24:08.109728+0000 | opal_client.opa.logger                  | INFO  | Initializing server. {'addrs': [':8181'], 'diagnostic-addrs': [], 'time': '2021-10-18T23:24:08Z'}
2021-10-18T23:24:08.110709+0000 | opal_client.opa.logger                  | INFO  | Starting decision logger. {'plugin': 'decision_logs', 'time': '2021-10-18T23:24:08Z'}
2021-10-18T23:24:09.098182+0000 | opal_client.opa.runner                  | INFO  | Running OPA initial start callbacks
2021-10-18T23:24:09.098529+0000 | opal_client.client                      |ERROR  | Critical: OPA health-check policy is enabled, but cannot find policy at /usr/local/lib/python3.8/site-packages/opal_client-0.1.15-py3.8.egg/opal_client/opa/healthcheck/opal.rego
2021-10-18T23:24:09.098722+0000 | opal_client.client                      |CRITICAL | healthcheck policy enabled but could not be initialized!
2021-10-18T23:24:09.098798+0000 | opal_client.client                      | INFO  | triggering shutdown with SIGTERM...
2021-10-18T23:24:09.197616+0000 | uvicorn.server                          | INFO  | Shutting down
[2021-10-18 23:24:09 +0000] [9] [INFO] Error while closing socket [Errno 9] Bad file descriptor
2021-10-18T23:24:09.298780+0000 | uvicorn.lifespan.on                     | INFO  | Waiting for application shutdown.
2021-10-18T23:24:09.299108+0000 | opal_client.client                      | INFO  | stopping background tasks...
2021-10-18T23:24:09.299252+0000 | opal_client.opa.runner                  | INFO  | Stopping opa runner
2021-10-18T23:24:09.299325+0000 | opal_client.opa.runner                  | INFO  | Stopping OPA
2021-10-18T23:24:09.299944+0000 | opal_client.opa.logger                  | INFO  | Shutting down...     {'time': '2021-10-18T23:24:09Z'}
2021-10-18T23:24:09.301328+0000 | opal_client.opa.logger                  | INFO  | Server shutdown.     {'time': '2021-10-18T23:24:09Z'}
2021-10-18T23:24:09.308254+0000 | opal_client.opa.logger                  | INFO  | Stopping decision logger. {'plugin': 'decision_logs', 'time': '2021-10-18T23:24:09Z'}
2021-10-18T23:24:09.308705+0000 | opal_client.opa.logger                  | INFO  | Flushing decision logs. {'plugin': 'decision_logs', 'time': '2021-10-18T23:24:09Z'}
2021-10-18T23:24:09.309221+0000 | opal_client.opa.logger                  | INFO  | All decisions in buffer uploaded. {'plugin': 'decision_logs', 'time': '2021-10-18T23:24:09Z'}
2021-10-18T23:24:09.311407+0000 | opal_client.opa.runner                  | INFO  | OPA exited with return code: 0
2021-10-18T23:24:10.300139+0000 | opal_client.client                      | INFO  | trying to shutdown DataUpdater and PolicyUpdater gracefully...
2021-10-18T23:24:10.300612+0000 | opal_client.data.updater                | INFO  | Stopping data updater
2021-10-18T23:24:10.300781+0000 | opal_client.policy.updater              | INFO  | Stopping policy updater
2021-10-18T23:24:10.301351+0000 | uvicorn.lifespan.on                     | INFO  | Application shutdown complete.
2021-10-18T23:24:10.301698+0000 | uvicorn.server                          | INFO  | Finished server process [9]
[2021-10-18 23:24:10 +0000] [9] [INFO] Worker exiting (pid: 9)
<green>{time}</green> | <blue>{name: <40}</blue>|<level>{level:^6} | {message}</level>
{exception}

I also tried setting up headers, as my opal-server has set up the JWT. However, still no luck.

OPAL_DEFAULT_UPDATE_CALLBACKS: '{"callbacks":["http://opal-server-service.hm-opa:7002/data/callback_report",{"headers":{"secret-key":"looooogJWTToken"}})]}'

Any clue? Thanks!

asafc commented 2 years ago

hey @Hongbo-Miao,

this is the issue: OPAL cannot find the healthcheck policy file.

Critical: OPA health-check policy is enabled, but cannot find policy at /usr/local/lib/python3.8/site-packages/opal_client-0.1.15-py3.8.egg/opal_client/opa/healthcheck/opal.rego

As to why, i am suspecting it's because i did not add the policy file to the python package manifest. I will make sure and if so - i will fix it in the next version (which hopefully will be submitted tonight).

Did it work for you in docker compose?

hongbo-miao commented 2 years ago

@asafc thanks! Yup, the docker compose demo works for me.

asafc commented 2 years ago

Hey @Hongbo-Miao, i think i found the issue and it's indeed the manifest. What's weird, it seems the OPAL running in your k8s is being run from a different path. I suspect that you invoke OPAL differently in your k8s config.

I will fix it in the morning before releasing 0.1.16

asafc commented 2 years ago

Should be fixed in 0.1.16

hongbo-miao commented 2 years ago

Confirmed, the healthcheck has been fixed in 0.1.16! Thanks!