open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.
https://www.openpolicyagent.org
Apache License 2.0

OPA rules are not being enforced through the Envoy plugin - OPA is running as a service instead of a sidecar in a k8s cluster, on the same node where the application and Envoy (as a sidecar) are also running. #3151

Closed: susmitasain closed this issue 3 years ago

susmitasain commented 3 years ago

Expected Behavior

API calls should enforce OPA policies implicitly via the Envoy filter.

Actual Behavior

The call is not reaching OPA, so the rules are not being enforced.

My detailed configurations - https://stackoverflow.com/questions/66183588/opa-running-as-host-level-separate-service-policies-are-not-getting-enforced

anderseknert commented 3 years ago

Hi @susmitasain!

What do the logs from the Envoy proxy say? Given that the requests don't seem to reach OPA, that's where I'd start looking.

ashutosh-narkar commented 3 years ago

Continuing the discussion from this Slack thread. @susmitasain, based on the decision logs you've shared, it looks like OPA is not getting any requests from Envoy. The 403 Forbidden is generated by default if there is a network error between the filter and the OPA-Envoy plugin. You should change the default value of the status_on_error field in the Envoy external authz config to help uncover potential network issues between Envoy and the OPA-Envoy plugin.
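For reference, the relevant part of the ext_authz HTTP filter config looks roughly like this (the address, timeout, and error code below are illustrative, not taken from your setup):

http_filters:
- name: envoy.ext_authz
  config:
    failure_mode_allow: false
    status_on_error:
      code: ServiceUnavailable    # default is Forbidden (403); any non-default code makes network errors easy to spot
    grpc_service:
      google_grpc:
        target_uri: 127.0.0.1:9191    # address of the OPA-Envoy plugin's gRPC server
        stat_prefix: ext_authz
      timeout: 0.5s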

susmitasain commented 3 years ago

> Hi @susmitasain!
>
> What do the logs from the Envoy proxy say? Given that the requests don't seem to reach OPA, that's where I'd start looking.

Below are the Envoy logs I can see:

[root@node-1-227 opa_as_service]# kubectl logs deployment/example-app -c envoy
[2021-02-15 06:45:43.182][1][info][main] [source/server/server.cc:205] initializing epoch 0 (hot restart version=10.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363 size=2654312)
[2021-02-15 06:45:43.182][1][info][main] [source/server/server.cc:207] statically linked extensions:
[2021-02-15 06:45:43.182][1][info][main] [source/server/server.cc:209] access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
[2021-02-15 06:45:43.182][1][info][main] [source/server/server.cc:212] filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.grpc_http1_reverse_bridge,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.rbac,envoy.filters.http.tap,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
[2021-02-15 06:45:43.182][1][info][main] [source/server/server.cc:215] filters.listener: envoy.listener.original_dst,envoy.listener.original_src,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
[2021-02-15 06:45:43.182][1][info][main] [source/server/server.cc:218] filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.dubbo_proxy,envoy.filters.network.mysql_proxy,envoy.filters.network.rbac,envoy.filters.network.sni_cluster,envoy.filters.network.thrift_proxy,envoy.filters.network.zookeeper_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
[2021-02-15 06:45:43.182][1][info][main] [source/server/server.cc:220] stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.stat_sinks.hystrix,envoy.statsd
[2021-02-15 06:45:43.182][1][info][main] [source/server/server.cc:222] tracers: envoy.dynamic.ot,envoy.lightstep,envoy.tracers.datadog,envoy.zipkin
[2021-02-15 06:45:43.182][1][info][main] [source/server/server.cc:225] transport_sockets.downstream: envoy.transport_sockets.alts,envoy.transport_sockets.tap,raw_buffer,tls
[2021-02-15 06:45:43.182][1][info][main] [source/server/server.cc:228] transport_sockets.upstream: envoy.transport_sockets.alts,envoy.transport_sockets.tap,raw_buffer,tls
[2021-02-15 06:45:43.182][1][info][main] [source/server/server.cc:234] buffer implementation: old (libevent)
[2021-02-15 06:45:43.186][1][info][main] [source/server/server.cc:281] admin address: 0.0.0.0:8001
[2021-02-15 06:45:43.187][1][info][config] [source/server/configuration_impl.cc:50] loading 0 static secret(s)
[2021-02-15 06:45:43.187][1][info][config] [source/server/configuration_impl.cc:56] loading 1 cluster(s)
[2021-02-15 06:45:43.188][1][info][upstream] [source/common/upstream/cluster_manager_impl.cc:137] cm init: all clusters initialized
[2021-02-15 06:45:43.188][1][info][config] [source/server/configuration_impl.cc:60] loading 1 listener(s)
[2021-02-15 06:45:43.190][1][info][config] [source/server/configuration_impl.cc:85] loading tracing configuration
[2021-02-15 06:45:43.190][1][info][config] [source/server/configuration_impl.cc:105] loading stats sink configuration
[2021-02-15 06:45:43.190][1][info][main] [source/server/server.cc:462] all clusters initialized. initializing init manager
[2021-02-15 06:45:43.190][1][info][config] [source/server/listener_manager_impl.cc:1006] all dependencies initialized. starting workers
[2021-02-15 06:45:43.191][1][info][main] [source/server/server.cc:478] starting main dispatch loop
[2021-02-15 07:00:43.192][1][info][main] [source/server/drain_manager_impl.cc:63] shutting down parent after drain

ashutosh-narkar commented 3 years ago

The Envoy logs should also have some entries for the requests it receives, which I'm not seeing. I would first check whether the request path works as expected without OPA in the path. Then add OPA and update the status_on_error field value as mentioned before.

susmitasain commented 3 years ago

> The Envoy logs should also have some entries for the requests it receives, which I'm not seeing. I would first check whether the request path works as expected without OPA in the path. Then add OPA and update the status_on_error field value as mentioned before.

I appreciate your guidance on this. Will update soon.

susmitasain commented 3 years ago

Envoy logs without OPA, using failure_mode_allow: false in the Envoy config. (I verified that the requests pass through Envoy with failure_mode_allow: true when there is no OPA.)

[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:256] initializing epoch 0 (hot restart version=11.104)
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:258] statically linked extensions:
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.clusters: envoy.cluster.eds, envoy.cluster.logical_dns, envoy.cluster.original_dst, envoy.cluster.static, envoy.cluster.strict_dns, envoy.clusters.aggregate, envoy.clusters.dynamic_forward_proxy, envoy.clusters.redis
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.filters.udp_listener: envoy.filters.udp.dns_filter, envoy.filters.udp_listener.udp_proxy
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.resource_monitors: envoy.resource_monitors.fixed_heap, envoy.resource_monitors.injected_resource
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.transport_sockets.upstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.tracers: envoy.dynamic.ot, envoy.lightstep, envoy.tracers.datadog, envoy.tracers.dynamic_ot, envoy.tracers.lightstep, envoy.tracers.opencensus, envoy.tracers.xray, envoy.tracers.zipkin, envoy.zipkin
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.dubbo_proxy.filters: envoy.filters.dubbo.router
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.dubbo_proxy.route_matchers: default
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.transport_sockets.downstream: envoy.transport_sockets.alts, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.tap, envoy.transport_sockets.tls, raw_buffer, tls
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.dubbo_proxy.serializers: dubbo.hessian2
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.thrift_proxy.transports: auto, framed, header, unframed
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.filters.listener: envoy.filters.listener.http_inspector, envoy.filters.listener.original_dst, envoy.filters.listener.original_src, envoy.filters.listener.proxy_protocol, envoy.filters.listener.tls_inspector, envoy.listener.http_inspector, envoy.listener.original_dst, envoy.listener.original_src, envoy.listener.proxy_protocol, envoy.listener.tls_inspector
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.filters.http: envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.aws_lambda, envoy.filters.http.aws_request_signing, envoy.filters.http.buffer, envoy.filters.http.cache, envoy.filters.http.cors, envoy.filters.http.csrf, envoy.filters.http.dynamic_forward_proxy, envoy.filters.http.dynamo, envoy.filters.http.ext_authz, envoy.filters.http.fault, envoy.filters.http.grpc_http1_bridge, envoy.filters.http.grpc_http1_reverse_bridge, envoy.filters.http.grpc_json_transcoder, envoy.filters.http.grpc_stats, envoy.filters.http.grpc_web, envoy.filters.http.gzip, envoy.filters.http.header_to_metadata, envoy.filters.http.health_check, envoy.filters.http.ip_tagging, envoy.filters.http.jwt_authn, envoy.filters.http.lua, envoy.filters.http.on_demand, envoy.filters.http.original_src, envoy.filters.http.ratelimit, envoy.filters.http.rbac, envoy.filters.http.router, envoy.filters.http.squash, envoy.filters.http.tap, envoy.grpc_http1_bridge, envoy.grpc_json_transcoder, envoy.grpc_web, envoy.gzip, envoy.health_check, envoy.http_dynamo_filter, envoy.ip_tagging, envoy.lua, envoy.rate_limit, envoy.router, envoy.squash
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.retry_priorities: envoy.retry_priorities.previous_priorities
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.udp_listeners: raw_udp_listener
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.resolvers: envoy.ip
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.thrift_proxy.filters: envoy.filters.thrift.rate_limit, envoy.filters.thrift.router
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.filters.network: envoy.client_ssl_auth, envoy.echo, envoy.ext_authz, envoy.filters.network.client_ssl_auth, envoy.filters.network.direct_response, envoy.filters.network.dubbo_proxy, envoy.filters.network.echo, envoy.filters.network.ext_authz, envoy.filters.network.http_connection_manager, envoy.filters.network.kafka_broker, envoy.filters.network.local_ratelimit, envoy.filters.network.mongo_proxy, envoy.filters.network.mysql_proxy, envoy.filters.network.ratelimit, envoy.filters.network.rbac, envoy.filters.network.redis_proxy, envoy.filters.network.sni_cluster, envoy.filters.network.tcp_proxy, envoy.filters.network.thrift_proxy, envoy.filters.network.zookeeper_proxy, envoy.http_connection_manager, envoy.mongo_proxy, envoy.ratelimit, envoy.redis_proxy, envoy.tcp_proxy
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.retry_host_predicates: envoy.retry_host_predicates.omit_canary_hosts, envoy.retry_host_predicates.omit_host_metadata, envoy.retry_host_predicates.previous_hosts
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] http_cache_factory: envoy.extensions.http.cache.simple
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.health_checkers: envoy.health_checkers.redis
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.grpc_credentials: envoy.grpc_credentials.aws_iam, envoy.grpc_credentials.default, envoy.grpc_credentials.file_based_metadata
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.dubbo_proxy.protocols: dubbo
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.stats_sinks: envoy.dog_statsd, envoy.metrics_service, envoy.stat_sinks.dog_statsd, envoy.stat_sinks.hystrix, envoy.stat_sinks.metrics_service, envoy.stat_sinks.statsd, envoy.statsd
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.thrift_proxy.protocols: auto, binary, binary/non-strict, compact, twitter
[2021-02-15 19:45:38.905][1][info][main] [source/server/server.cc:260] envoy.access_loggers: envoy.access_loggers.file, envoy.access_loggers.http_grpc, envoy.access_loggers.tcp_grpc, envoy.file_access_log, envoy.http_grpc_access_log, envoy.tcp_grpc_access_log
[2021-02-15 19:45:38.913][1][warning][misc] [source/common/protobuf/utility.cc:198] Using deprecated option 'envoy.api.v2.listener.Filter.config' from file listener_components.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2021-02-15 19:45:38.913][1][info][main] [source/server/server.cc:341] admin address: 0.0.0.0:8001
[2021-02-15 19:45:38.915][1][info][main] [source/server/server.cc:469] runtime: layers:

ashutosh-narkar commented 3 years ago

When you set failure_mode_allow: true, client requests are allowed even if the communication with the authorization service has failed, or if the authorization service has returned an HTTP 5xx error.

You'll need to set status_on_error to, say, 400 to determine whether Envoy can reach OPA in the first place.

susmitasain commented 3 years ago

Case 1: failure_mode_allow: true - I get a response back from the application without the OPA policies being honored. On the OPA side, there is no decision log entry for the calls made.

Case 2: failure_mode_allow: false and status_on_error: BadRequest - no response back from the application and no decision log entry in OPA. Below is a sample response:

HTTP/1.1 400 Bad Request
date: Mon, 15 Feb 2021 20:47:18 GMT
server: envoy
content-length: 0

My observation: irrespective of what I do in Envoy, it is not honoring OPA at all when OPA is running as a standalone service. I have checked the OPA service explicitly by calling its REST API; it is running fine and writes to its decision logs. I suspect that service discovery for OPA does not work with the mentioned Envoy plugin as configured per the documentation below - https://www.openpolicyagent.org/docs/latest/envoy-authorization/ .

A few more details about my local configuration, in case that helps. My standalone OPA service has been exposed through a NodePort in the k8s cluster.

example-app-service   NodePort    10.102.220.48   8080:31817/TCP   45m
kubernetes            ClusterIP   10.96.0.1       443/TCP          1.7d
opa                   NodePort    10.110.60.207   8181:31522/TCP   16m

My Envoy configuration for service discovery:

http_filters:

Any help will be appreciated. Thank you.

ashutosh-narkar commented 3 years ago

> My observation: irrespective of what I do in Envoy, it is not honoring OPA at all when OPA is running as a standalone service. I have checked the OPA service explicitly by calling its REST API; it is running fine and writes to its decision logs. I suspect that service discovery for OPA does not work with the mentioned Envoy plugin as configured per the documentation.

So the issue seems to be that Envoy cannot reach OPA. Why don't you try setting the IP in the target_uri and see if that helps to debug further? Also, why are you not running OPA as a sidecar?
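For example, something like this in the ext_authz filter (the IP below is a placeholder; note that the filter talks gRPC to the OPA-Envoy plugin, which listens on 9191 in the tutorial config, not to OPA's 8181 REST API port):

grpc_service:
  google_grpc:
    target_uri: 192.0.2.10:9191    # placeholder: the IP where the OPA-Envoy gRPC server is actually reachable
    stat_prefix: ext_authz
  timeout: 0.5s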

susmitasain commented 3 years ago

> My observation: irrespective of what I do in Envoy, it is not honoring OPA at all when OPA is running as a standalone service. I have checked the OPA service explicitly by calling its REST API; it is running fine and writes to its decision logs. I suspect that service discovery for OPA does not work with the mentioned Envoy plugin as configured per the documentation.
>
> So the issue seems to be that Envoy cannot reach OPA. Why don't you try setting the IP in the target_uri and see if that helps to debug further? Also, why are you not running OPA as a sidecar?

Yes, Envoy cannot reach OPA using the same configuration as described in the documentation (https://www.openpolicyagent.org/docs/latest/envoy-authorization/) when OPA is running as a service instead of a sidecar.

I have tried setting the "target_uri" to the actual host IP, which did not have any impact.

In our scenario, the OPA rule engine, along with its policy data, might grow a lot. With the sidecar approach, the OPA sidecar size will keep growing, which in turn increases the overall pod size; also, every OPA sidecar has to stay in sync while refreshing policy data from bundles.

My initial thought is to run OPA as a DaemonSet on each node so that all pods on that node can share the OPA instance, something like this:

(architecture diagram: a node-level OPA instance shared by the application pods and their Envoy sidecars on that node)
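A rough sketch of the kind of DaemonSet I have in mind (the image tag, flags, and ports below are illustrative, taken from the OPA-Envoy docs rather than my actual manifest; policy/bundle loading is omitted):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: opa
spec:
  selector:
    matchLabels:
      app: opa
  template:
    metadata:
      labels:
        app: opa
    spec:
      containers:
      - name: opa
        image: openpolicyagent/opa:latest-envoy    # OPA image that bundles the Envoy ext_authz plugin
        args:
        - run
        - --server
        - --addr=0.0.0.0:8181
        - --set=plugins.envoy_ext_authz_grpc.addr=:9191
        - --set=decision_logs.console=true
        ports:
        - containerPort: 9191
          hostPort: 9191    # so each Envoy sidecar can reach the node-local OPA via the node IP
        - containerPort: 8181
          hostPort: 8181

Each Envoy sidecar would then point its target_uri at the node IP (for example, injected via the status.hostIP downward API) on port 9191.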

But following the same approach, I am able to call OPA (as a service) explicitly, but it fails when Envoy calls it implicitly. I wonder whether I need to use some other service discovery mechanism.

ashutosh-narkar commented 3 years ago

For your use case, using the same configuration as described in the documentation is not going to work as is. I don't have enough information about how the networking works in a pod for a container to reach something external; you may want to check with the Kubernetes Slack on that. I know that Istio has a ServiceEntry CRD that allows you to update the service registry, so that may be something to look at.

Regarding OPA growing in size, you mentioned you have plenty of data? How much data are we talking about? There are ways in which OPA can pull data on the fly during policy evaluation from your external service, which can then send only the subset of data that's needed to make that decision. See this link to explore those options.

susmitasain commented 3 years ago

Thanks for your suggestions. I am trying to understand the drawbacks of having OPA as a sidecar in each and every pod. For example, when the OPA rule engine grows with highly dynamic policy data, the OPA sidecar size will also increase, which eventually increases the overall pod size; isn't that considered an additional burden? Also, in a real production environment with 1000 or more pods, how do we handle HA, replication, and redundancy for each OPA sidecar?

ashutosh-narkar commented 3 years ago

Typically, in the microservice authz use case, where latency requirements are pretty strict, evaluating policies locally via the sidecar model avoids introducing a network hop to perform the authorization check. This tends to add less latency in the request path compared to running OPA outside the pod, so it's better from the perspective of performance and availability.

Regarding your question about 1000s of pods, the recommended way to do policy and data distribution is via Bundles. It's an eventually consistent model. OPA can optionally persist activated bundles to disk for recovery purposes.
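As a rough sketch, the per-OPA configuration for that looks something like this (the service URL and bundle name are placeholders):

services:
  bundle-registry:
    url: https://bundles.example.com
bundles:
  authz:
    service: bundle-registry
    resource: bundles/authz.tar.gz
    persist: true    # write the activated bundle to disk so OPA can recover after a restart
    polling:
      min_delay_seconds: 60
      max_delay_seconds: 120

Each OPA polls the bundle service independently and activates the latest bundle it downloads, which is why the distribution model is eventually consistent.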

How much data are we talking about per OPA? Also, it would be helpful to know your requirements around HA.

srenatus commented 3 years ago

Hmm pretty silent. Looks like this is done, or no longer of interest...? I'll close it, feel free to re-open or create another issue if I'm mistaken.

susmitasain commented 3 years ago

Hi,

Sorry for the late reply.

I was still working on this, but you can close it.

Thanks

On Thu, Aug 12, 2021 at 2:22 AM Stephan Renatus @.***> wrote:

Closed #3151 https://github.com/open-policy-agent/opa/issues/3151.

