open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.
https://www.openpolicyagent.org
Apache License 2.0

Status API does not return metadata added in .manifest file of the OPA bundle. #6871

Status: Open. apatade-tl opened this issue 1 month ago

apatade-tl commented 1 month ago

Short description

Status API does not return metadata added in .manifest file of the OPA bundle.

Steps To Reproduce

Discussed in https://github.com/orgs/open-policy-agent/discussions/613

Originally posted by **apatade-tl** July 13, 2024

Hi team, I am trying to add metadata via the .manifest file of the bundle, along with other files such as data.json, evaluator.rego and the policy Rego files. Only the revision from the .manifest file is returned, as active_revision in the Status API's result.bundles.example.

**Is there any configuration required so that the metadata content from the .manifest file is returned?**

Command used to run the server:

```
opa run --server --set bundles.example.resource=bundle.tar.gz --set services.example.url=http://localhost:8080/ --set status.service=example
```

.manifest file containing metadata:

```
{
  "revision": "v1.0.0",
  "metadata": {
    "environment": "production",
    "timestamp": "2024-07-12T12:34:56Z"
  }
}
```

Status API returns:

```
{
  "labels": {
    "id": "b0466ccb-5976-4a09-a755-667f7da44e4f",
    "version": "0.66.0"
  },
  "bundles": {
    "example": {
      "name": "example",
      "active_revision": "v1.0.0",
      "last_successful_activation": "2024-07-13T13:29:18.351839Z",
      "type": "snapshot",
      "size": 5258,
      "last_successful_download": "2024-07-13T13:29:18.328496Z",
      "last_successful_request": "2024-07-13T13:29:18.328496Z",
      "last_request": "2024-07-13T13:29:18.328496Z",
      "metrics": {
        "timer_bundle_request_ns": 189486000,
        "timer_rego_load_bundles_ns": 34545583,
        "timer_rego_module_compile_ns": 22106083,
        "timer_rego_module_parse_ns": 9814789
      }
    }
  },
  "metrics": {...},
  "plugins": {
    "bundle": { "state": "OK" },
    "discovery": { "state": "OK" },
    "status": { "state": "OK" }
  }
}
```

Expected behavior

The metadata content from the .manifest file of the OPA bundle should be returned in the Status API response.

Additional context

https://www.openpolicyagent.org/docs/latest/management-bundles/#bundle-file-format

charlieegan3 commented 1 month ago

Thanks for opening the issue. Please add some more of the context you shared in the discussion forum post to fill out this feature request 🙂

As discussed here: https://github.com/orgs/open-policy-agent/discussions/613#discussioncomment-10050356

apatade-tl commented 1 month ago

Use case: We have a single pod that is responsible for authorization and listens to the status and bundle-load APIs of the OPA containers/servers in multiple (100s of) pods belonging to multiple applications. We do not want to implement address resolution for the OPA containers in each application's pods just to call the /v1/data/system API explicitly to get the metadata. Instead, we are looking for a way to get the metadata (any custom data) via the Status API, just as the revision is automatically returned as active_revision.

Background: We were looking for a way to return some metadata (any custom data) included in the bundle as part of the Status API. We found that this is only achievable via a custom plugin or by calling the APIs explicitly; no existing mechanism or configuration in base OPA supports it.

Proposal: A mechanism or configuration that allows metadata (any custom data) present in the bundle (for example, in the .manifest file) to be returned as part of the Status API. Because the Status API automatically reports any bundle state change to the configured external server/API, it would be very helpful for that server to be able to act on the metadata it receives. To underline the use case again: this external server would be processing Status API reports from hundreds or more OPA servers.

Note: If acceptable, we could either return the .manifest file's existing metadata attribute directly in the Status API, OR define a new attribute for that purpose.
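For comparison, the explicit per-instance call mentioned in the use case could look like the Go sketch below. It assumes OPA exposes each activated bundle's manifest through the Data API at /v1/data/system/bundles/<name>/manifest and wraps the document in a {"result": ...} envelope; please verify that path against the bundle documentation for your OPA version. fetchManifest, decodeManifest and the localhost:8181 address are hypothetical. The sketch makes the cost visible: the caller has to know the address of every OPA instance.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// Manifest mirrors the .manifest file shown in this issue.
type Manifest struct {
	Revision string         `json:"revision"`
	Metadata map[string]any `json:"metadata"`
}

// decodeManifest unwraps the Data API envelope {"result": ...}.
// A missing "result" means the document is undefined (e.g. unknown bundle name).
func decodeManifest(body []byte) (*Manifest, error) {
	var envelope struct {
		Result *Manifest `json:"result"`
	}
	if err := json.Unmarshal(body, &envelope); err != nil {
		return nil, err
	}
	if envelope.Result == nil {
		return nil, errors.New("manifest not found: result is undefined")
	}
	return envelope.Result, nil
}

// fetchManifest (hypothetical helper) asks one OPA instance for the manifest of
// the named bundle, assuming it is stored at data.system.bundles[<name>].manifest.
func fetchManifest(client *http.Client, opaAddr, bundle string) (*Manifest, error) {
	u := opaAddr + "/v1/data/system/bundles/" + url.PathEscape(bundle) + "/manifest"
	resp, err := client.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: unexpected status %d", u, resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeManifest(body)
}

func main() {
	// One placeholder address; the use case would need this for hundreds of pods.
	client := &http.Client{Timeout: 2 * time.Second}
	m, err := fetchManifest(client, "http://localhost:8181", "example")
	if err != nil {
		fmt.Println("error:", err)
	} else {
		fmt.Println(m.Revision, m.Metadata)
	}
}
```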

ashutosh-narkar commented 1 month ago

Thanks for providing the context @apatade-tl. The purpose of the bundle status is to provide details about bundle processing, e.g. when it was downloaded. The bundle revision, which is picked up from the manifest, acts as an identifier for the bundle. Including the metadata in the manifest in the status API does not seem appropriate as it's not really providing any information about the processing of the bundle. Plus, if we include the metadata in the status update, why not other manifest fields? Also, if there was any sensitive information in the metadata, we don't have a mechanism for masking it. It's true we can get this info explicitly via the API, which can be locked if needed.

I think the manifest can be made part of the bundle data iirc. So you should be able to query that and make it part of the policy decision and have it available in the decision logs. Just a thought. WDYT?

apatade-tl commented 1 month ago

> Including the metadata in the manifest in the status API does not seem appropriate as it's not really providing any information about the processing of the bundle.

This would not be included in the Status API by default; it would be driven by configuration, say something like --set status.manifest.metadata=true. I believe it does provide information, namely the bundle's metadata, which can be critical for the external server's further processing when the bundle state changes.
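To make the proposal concrete, the opt-in behaviour can be sketched in Go. Everything here is hypothetical: status.manifest.metadata is the flag proposed in this comment, not an existing OPA option, and BundleStatus, buildStatus and encodeStatus are illustrative names rather than OPA's status types. The sketch only shows the metadata being copied into the bundle's status entry when the flag is on, and omitted from the JSON otherwise.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BundleStatus is a hypothetical, trimmed per-bundle status entry;
// it is not OPA's actual status type.
type BundleStatus struct {
	Name           string         `json:"name"`
	ActiveRevision string         `json:"active_revision"`
	Metadata       map[string]any `json:"metadata,omitempty"` // proposed opt-in field
}

// buildStatus copies the manifest metadata into the entry only when
// includeMetadata (standing in for the proposed status.manifest.metadata flag) is true.
func buildStatus(name, revision string, metadata map[string]any, includeMetadata bool) BundleStatus {
	s := BundleStatus{Name: name, ActiveRevision: revision}
	if includeMetadata {
		s.Metadata = metadata
	}
	return s
}

// encodeStatus renders the entry as it would appear in the status report.
func encodeStatus(s BundleStatus) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	meta := map[string]any{"environment": "production", "timestamp": "2024-07-12T12:34:56Z"}
	for _, enabled := range []bool{false, true} {
		out, err := encodeStatus(buildStatus("example", "v1.0.0", meta, enabled))
		if err != nil {
			panic(err)
		}
		fmt.Printf("flag=%v: %s\n", enabled, out)
	}
}
```

Keeping the field behind an explicit flag is what would address the sensitive-data concern raised above: existing deployments would see no change unless they opt in.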

> Plus if we include the metadata in the status update why not other manifest fields.

  1. One can put whatever details (any required data) should be returned inside the .manifest file's metadata field, since it is a JSON object and can hold anything; OR
  2. We can define specific flags (such as status.manifest.metadata) that select which fields the user wants returned.

> Also if there was any sensitive information in the metadata, we don't have a mechanism for masking it. It's true we can get this info explicitly via the API which can be locked if needed.

Again, this would be driven by configuration: if it is set to true, the user has already consented to, and is explicitly asking for, having this in the Status API.

> I think the manifest can be made part of the bundle data iirc. So you should be able to query that and make it part of the policy decision and have it available in the decision logs. Just a thought. WDYT?

Querying bundle data is again an explicit call that has to be made, which requires resolving the host addresses of the different OPA servers. The Status API would be better: it can be driven by configuration (to control exactly what is returned), and it is triggered automatically on any bundle state change, reporting to the preconfigured external service, which listens for these status reports and acts on them accordingly.

@charlieegan3 @srenatus @tsandall I have provided the use case and the proposal, with the configurations/answers I can think of on the spot. This seems easy to implement and a very useful requirement. Could you please go through it? We/the team can brainstorm on this if required. Thanks! 🙂

charlieegan3 commented 1 month ago

Hmm, I hadn't considered the security implications for existing users who may have (mis?)used metadata to hold confidential info.

It sounds like the reason it 'needs' to be the status API here is the push nature of how this functionality operates. I wonder if there is still something we can do to make this generic? Perhaps we could have the status API also push the results of an optional configured query? If the user configured this, it'd be included in another, optional status field and could be used here, while still being generic enough to serve other use cases too? Just thinking aloud here, perhaps this is an abuse of the status service API... but 'policy-defined status' sounds like an appealing feature to have.

One consideration might be to not allow http.send or other functions that could impact the timing of the pings... there are likely others too.

ashutosh-narkar commented 1 month ago

> Bundle data is again an explicit call that needs to be made, for which need to resolve the different OPA server's host addresses.

This metadata info could be made part of the policy decision, and whenever OPA is queried, the decision logs are uploaded automatically. So you should have the required info without the need for any address resolution.

apatade-tl commented 1 month ago

OPA queries and decision logs are NOT triggered automatically by a bundle state change. Only the Status API is triggered by a bundle state change, and it can be configured to report to any external service. Hence, a configurable mechanism for the Status API to return metadata is desirable to solve this, and it would also be broadly beneficial for OPA.

PS: We did look at all the alternatives before opening this issue, and we ask you/the team to look into the details provided, as most of the scenarios and alternatives have already been covered and discussed.

ashutosh-narkar commented 1 month ago

> Hence status API having a configurable mechanism to return metadata is desirable to solve this and also would be lot beneficial to be leveraged under OPA.

Can you please elaborate on the benefits this would provide? From the discussion so far, it feels like this would help with your specific use case. As I mentioned before, I don't yet see why the metadata should be part of the Status API, especially when there are alternatives. Others in the community can chime in as well with their use cases and thoughts about this.

tsandall commented 1 month ago

Why not have the service handling Status API messages look up the bundle or bundle manifest based on the revisions in the status message? It should be relatively easy to keep an index that maps bundle revisions to bundles (or just bundle manifests). Then your service handling those Status API messages has full access to the state that OPA stores and can do whatever it needs to. In the future, if additional state from the bundle is required, it doesn't have to be produced by OPA (which could cause performance issues) and doesn't require a rollout of config changes to your OPAs. At a high level, I'd just recommend implementing this use case in your control plane.
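The control-plane approach can be sketched in Go as follows. It is a minimal sketch under stated assumptions, not part of OPA: ManifestIndex and its methods are hypothetical, the index is in memory, and the handler decodes only the bundles map and each bundle's active_revision from a status report shaped like the one shown earlier in this issue. A real service would populate the index whenever it builds or publishes a bundle, and would persist it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Manifest holds the fields of the .manifest file shown in this issue.
type Manifest struct {
	Revision string         `json:"revision"`
	Metadata map[string]any `json:"metadata"`
}

// key identifies a published bundle; revisions such as "v1.0.0" can repeat
// across different bundles, so the bundle name is part of the key.
type key struct {
	Bundle   string
	Revision string
}

// statusReport decodes only the fields of the status payload used here;
// encoding/json ignores everything else in the report.
type statusReport struct {
	Bundles map[string]struct {
		ActiveRevision string `json:"active_revision"`
	} `json:"bundles"`
}

// ManifestIndex is a hypothetical control-plane index of published manifests.
type ManifestIndex struct {
	mu      sync.RWMutex
	entries map[key]Manifest
}

func NewManifestIndex() *ManifestIndex {
	return &ManifestIndex{entries: make(map[key]Manifest)}
}

// Add records a manifest when the control plane builds or publishes a bundle.
func (ix *ManifestIndex) Add(bundle string, m Manifest) {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	ix.entries[key{bundle, m.Revision}] = m
}

// Resolve decodes one status report and returns the manifest metadata for each
// bundle whose active revision is known to the index; unknown revisions are skipped.
func (ix *ManifestIndex) Resolve(payload []byte) (map[string]map[string]any, error) {
	var rep statusReport
	if err := json.Unmarshal(payload, &rep); err != nil {
		return nil, err
	}
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	out := make(map[string]map[string]any, len(rep.Bundles))
	for name, b := range rep.Bundles {
		if m, ok := ix.entries[key{name, b.ActiveRevision}]; ok {
			out[name] = m.Metadata
		}
	}
	return out, nil
}

func main() {
	ix := NewManifestIndex()
	ix.Add("example", Manifest{
		Revision: "v1.0.0",
		Metadata: map[string]any{"environment": "production", "timestamp": "2024-07-12T12:34:56Z"},
	})

	// A trimmed version of the status report from this issue.
	payload := []byte(`{"bundles": {"example": {"name": "example", "active_revision": "v1.0.0"}}}`)
	meta, err := ix.Resolve(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(meta["example"]["environment"])
}
```

Keying on bundle name plus revision, rather than revision alone, avoids collisions when different bundles reuse a revision string such as v1.0.0. Any further bundle state the external service needs can then be added to the index without changing the OPA configuration.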

stale[bot] commented 2 weeks ago

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.