proxy-wasm / proxy-wasm-cpp-host

WebAssembly for Proxies (C++ host implementation)
Apache License 2.0
83 stars 69 forks source link

route cache control #421

Open wbpcode opened 1 week ago

wbpcode commented 1 week ago

In the Envoy, the route cache will be cleared by default if a header is modified by the wasm filter. It would be better to provide a way to control the behavior by the wasm filter self.

But seems this require the proxy wasm to provide a new ABI?

PiotrSikora commented 1 week ago

We had proxy_clear_route_cache call in the v0.1.0, but it was removed in v0.2.0 because it was very Envoy specific and error-prone, since each change to HTTP headers had to be followed by a call to proxy_clear_route_cache, and if the plugin author missed it, then it would break Envoy's routing logic (which I believe could result in security issues because of misrouted requests).

This functionality has been replaced by automatic clearing of that cache when HTTP headers are updated.

What would work better with the manual control? Which use cases don't work with the automatic clearing?

wbpcode commented 1 week ago

@PiotrSikora I personally think automatic clearing is not a good choice in practice.

We may change the host, path, headers in a filter because various reasons (for example, rewrite the path according to custom filter configuration). But we only actually want to refresh the route in limited scenarios.

Automatic clearing actually prohibit the host/path updating in the filter because it may result in unexpected 404.

The correct way is to clear the route cache only when it's required explicitly. (Most envoy filters work as this way, except wasm and lua now.)

PiotrSikora commented 1 week ago

The correct way is to clear the route cache only when it's required explicitly. (Most envoy filters work as this way, except wasm and lua now.)

And what happens when the route cache isn't cleared when it should (either by mistake or intentionally by a malicious plugin)?

wbpcode commented 1 week ago

The correct way is to clear the route cache only when it's required explicitly. (Most envoy filters work as this way, except wasm and lua now.)

And what happens when the route cache isn't cleared when it should (either by mistake or intentionally by a malicious plugin)?

As I said before, the route cache should only be cleared when the filter require it explicitly. In all other cases, the initial matched route will be used.

Basically, we only use this feature when we cannot select the correct route according to client requests and need some additional inputs from the filter (like user info from auth filters.)

leonm1 commented 1 week ago

I'm trying to understand the feature request in here.

Do plugin authors want users to intentionally route a request to a host/path that does not resolve in the Envoy's route configuration?

For example,

wbpcode commented 1 week ago

@leonm1 yeah, you are right 👍

leonm1 commented 3 days ago

I went looking for prior art, and I also found a few related (non-wasm) Envoy issues:

It seems as though at least making this configurable is important.

I also took a look at how Apache Traffic Server handles this. It appears like ATS supports both:

Setting safe defaults is important. As Piotr points out, changing the host without refreshing the cached route can be dangerous in some circumstances. It sounds like refreshing the route cache by default may be a safe default for plugin authors, but we could allow plugin authors to opt-in to refreshing the cached route. This would apply to both ATS and Envoy, and I assume similar behavior applies to other proxies as well (though I did not check).

ext_proc also has an enum which can be set on a per-request basis.

@PiotrSikora given the use cases and prior art above, what do you think about adding a knob to let plugin authors choose?

@wbpcode does this satisfy the feature request?

wbpcode commented 3 days ago

Basically, I think keep the same default behavior with envoy is good enough. But at least a control flag is better than no control anyway.

PiotrSikora commented 3 days ago

@leonm1 thanks for digging into this! However, I think that we should step back a bit, and ask what's the intended use case, instead of focusing on Envoy's implementation details.

Is the feature request to: 1) modify HTTP request headers without affecting selected upstream cluster (generic feature), 2) be able to select different upstream cluster (again, generic feature), 3) something else that's only achievable by re-picking selected route?

Also, regarding route cache in Envoy, the official documentation says that it must be cleared when changes to headers would affect routing, and since plugin authors generally don't (and shouldn't) know all the possible configurations the plugin is going to be deployed to, they cannot know whether the route cache should be cleared or not. See https://github.com/envoyproxy/envoy/blob/f86d1294e764af4429732d322c8a8a9fedead8a2/envoy/http/filter.h#L361-L365:

clearRouteCache() - Clears the route cache for the current request. This must be called when a filter has modified the headers in a way that would affect routing.

IMHO, this decision should be done in proxy's configuration and not in the plugin itself (unless we want to make plugins deployment-specific).

Notably, https://github.com/envoyproxy/data-plane-api/pull/569 added match_incoming_request_route to gRPC/JSON transcoder filter to disable automatic clearing of the route cache, so to keep things consistent within Envoy, that's probably the easiest and best way to address this.

wbpcode commented 2 days ago

I will try to remove the automatic refresh in the Envoy's implementation and add the foreign functions to clear the route cache.

By this way, the wasm plugin could get similar logic with the Envoy's native extension. The plugin could refreh the route according to it's plugin configuration or code.

PiotrSikora commented 2 days ago

I will try to remove the automatic refresh in the Envoy's implementation and add the foreign functions to clear the route cache.

By this way, the wasm plugin could get similar logic with the Envoy's native extension. The plugin could refreh the route according to it's plugin configuration or code.

Erm, this is the opposite of what I suggested.

Also, please remember that: 1) Changing defaults will break existing plugins. 2) As mentioned before, there was proxy_clear_route_cache in Proxy-Wasm ABI v0.1.0, but based on the feedback from Istio plugin developers (@kyessenov?), we removed it in ABI v0.2.0 and made this behavior automatic. It was 5 years ago, so I don't recall the details now, but we shouldn't be flipping back-and-forth based on edge cases that can be addressed differently. 3) IMHO, it should be proxy operators, and not plugins authors, that control whether this mechanism can be bypassed.

kyessenov commented 2 days ago

@PiotrSikora automatic on/off for clearing route cache will not work for selective re-routing, e.g. as it happens with JWT authentication filter that may or may not impact routing via dynamic metadata. So that means we do need an explicit host call. That can co-exist with the plugin-level config to turn on/off routing automatically. So in the end a solution probably needs:

wbpcode commented 2 days ago

If we hope the wasm extension has similar experience with the native one, then the framework shouldn't provide any automatical refresh. It should only be decided by the plugin configuration and the plugin logic.

Basically, the new current work basically is additional fee of the 5 years agos decision. I think this is similar to the stop iteration support.

wbpcode commented 1 day ago

Is it possible to bring the proxy_clear_route_cache back? And we can also add the stop iteration support back. All these changes could be placed in a new ABI version?

Or proxy-wasm community now treat it as a envoy-specifc feature so we cannot bring it back?

PiotrSikora commented 1 day ago

If we hope the wasm extension has similar experience with the native one, then the framework shouldn't provide any automatical refresh.

Wasm isn't a native extension, but an external one (same as Lua and ext_proc).

If you look at those two, then Lua always clears route cache when HTTP headers are modified (same as Wasm), and ext_proc is clearing it by default (again, same as Wasm), but that behavior can be disabled in xDS configuration.

At the very least, you should try to match that behavior (i.e. what @kyessenov described above), instead of introducing a completely different one.

It should only be decided by the plugin configuration and the plugin logic.

Again, this can bypass security constrains on various routes, so this should not be something controlled only by the possibly untrusted plugin.

But to address some use cases, I think we can agree that this behavior could be disabled in xDS configuration.

Is it possible to bring the proxy_clear_route_cache back? [...] Or proxy-wasm community now treat it as a envoy-specifc feature so we cannot bring it back?

This is very much Envoy-specific feature and should be treated as such, so it should be added as a custom hostcall and then we can add wrappers for it to various SDKs. The good news about it is that it doesn't require a new ABI version.

And we can also add the stop iteration support back. All these changes could be placed in a new ABI version?

This will be addressed by the new ABI version (see: https://github.com/proxy-wasm/spec/issues/63).

wbpcode commented 1 day ago

If you look at those two, then Lua always clears route cache when HTTP headers are modified (same as Wasm), and ext_proc is clearing it by default (again, same as Wasm), but that behavior can be disabled in xDS configuration.

I also don't think that lua do the correct thing but we can never change it becasue the lua is stable now. Ext_proc will only clear the route cache when the external plugin require it.

  enum RouteCacheAction {
    // The default behavior is to clear the route cache only when the
    // :ref:`clear_route_cache <envoy_v3_api_field_service.ext_proc.v3.CommonResponse.clear_route_cache>`
    // field is set in an external processor response.
    DEFAULT = 0;

    // Always clear the route cache irrespective of the clear_route_cache bit in
    // the external processor response.
    CLEAR = 1;

    // Do not clear the route cache irrespective of the clear_route_cache bit in
    // the external processor response. Setting to RETAIN is equivalent to set the
    // :ref:`disable_clear_route_cache <envoy_v3_api_field_extensions.filters.http.ext_proc.v3.ExternalProcessor.disable_clear_route_cache>`
    // to true.
    RETAIN = 2;
  }

Again, this can bypass security constrains on various routes, so this should not be something controlled only by the possibly untrusted plugin.

Basically when you load a untrust code, you can hard to prohbit it to do bad thing completely.

Specific to this feature, for the untrust plugin, we can also cannot expect it's modification is trust. Then refresh route based on the untrust modifiction automatically typically won't improve the safety. (So, at least the automatically refresh should be removed, this could be guard by a long-live runtime flag.)

And considering the acutal requirement from the pratice, route refresh is necessary, so, we will need a custom host call to do that. (We will add it as foreign call for now.)

We may also want disable the refresh completely for some plugins in the future. This could be treat as independent feature request of the Envoy.

Anyway, I think we should care the actual requirements first. Because at most time, we run the trust plugin that developed by ourselves. After that, we then could consider to make it safer by adding limit to specific plugin.