solo-io / wasm

Web Assembly tools and SDKs for extending cloud-native infrastructure
Apache License 2.0

Invalid path: /var/local/lib/wasme-cache when deploying on Istio #158

Open djannot opened 4 years ago

djannot commented 4 years ago

I've followed this guide: https://docs.solo.io/web-assembly-hub/latest/tutorial_code/deploy_tutorials/deploying_with_istio/

And I was able to deploy it on my cluster running Istio 1.6.7, but then I got this error from the istio-proxy container in every Pod:

2020-08-14T08:23:31.404984Z warning envoy config    [external/envoy/source/common/config/grpc_subscription_impl.cc:101] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) virtualInbound: Invalid path: /var/local/lib/wasme-cache/a515a5d244b021c753f2e36c744e03a109cff6f5988e34714dbe725c904fa917

2020-08-14T08:23:32.807385Z warn    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-14T08:23:34.770722Z warn    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
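
The readiness warnings above report per-type xDS counters: the cluster (CDS) update was accepted, but the only listener (LDS) update was rejected, which matches the "Listener rejected" error on the first line. A minimal Python sketch for spotting this pattern across many sidecar logs follows. It assumes the exact wording quoted in this thread, which is not a stable Istio interface, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumption: matches the "cds updates: X successful, Y rejected; lds updates: ..."
# tail of the readiness warning exactly as it is worded in these logs.
_COUNTERS = re.compile(
    r"(?P<kind>cds|lds) updates: (?P<ok>\d+) successful, (?P<rejected>\d+) rejected"
)


def parse_xds_counters(line: str) -> Optional[Dict[str, Dict[str, int]]]:
    """Return {"cds": {...}, "lds": {...}}, or None if the line has no counters."""
    found = {
        m.group("kind"): {
            "successful": int(m.group("ok")),
            "rejected": int(m.group("rejected")),
        }
        for m in _COUNTERS.finditer(line)
    }
    return found or None


def listener_config_rejected(line: str) -> bool:
    """True when no LDS update was accepted and at least one was rejected."""
    counters = parse_xds_counters(line)
    if counters is None or "lds" not in counters:
        return False
    lds = counters["lds"]
    return lds["successful"] == 0 and lds["rejected"] > 0


if __name__ == "__main__":
    sample = (
        "2020-08-14T08:23:32.807385Z warn    Envoy proxy is NOT ready: config not received "
        "from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; "
        "lds updates: 0 successful, 1 rejected"
    )
    print(parse_xds_counters(sample))
    print(listener_config_rejected(sample))
```

Checking only the LDS counters keeps the heuristic narrow: in these logs CDS is healthy, so a rejected listener update is the signal worth grepping for.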

The same filter works when I deploy it on Gloo.

Sodman commented 4 years ago

I think I know what the issue is here. We rewrote our CI/CD release pipeline for the 0.0.24 release, and it looks like the VERSION didn't get picked up by the build, so it defaulted to dev. As a result, the Istio Operator image is pulling in dev instead of 0.0.24. I'll cut a new release to fix the version issue.

pantianying commented 4 years ago

Hello, 0.0.25 still has this problem.

Sodman commented 4 years ago

Yes, this should be fixed by #159

Sodman commented 4 years ago

~Still seeing this issue after this change~

Sodman commented 4 years ago

False alarm, this is indeed fixed by the 0.0.26 release!

GuangTianLi commented 4 years ago

Hello, 0.0.26 still has this problem.

[Envoy (Epoch 0)] [2020-08-25 03:34:34.117][23][warning][config][external/envoy/source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) 172.22.3.210_8000: Failed to initialize WASM code from /var/local/lib/wasme-cache/3f319eec32afdfb1c053e1aea3a665504ff9d5f5ea4019146bcb455dfaea29d1
virtualInbound: Failed to initialize WASM code from /var/local/lib/wasme-cache/3f319eec32afdfb1c053e1aea3a665504ff9d5f5ea4019146bcb455dfaea29d1

Sodman commented 4 years ago

Hi @GuangTianLi, it looks like your issue is different. The original issue here was about an invalid path, which was ultimately caused by the wrong version of the operator being loaded.

In your error message, the WASM code is failing to initialize; it does not complain about an invalid path.
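
The distinction above can be made mechanical: the original report fails with "Invalid path:" for a file under /var/local/lib/wasme-cache/, while the 0.0.26 report fails with "Failed to initialize WASM code from" the same directory. A minimal Python sketch that tells the two apart and extracts the 64-character hex file name of the cached module follows. The category names are hypothetical labels, and the patterns cover only the two messages quoted in this thread.

```python
import re
from typing import Optional, Tuple

# Directory that appears in every error message quoted in this thread.
CACHE_DIR = "/var/local/lib/wasme-cache/"

# Assumption: only the two listener-rejection messages quoted here are
# recognized; any other wording returns None.
_PATTERNS = (
    ("invalid_path",
     re.compile("Invalid path: " + re.escape(CACHE_DIR) + r"([0-9a-f]{64})")),
    ("init_failed",
     re.compile("Failed to initialize WASM code from " + re.escape(CACHE_DIR) + r"([0-9a-f]{64})")),
)


def classify_wasm_error(line: str) -> Optional[Tuple[str, str]]:
    """Return (category, cache_file_name), or None if neither message appears.

    invalid_path: Envoy could not read the module file at that path
                  (the problem originally reported in this issue).
    init_failed:  the path was accepted but the module failed to initialize
                  (the separate problem in the 0.0.26 report).
    """
    for category, pattern in _PATTERNS:
        match = pattern.search(line)
        if match:
            return category, match.group(1)
    return None


if __name__ == "__main__":
    print(classify_wasm_error(
        "virtualInbound: Invalid path: /var/local/lib/wasme-cache/"
        "a515a5d244b021c753f2e36c744e03a109cff6f5988e34714dbe725c904fa917"
    ))
```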

pantianying commented 4 years ago

Sorry, but 0.0.26 still has this problem for me. Can someone tell me why?

2020-08-31T02:55:52.212398Z info Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster details.istio-project --service-node sidecar~10.129.5.186~details-v1-5f8447ccd5-7ggl8.istio-project~istio-project.svc.cluster.local --max-obj-name-len 189 --local-address-ip-version v4 --log-format %Y-%m-%dT%T.%fZ %l envoy %n %v -l warning --component-log-level misc:error --concurrency 2]
2020-08-31T02:55:52.615490Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 14, no healthy upstream
2020-08-31T02:55:52.615558Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-08-31T02:55:52.642499Z warning envoy main [external/envoy/source/server/server.cc:475] there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
2020-08-31T02:55:52.748311Z info sds resource:default new connection
2020-08-31T02:55:52.748396Z info sds Skipping waiting for ingress gateway secret
2020-08-31T02:55:53.554728Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:92] StreamAggregatedResources gRPC config stream closed: 14, no healthy upstream
2020-08-31T02:55:53.554766Z warning envoy config [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:54] Unable to establish new stream
2020-08-31T02:55:53.563283Z info cache Root cert has changed, start rotating root cert for SDS clients
2020-08-31T02:55:53.563336Z info cache GenerateSecret default
2020-08-31T02:55:53.570992Z info sds resource:default pushed key/cert pair to proxy
2020-08-31T02:55:56.997075Z info sds resource:ROOTCA new connection
2020-08-31T02:55:56.997173Z info sds Skipping waiting for ingress gateway secret
2020-08-31T02:55:56.997203Z info cache Loaded root cert from certificate ROOTCA
2020-08-31T02:55:56.997280Z info sds resource:ROOTCA pushed root cert to proxy
2020-08-31T02:55:57.425220Z warning envoy config [external/envoy/source/common/config/grpc_subscription_impl.cc:101] gRPC config for type.googleapis.com/envoy.api.v2.Listener rejected: Error adding/updating listener(s) virtualInbound: Invalid path: /var/local/lib/wasme-cache/a515a5d244b021c753f2e36c744e03a109cff6f5988e34714dbe725c904fa917

2020-08-31T02:55:59.287257Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:01.233766Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:03.187032Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:05.176523Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:07.194144Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:09.192120Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:11.176369Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:13.176285Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:15.176349Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected
2020-08-31T02:56:17.176047Z warn Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected


ilackarms commented 4 years ago

hi @pantianying @djannot

This issue was actually resolved in https://github.com/solo-io/wasme/pull/95, but we have had some CI issues that are blocking it from being merged, and at this point the PR needs to be updated. cc @yuval-k

A temporary workaround is to restart the target pods; Envoy should eventually pick up the Wasm module file.
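
One way to apply the restart workaround, assuming the target pods belong to a Deployment, is `kubectl rollout restart`, which replaces every pod of that Deployment. A minimal Python sketch that builds (and optionally runs) that command follows. The helper names are hypothetical, and the `details-v1` / `istio-project` names are only taken from the Envoy command line in the log above as an example.

```python
import subprocess
from typing import List


def restart_command(deployment: str, namespace: str) -> List[str]:
    """Build a `kubectl rollout restart` invocation for one Deployment.

    Replacing the pods starts fresh istio-proxy sidecars, which (per the
    workaround above) should find the cached Wasm module file this time.
    """
    return ["kubectl", "rollout", "restart", f"deployment/{deployment}", "-n", namespace]


def restart_target_pods(deployment: str, namespace: str, dry_run: bool = True) -> List[str]:
    """Print the command; only execute it when dry_run is False (needs kubectl on PATH)."""
    cmd = restart_command(deployment, namespace)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Example names only; substitute your own Deployment and namespace.
    restart_target_pods("details-v1", "istio-project")
```

Defaulting to a dry run keeps the sketch safe to try: it shows the exact command before anything touches the cluster.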

yuval-k commented 4 years ago

I believe the CI issues are related to an Envoy bug that was only recently fixed, i.e. the approach in the PR might only work with the next Istio release.

pantianying commented 4 years ago

Restart the target pods? When I run

wasme deploy istio webassemblyhub.io/pantianying/add-header:v0.0.3

the new pod fails to initialize. Do you mean that restarting the pod that failed to initialize will solve this issue?

Sodman commented 3 years ago

@pantianying Yes, for now restarting the pod should fix it. Unfortunately, as @yuval-k mentioned, we're waiting for Istio to pull in the upstream Envoy fix for the bug that ultimately causes this cache race condition.

harpratap commented 3 years ago

@Sodman It's still failing for me:

warning envoy config gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener rejected: Error adding/updating listener(s) virtualInbound: Invalid path: /var/local/lib/wasme-cache/314c75ded0da28314381281e74ab8b91196055360bd7b57f132de21c2116b9a3

And then my pod crashes:

PostStartHookError: command 'pilot-agent wait' exited with 255: Error: timeout waiting for Envoy proxy to become ready. Last error: HTTP status code 503

Wasme version: 0.0.32
Istio version: 1.8.2
Kubernetes version: 1.18.6

I am able to make it work locally on kind, but it doesn't work on our on-prem cluster, so I am not sure where to start debugging this.

Sodman commented 3 years ago

Hi @harpratap, do the logs from the wasme pod (which manages the cache) give any more insight? If the cache didn't pull the image correctly (for example, because of an HTTP error), it's possible it never cached it, which would explain why it wasn't loaded into the proxy. If this is the case, you could try bouncing the cache pod to force a refresh.

tanjunchen commented 3 years ago

I guess it can be resolved by deleting the pods in the wasme namespace.
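
The two suggestions above (check the cache pod's logs, then bounce the cache pod) map onto `kubectl logs` and `kubectl delete pods`. A minimal Python sketch that builds both commands follows. It assumes, as the last comment suggests, that the wasme cache pods run in the `wasme` namespace and are recreated by their controller after deletion. The helper names are hypothetical, and the cache pod name is left for you to supply.

```python
from typing import List

# Assumption (from the comments above): the wasme cache pods live in the
# "wasme" namespace and are recreated by their controller after deletion.
WASME_NAMESPACE = "wasme"


def cache_logs_command(pod_name: str, namespace: str = WASME_NAMESPACE) -> List[str]:
    """kubectl invocation to read one cache pod's logs (pod_name is yours to supply)."""
    return ["kubectl", "logs", pod_name, "-n", namespace]


def bounce_cache_command(namespace: str = WASME_NAMESPACE) -> List[str]:
    """kubectl invocation that deletes every pod in the namespace.

    Note: --all also restarts any other pods there (such as the wasme
    operator, if it runs in that namespace), not just the cache.
    """
    return ["kubectl", "delete", "pods", "--all", "-n", namespace]


if __name__ == "__main__":
    print(" ".join(cache_logs_command("<cache-pod-name>")))
    print(" ".join(bounce_cache_command()))
```

Reading the logs first is the cheaper step: if they show a failed image pull, bouncing the pod is more likely to help than restarting the application pods.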