istio-ecosystem / authservice

Move OIDC token acquisition out of your app code and into the Istio mesh
Apache License 2.0
217 stars 63 forks source link

Run: Unexpected error: !ok #112

Closed AndrewBabbitt97 closed 3 years ago

AndrewBabbitt97 commented 4 years ago

So we have been experiencing issues with gRPC shutting down in authservice as people authenticate, would result in 403s from envoy as it could no longer communicate to authservice while the actual process continued to run only way to resolve would be to manually delete/recycle the authservice pod. I have added logs with trace logging for authservice (with gRPC tracing) and envoy to the issue.

Commenting out the break on line 137 of async_service_impl.cc fixes the issue (the logs uploaded are using this fix). It looks like we were exiting on a bool that isnt fatal if its false thus causing the server to shut down. Fix

The authservice log is the interesting one with error entries for gRPC as follows:

"Attempt to send initial metadata after stream was closed" "Attempt to send message after stream was closed" "Attempt to send trailing metadata after stream was closed"

However in terms of the browser request, it doesnt even appear to have any issues and people are able to auth without issues with this change.

Looking at the gRPC documentation it looks as if Next() returns a bool that if false you should shut the server down, but doesnt seem to be the case for the ok value.

Bear in mind haven't personally worked with gRPC on any projects and could be totally wrong on this fix, however if this is an acceptable fix i can submit a PR for it.

I am however still curious as to what is going on with the stream closing errors, as I'm not seeing anything in the envoy logs relating to it.

authservice.log envoy.log

cfryanr commented 4 years ago

Thanks for reporting this!

r-kotagudem commented 4 years ago

@cfryanr

Experienced same error twice. Hopefully it gets addressed. Let me know if you need any logs from my end.

[2020-07-14 03:44:53.487] [console] [debug] Check: no matching trigger rule, so allowing request to proceed without any authservice functionality ://host.com/web/Servlet/app/RULES_/pbdorj4V2ay*/!TABTHREAD2?pyActivity=pyDeleteDocPg&pzFromFrame=&pzPrimaryPageName=RH_1&pzHarnessID=HID494geMessages=true&AJAXTrackID=1 [2020-07-14 03:44:53.487] [console] [trace] Request processing complete [2020-07-14 03:44:53.487] [console] [error] Run: Unexpected error: !ok [2020-07-14 03:44:53.487] [console] [info] Server shutting down [2020-07-14 03:45:18.700] [console] [info] operator(): Starting periodic cleanup (period of 60 seconds) [2020-07-14 03:45:18.700] [console] [info] DoPeriodicCleanup: removing expired sessions from chain idp_filter_chain [2020-07-14 03:46:18.700] [console] [info] operator(): Starting periodic cleanup (period of 60 seconds) [2020-07-14 03:46:18.700] [console] [info] DoPeriodicCleanup: removing expired sessions from chain idp_filter_chain [2020-07-14 03:47:18.700] [console] [info] operator(): Starting periodic cleanup (period of 60 seconds) [2020-07-14 03:47:18.700] [console] [info] DoPeriodicCleanup: removing expired sessions from chain idp_filter_chain [2020-07-14 03:48:18.700] [console] [info] operator(): Starting periodic cleanup (period of 60 seconds)

r-kotagudem commented 4 years ago

@cfryanr

Configured authservice at ingress gateway level We are frequently seeing this error and had to restart ingress pod. Can we have a solution for this? or workaround untill this is addressed. this is hindering our POC efforts. Also would like to know how best it scales with increased traffic. Thanks for all your support.

AndrewBabbitt97 commented 4 years ago

@r-kotagudem My fix noted in the original issue commenting out that line fixes it, might be able to throw my image we are using in our environment up in the next 24 hours for you

r-kotagudem commented 4 years ago

@r-kotagudem My fix noted in the original issue commenting out that line fixes it, might be able to throw my image we are using in our environment up in the next 24 hours for you

@AndrewBabbitt97 Thank you

r-kotagudem commented 4 years ago

@r-kotagudem My fix noted in the original issue commenting out that line fixes it, might be able to throw my image we are using in our environment up in the next 24 hours for you

@AndrewBabbitt97 rebuilt the image with this change. we are not facing the issue. will keep monitoring.

stefanhenseler commented 4 years ago

@AndrewBabbitt97 @r-kotagudem We are experiencing the same issue from time to time. Did you get any feedback about your fix, and if this is a viable fix? If so, what is the roadmap for the next release of authservice and will you create a PR?

AndrewBabbitt97 commented 4 years ago

@stefanhenseler I haven't gotten feedback on this, however we have been running a version of authservice with this patch in production for quite some time now and it's been stable.

I can definitely go and open a PR for this fix.

stefanhenseler commented 4 years ago

Awesome, thanks for your feedback. I would create a PR and see if it gets merged.

r-kotagudem commented 4 years ago

@AndrewBabbitt97 @r-kotagudem We are experiencing the same issue from time to time. Did you get any feedback about your fix, and if this is a viable fix? If so, what is the roadmap for the next release of authservice and will you create a PR?

No issues after this change suggested by @AndrewBabbitt97. it is stable from long time

AndrewBabbitt97 commented 4 years ago

Just got off a call yesterday with Tetrate to discuss some of this stuff, should hopefully see that PR go through soon.

kfud commented 3 years ago

Devs, can you PLEASE just fix this tiny issue, otherwise this proxy is pretty useless in production because it keeps stopping. Or maybe some tips to build is myself, because that doesn't work either. Thanks!

AndrewBabbitt97 commented 3 years ago

Devs have provided a getting started guide for building it here: https://github.com/istio-ecosystem/authservice/wiki/Setting-up-CLion-on-MacOS-for-Authservice-development

If you can't build your dependencies are probably off. Refer to the makefile for commands. Most notable if you are just trying to build is make docker build I build on Ubuntu 20.04 just fine.

In the end this isn't a proxy, it's a implementation of envoys external auth api. Curious as to if other implementations of this are around.

stefanhenseler commented 3 years ago

@cfryanr who is currently maintaining this project? There are several PRs that have not been merged and released yet. Notably, Redis support.

kfud commented 3 years ago

Devs have provided a getting started guide for building it here: https://github.com/istio-ecosystem/authservice/wiki/Setting-up-CLion-on-MacOS-for-Authservice-development

If you can't build your dependencies are probably off. Refer to the makefile for commands. Most notable if you are just trying to build is make docker build I build on Ubuntu 20.04 just fine.

In the end this isn't a proxy, it's a implementation of envoys external auth api. Curious as to if other implementations of this are around.

make docker-from-scratch FTW!

AndrewBabbitt97 commented 3 years ago

@cfryanr who is currently maintaining this project? There are several PRs that have not been merged and released yet. Notably, Redis support.

Redis support is already merged in, there's just not a release published with it. You can build master to get it

kfud commented 3 years ago

@AndrewBabbitt97 Thanks for the tips and the fix. I was able to build an image from your PR and deploy it. Will see if it helps the problem, thanks!