Closed AndrewBabbitt97 closed 3 years ago
Thanks for reporting this!
@cfryanr
Experienced same error twice. Hopefully it gets addressed. Let me know if you need any logs from my end.
[2020-07-14 03:44:53.487] [console] [debug] Check: no matching trigger rule, so allowing request to proceed without any authservice functionality ://host.com/web/Servlet/app/RULES_/pbdorj4V2ay*/!TABTHREAD2?pyActivity=pyDeleteDocPg&pzFromFrame=&pzPrimaryPageName=RH_1&pzHarnessID=HID494geMessages=true&AJAXTrackID=1
[2020-07-14 03:44:53.487] [console] [trace] Request processing complete
[2020-07-14 03:44:53.487] [console] [error] Run: Unexpected error: !ok
[2020-07-14 03:44:53.487] [console] [info] Server shutting down
[2020-07-14 03:45:18.700] [console] [info] operator(): Starting periodic cleanup (period of 60 seconds)
[2020-07-14 03:45:18.700] [console] [info] DoPeriodicCleanup: removing expired sessions from chain idp_filter_chain
[2020-07-14 03:46:18.700] [console] [info] operator(): Starting periodic cleanup (period of 60 seconds)
[2020-07-14 03:46:18.700] [console] [info] DoPeriodicCleanup: removing expired sessions from chain idp_filter_chain
[2020-07-14 03:47:18.700] [console] [info] operator(): Starting periodic cleanup (period of 60 seconds)
[2020-07-14 03:47:18.700] [console] [info] DoPeriodicCleanup: removing expired sessions from chain idp_filter_chain
[2020-07-14 03:48:18.700] [console] [info] operator(): Starting periodic cleanup (period of 60 seconds)
@cfryanr
We configured authservice at the ingress gateway level. We are frequently seeing this error and have had to restart the ingress pod. Can we have a solution for this, or a workaround until it is addressed? This is hindering our POC efforts. We would also like to know how well it scales with increased traffic. Thanks for all your support.
@r-kotagudem My fix noted in the original issue (commenting out that line) fixes it. I might be able to throw the image we are using in our environment up in the next 24 hours for you.
@AndrewBabbitt97 Thank you
@AndrewBabbitt97 We rebuilt the image with this change and are not facing the issue. Will keep monitoring.
@AndrewBabbitt97 @r-kotagudem We are experiencing the same issue from time to time. Did you get any feedback about your fix, and if this is a viable fix? If so, what is the roadmap for the next release of authservice and will you create a PR?
@stefanhenseler I haven't gotten feedback on this, however we have been running a version of authservice with this patch in production for quite some time now and it's been stable.
I can definitely go and open a PR for this fix.
Awesome, thanks for your feedback. I would create a PR and see if it gets merged.
No issues after the change suggested by @AndrewBabbitt97. It has been stable for a long time.
Just got off a call yesterday with Tetrate to discuss some of this stuff, should hopefully see that PR go through soon.
Devs, can you PLEASE just fix this tiny issue? Otherwise this proxy is pretty useless in production because it keeps stopping. Or maybe some tips on building it myself, because that doesn't work either. Thanks!
Devs have provided a getting started guide for building it here: https://github.com/istio-ecosystem/authservice/wiki/Setting-up-CLion-on-MacOS-for-Authservice-development
If you can't build it, your dependencies are probably off. Refer to the Makefile for commands. Most notably, if you are just trying to build, it is make docker build
I build on Ubuntu 20.04 just fine.
In the end this isn't a proxy, it's an implementation of Envoy's external auth API. Curious as to whether other implementations of this are around.
@cfryanr who is currently maintaining this project? There are several PRs that have not been merged and released yet. Notably, Redis support.
make docker-from-scratch FTW!
Redis support is already merged in; there's just no release published with it yet. You can build master to get it.
@AndrewBabbitt97 Thanks for the tips and the fix. I was able to build an image from your PR and deploy it. Will see if it helps the problem, thanks!
So we have been experiencing issues with gRPC shutting down in authservice as people authenticate. This would result in 403s from Envoy, since it could no longer communicate with authservice while the actual process continued to run; the only way to resolve it was to manually delete/recycle the authservice pod. I have added logs with trace logging for authservice (with gRPC tracing) and Envoy to the issue.
Commenting out the break on line 137 of async_service_impl.cc fixes the issue (the logs uploaded are using this fix). It looks like we were exiting on a bool that isn't fatal when it's false, thus causing the server to shut down.
The authservice log is the interesting one with error entries for gRPC as follows:
"Attempt to send initial metadata after stream was closed" "Attempt to send message after stream was closed" "Attempt to send trailing metadata after stream was closed"
However, in terms of the browser request, there don't appear to be any issues, and people are able to authenticate without problems with this change.
Looking at the gRPC documentation, it looks as if Next() returning false means you should shut the server down, but that doesn't seem to apply to the per-operation ok value.
Bear in mind I haven't personally worked with gRPC on any projects and could be totally wrong about this fix; however, if this is an acceptable fix, I can submit a PR for it.
I am however still curious as to what is going on with the stream closing errors, as I'm not seeing anything in the envoy logs relating to it.
authservice.log
envoy.log