jimdigriz / freeradius-oauth2-perl

FreeRADIUS OAuth2 (OpenID Connect) using rlm_perl
GNU Affero General Public License v3.0

Segmentation fault while refetching token #28

Open jannoke opened 2 years ago

jannoke commented 2 years ago

It seems there is some kind of issue if the token unexpectedly expires.

rlm_perl: oauth2 worker (domain.com): sync users
rlm_perl: oauth2 worker (domain.com): users page
rlm_perl: oauth2 worker (domain.com): sync groups
rlm_perl: oauth2 worker (domain.com): groups page
rlm_perl: oauth2 worker (domain.com): apply
rlm_perl: oauth2 worker (domain.com): syncing in 35 seconds
rlm_perl: oauth2 worker (domain.com): sync
rlm_perl: oauth2 worker (domain.com): sync users
rlm_perl: oauth2 worker (domain.com): users page
rlm_perl: oauth2 worker (domain.com): users failed: 500 read timeout
Thread 4 terminated abnormally: token (domain.com): 500 read timeout at /opt/freeradius-oauth2-perl/main.pm line 179.
rlm_perl: oauth2 worker (domain.com): died, sleeping for 4 seconds
rlm_perl: oauth2 worker (domain.com): started (tid=5)
rlm_perl: oauth2 worker (domain.com): sync
rlm_perl: oauth2 worker (domain.com): sync users
rlm_perl: oauth2 worker (domain.com): users page
rlm_perl: oauth2 worker (domain.com): fetching token
Segmentation fault

Debian 11, running FreeRADIUS from the recommended repo.

jimdigriz commented 2 years ago

Probably something else: an expired token would return a 403, whereas a 500 error means the Azure Graph API went out to lunch and that we should retry... rather than explode when retrying.

Thanks for the report, I'll look into it at some stage.

jannoke commented 2 years ago

Just a guess. This has happened to us before: the Azure service was restarted and flushed all sessions on their side, and our application started to fail because the code only checked locally whether the token had expired (by time), not by the server's response.
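To illustrate the distinction drawn above between a local, time-based expiry check and trusting the server's response, here is a minimal sketch in Python. It is not the project's Perl code. `TokenCache`, `request_with_token`, `fetch_token` and `send` are hypothetical names, and treating both 401 and 403 as "token rejected" is an assumption (the thread itself only mentions 403).

```python
import time


class TokenCache:
    """Caches a token and refetches it when the local clock says it expired."""

    def __init__(self, fetch_token, clock=time.time):
        # fetch_token is a hypothetical callable returning (token, lifetime_seconds).
        self._fetch_token = fetch_token
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Local check only: the server may have dropped the token earlier than this.
        if self._token is None or self._clock() >= self._expires_at:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = self._clock() + lifetime
        return self._token

    def invalidate(self):
        # Forget the cached token so the next get() fetches a fresh one.
        self._token = None


def request_with_token(cache, send, rejected_statuses=(401, 403)):
    """Send a request; if the server rejects the token, refetch and retry once.

    send is a hypothetical callable taking a token and returning (status, body).
    """
    status, body = send(cache.get())
    if status in rejected_statuses:
        cache.invalidate()
        status, body = send(cache.get())
    return status, body
```

The single retry is deliberate: if a freshly fetched token is also rejected, the problem is more likely permissions than expiry, and looping would not help.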

jimdigriz commented 2 years ago

The response from Azure is timing out. That means we sent an HTTP request but did not receive a response within a long period of time, so we aborted. This could be because Azure is bust, or because the network connectivity between you and Azure is bust. The correct handling of this is to sleep for a while and retry. It looks like we do this, but on retrying we crash for some reason.
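As a rough illustration of the sleep-and-retry handling described above, here is a minimal sketch in Python. It is not the project's Perl worker. `TransientError` and `call_with_retry` are hypothetical names, and the doubling delay with a cap is an assumed policy rather than what the module does.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as a read timeout."""


def call_with_retry(fetch, attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fetch() until it succeeds, sleeping between failed attempts.

    The delay doubles after each failure, capped at max_delay. Once the
    attempts are used up, the last TransientError is re-raised to the caller.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts; let the caller decide what to do
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Passing `sleep` in as a parameter keeps the sketch testable without real waiting; a caller would normally leave it at the default.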

It may be that the logic in the code is okay, but I am doing some really dirty things with Perl threads in FR, which would really be best handled outside of FR; I did not do that because I wanted to make the install easier for the end user and not have them run a separate daemon.

This has nothing to do with the token expiring. If it did, we would have gotten an immediate 403 response.