grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

Grafana Oncall plugin in error after some time "unknown error" #3761

Open gaetanars opened 5 months ago

gaetanars commented 5 months ago

What went wrong?

What happened:

What did you expect to happen:

How do we reproduce it?

  1. Deploy Redis + PostgreSQL
  2. Deploy OnCall with the Helm chart
  3. Deploy Grafana with the OnCall plugin and configure it with app provisioning
  4. After a night, the plugin stops working and says:
    An unknown error occurred when trying to install the plugin. Verify OnCall API URL, http://oncall-engine.grafana.svc:8080, is correct?
    Refresh your page and try again, or try removing your plugin configuration and reconfiguring.
  5. The oncall-engine log shows:
    Traceback (most recent call last):                                                                                
    File "/usr/local/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner             
     response = get_response(request)                                                                              
                ^^^^^^^^^^^^^^^^^^^^^                                                                              
    File "/usr/local/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response         
     response = wrapped_callback(request, *callback_args, **callback_kwargs)                                       
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                       
    File "/usr/local/lib/python3.11/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view        
     return view_func(*args, **kwargs)                                                                             
            ^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                             
    File "/usr/local/lib/python3.11/site-packages/rest_framework/viewsets.py", line 125, in view                    
     return self.dispatch(request, *args, **kwargs)                                                                
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                
    File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 509, in dispatch                   
     response = self.handle_exception(exc)                                                                         
                ^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                         
    File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 469, in handle_exception           
     self.raise_uncaught_exception(exc)                                                                            
    File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception   
     raise exc                                                                                                     
    File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 497, in dispatch                   
     self.initial(request, *args, **kwargs)                                                                        
    File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 414, in initial                    
     self.perform_authentication(request)                                                                          
    File "/usr/local/lib/python3.11/site-packages/rest_framework/views.py", line 324, in perform_authentication     
     request.user                                                                                                  
    File "/usr/local/lib/python3.11/site-packages/rest_framework/request.py", line 227, in user                     
     self._authenticate()                                                                                          
    File "/usr/local/lib/python3.11/site-packages/rest_framework/request.py", line 380, in _authenticate            
     user_auth_tuple = authenticator.authenticate(self)                                                            
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                            
    File "/etc/app/apps/auth_token/auth.py", line 77, in authenticate                                               
     return self.authenticate_credentials(token_string, request)                                                   
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                   
    File "/etc/app/apps/auth_token/auth.py", line 93, in authenticate_credentials                                   
     auth_token = check_token(token_string, context=context)                                                       
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                       
    File "/etc/app/apps/grafana_plugin/helpers/gcom.py", line 96, in check_token                                    
     return PluginAuthToken.validate_token_string(token_string, context=context)                                   
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                   
    File "/etc/app/apps/auth_token/models/plugin_auth_token.py", line 51, in validate_token_string                  
     stack_id = int(context["stack_id"])                                                                           
                ^^^^^^^^^^^^^^^^^^^^^^^^                                                                           
    ValueError: invalid literal for int() with base 10: '%!f(<nil>)'
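
The last frame of the traceback can be reproduced in isolation. The string '%!f(<nil>)' looks like the placeholder that Go's fmt package prints when a nil value is formatted with the %f verb, which would suggest that Grafana had no stack ID to send; that reading is an assumption, not something confirmed in this thread. The sketch below shows the same ValueError and a defensive parse. The helper name parse_stack_id is hypothetical and is not part of the OnCall codebase.

```python
def parse_stack_id(raw):
    """Return raw as an int, or None when it is not a plain integer.

    Hypothetical helper for illustration only; the OnCall engine calls
    int(context["stack_id"]) directly, as the traceback shows.
    """
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


# The value from the log fails the same way as in the traceback.
try:
    int("%!f(<nil>)")
except ValueError as exc:
    message = str(exc)

print(message)
print(parse_stack_id("5"))           # a valid stack ID parses normally
print(parse_stack_id("%!f(<nil>)"))  # the malformed value is rejected
```

Returning None instead of raising would let the caller answer with an authentication error rather than an unhandled exception; whether that is the right behavior for the engine is a design question for the maintainers.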

Grafana OnCall Version

1.3.91

Product Area

Auth

Grafana OnCall Platform?

Kubernetes

User's Browser?

Brave v1.62.153

Anything else to add?

Thank you for your help

gaetanars commented 5 months ago

I've found the root cause. My plugin provisioning file didn't contain the stackId: 5 value, so the OnCall plugin fails after Grafana restarts.
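
A missing key like this can be caught before deploying. The sketch below is a minimal check over an already-parsed provisioning entry (for example, the result of loading the YAML file with any YAML parser). The function name missing_json_data_keys and the list of required keys are assumptions based only on the values mentioned in this issue, not an official schema.

```python
def missing_json_data_keys(app_entry, required=("stackId", "onCallApiUrl")):
    """Return the required jsonData keys that are absent or null.

    Hypothetical helper; `required` is an assumption drawn from this
    thread, not from Grafana's documented provisioning schema.
    """
    json_data = app_entry.get("jsonData") or {}
    return [key for key in required if json_data.get(key) is None]


# An entry without stackId, like the one that triggered the error.
incomplete_entry = {
    "type": "grafana-oncall-app",
    "jsonData": {"onCallApiUrl": "http://oncall-engine.grafana.svc:8080"},
}

# The same entry with stackId added.
complete_entry = {
    "type": "grafana-oncall-app",
    "jsonData": {
        "stackId": 5,
        "onCallApiUrl": "http://oncall-engine.grafana.svc:8080",
    },
}

print(missing_json_data_keys(incomplete_entry))
print(missing_json_data_keys(complete_entry))
```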

gaetanars commented 4 months ago

After a few days the problem is still present. It appears that plugin provisioning runs at every restart, and OnCall then fails with the same message:

An unknown error occurred when trying to install the plugin. Verify OnCall API URL, http://oncall-engine.grafana.svc:8080, is correct?
Refresh your page and try again, or try removing your plugin configuration and reconfiguring.

My plugin provisioning file:

---
apiVersion: 1
apps:
  - type: grafana-oncall-app
    jsonData:
      orgId: 1
      stackId: 5
      onCallApiUrl: http://oncall-engine.grafana.svc:8080

javierSanchez5 commented 3 months ago

Did you find a solution? I am trying to configure OnCall via provisioning and I am getting the same error; in the engine logs, the request returns a 403. But if I configure OnCall by setting the backend URL in the configuration interface, it works.

lc-guy commented 3 months ago

Also getting this issue. The example provisioning file present here is bogus. When registering the oncall-engine URL, the frontend seems to make some API calls to get a token (which goes in secureJsonData). The backend doesn't do this, and so it errors out.

It seems even the main OnCall Kubernetes chart doesn't do this URL provisioning. Would it be possible to add an official way to enable provisioning for this plugin?