grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

OnCall can't reach Grafana #5197

Open sirvincent opened 1 day ago

sirvincent commented 1 day ago

What went wrong?

We run Grafana OSS (version 11.2.2+security-01) on Ubuntu 22.04, installed via sudo apt install grafana. We followed the OnCall installation guide for the hobby Docker Compose environment (without Grafana).

What happened:

The troubleshooting curl commands in the README return the same error.

The Grafana OnCall plugin screen shows that Grafana is connected to OnCall (v1.11.3, OpenSource).

Side note: the version shown here is v1.11.3, but the plugin version in the top right is v1.6.2. Why? We do not see an update button as the README suggests.

However, the corresponding docker compose log from engine shows 404 errors for several Grafana endpoints it tries to reach. The interesting snippet from the log (I have replaced the IP with <IP>):

engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.023862333997385576 status=200 method=HEAD url=http://<IP>:3000/api/org slow=0 
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 404 Client Error: Not Found for url: http://<IP>:3000/api/access-control/users/permissions/search?actionPrefix=grafana-oncall-app
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.013972447981359437 status=404 method=HEAD url=http://<IP>:3000/api/access-control/users/permissions/search?actionPrefix=grafana-oncall-app slow=0 
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.013520489999791607 status=200 method=HEAD url=http://<IP>:3000/api/org slow=0 
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 404 Client Error: Not Found for url: http://<IP>:3000/api/access-control/users/permissions/search?actionPrefix=grafana-oncall-app
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.006794679997256026 status=404 method=HEAD url=http://<IP>:3000/api/access-control/users/permissions/search?actionPrefix=grafana-oncall-app slow=0 
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 404 Client Error: Not Found for url: http://<IP>:3000/api/plugins/grafana-incident-app/settings
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.006319462001556531 status=404 method=GET url=http://<IP>:3000/api/plugins/grafana-incident-app/settings slow=0 
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 404 Client Error: Not Found for url: http://<IP>:3000/api/plugins/grafana-labels-app/settings
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.012679944018600509 status=404 method=GET url=http://<IP>:3000/api/plugins/grafana-labels-app/settings slow=0 
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 404 Client Error: Not Found for url: http://<IP>:3000/api/plugins/grafana-irm-app/settings
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.012594404979608953 status=404 method=GET url=http://<IP>:3000/api/plugins/grafana-irm-app/settings slow=0 
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=apps.user_management.sync RBAC status org=1 rbac_enabled=False
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.006195642024977133 status=200 method=HEAD url=http://<IP>:3000/api/org slow=0 
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.023239836998982355 status=200 method=GET url=http://<IP>:3000/api/org/users slow=0 
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.017058505007298663 status=200 method=GET url=http://<IP>:3000/api/teams/search?perpage=1000000 slow=0 
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.010913970996625721 status=200 method=GET url=http://<IP>:3000/api/teams/1/members slow=0 
engine_1               | 2024-10-20 14:18:51 source=engine:app google_trace_id=none logger=root outbound latency=0.014364856004249305 status=200 method=GET url=http://<IP>:3000/api/teams/2/members slow=0 
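A log like the one above is easier to read once the failing outbound calls are grouped. The following is a minimal sketch, assuming the `status=... method=... url=...` layout seen in these lines stays stable; the function name `failed_requests` is made up for illustration and is not part of OnCall.

```python
import re
from collections import Counter

# Matches the "outbound" lines written by logger=root in the engine log.
OUTBOUND = re.compile(
    r"status=(?P<status>\d{3}) method=(?P<method>[A-Z]+) url=(?P<url>\S+)"
)


def failed_requests(lines):
    """Count outbound requests whose HTTP status is 400 or higher.

    Returns a Counter keyed by (status, method, url).
    """
    failures = Counter()
    for line in lines:
        match = OUTBOUND.search(line)
        if match and int(match["status"]) >= 400:
            key = (int(match["status"]), match["method"], match["url"])
            failures[key] += 1
    return failures
```

Feeding it the saved output of the engine container (for example `failed_requests(open("engine.log"))`, where `engine.log` is a hypothetical file name) should list each failing endpoint once with its count. For the snippet above that would be the access-control permissions search and the three plugin settings endpoints.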

What did you expect to happen:

We have turned on externalServiceAccounts in the grafana.ini as follows:

enable = externalServiceAccounts
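In grafana.ini this key is only read as a feature toggle when it sits under the `[feature_toggles]` section, so it can be worth checking where the line ended up. A minimal sketch of such a check, assuming the file can be read with Python's `configparser` and that toggles are separated by commas or whitespace; the helper name `toggle_enabled` is hypothetical:

```python
import configparser
import re


def toggle_enabled(ini_text: str, name: str) -> bool:
    """Return True if `name` is listed in [feature_toggles] enable."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # grafana.ini may contain keys before the first section header,
    # which configparser rejects, so wrap them in a dummy section.
    parser.read_string("[__global__]\n" + ini_text)
    raw = parser.get("feature_toggles", "enable", fallback="")
    return name in re.split(r"[,\s]+", raw.strip())
```

For a package install the file is usually /etc/grafana/grafana.ini, so a call could look like `toggle_enabled(open("/etc/grafana/grafana.ini").read(), "externalServiceAccounts")`. Grafana needs a restart before a changed toggle takes effect.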

The issue seems similar to 1035 but the replies there didn't help us.

How do we reproduce it?

Follow the README hobby installation guide without running Grafana as part of the Docker Compose setup; a local Grafana OSS instance (installed via sudo apt install grafana) is used instead.

Grafana OnCall Version

v1.11.3 OpenSource

Product Area

Helm/Kubernetes/Docker

Grafana OnCall Platform?

Docker

User's Browser?

No response

Anything else to add?

No response

sirvincent commented 1 day ago

We have managed to solve most of our problems. We think the cause was that we had installed an older version of OnCall (v1.6.2) in the past but never updated the plugin within Grafana. As mentioned in the issue, the update button isn't visible, even though I am on an admin account (not the admin account), so we updated the plugin via the command-line interface:

grafana-cli plugins update-all

After that we followed the steps in the README and got it mostly working. However, when setting up an integration with Grafana Alerting we get an error message: "Failed to update AlertManager Config"

The docker compose log from engine shows:

engine_1               | 2024-10-20 20:45:30 source=engine:app google_trace_id=none logger=apps.grafana_plugin.helpers.client Error connecting to api instance 400 Client Error: Bad Request for url: http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts
engine_1               | 2024-10-20 20:45:30 source=engine:app google_trace_id=none logger=root outbound latency=0.007630477019120008 status=400 method=POST url=http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts slow=0 
engine_1               | 2024-10-20 20:45:30 source=engine:app google_trace_id=none logger=apps.alerts.grafana_alerting_sync_manager.grafana_alerting_sync GrafanaAlertingSyncManager: Failed to update contact point (POST) for is_grafana_datasource True; response: {'url': 'http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts', 'connected': False, 'status_code': 400, 'message': '400 Client Error: Bad Request for url: http://<IP>:3000/api/alertmanager/grafana/config/api/v1/alerts'}

We are not on the admin account but on an account with admin privileges. Or is that the same thing?

I do not know whether this belongs in this issue or whether a separate one should be opened.

bpedersen2 commented 23 hours ago

See also #5183 , #5100, #4960, #4829

bpedersen2 commented 22 hours ago

With the latest releases I now get a new but similar Go error in Grafana:

In the logs I see the following related error:

logger=plugin.grafana-oncall-app t=2024-10-21T11:18:25.293508052Z level=error msg="Error getting user" error="failed to parse JSON response: json: cannot unmarshal object into Go value of type []plugin.OnCallPermission body={\"message\":\"Not found\"}\n"
Mon, Oct 21 2024 1:18:25 pm
logger=plugin.grafana-oncall-app t=2024-10-21T11:18:25.293526415Z level=error msg="Error validating oncall plugin settings" error="error setting up request headers: failed to parse JSON response: json: cannot unmarshal object into Go value of type []plugin.OnCallPermission body={\"message\":\"Not found\"}\n "
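The error text suggests a shape mismatch: the plugin tries to decode a JSON array of permissions, but the body it received is the JSON object {"message":"Not found"}, which looks like a 404 response. The sketch below reproduces that mismatch in Python for illustration only. The real plugin backend is written in Go, and `parse_permissions` is a hypothetical helper, not the plugin's code.

```python
import json


def parse_permissions(body: str) -> list:
    """Decode a permissions response and insist on a JSON array."""
    data = json.loads(body)
    if not isinstance(data, list):
        # An error payload such as {"message": "Not found"} lands here.
        raise ValueError(
            f"expected a JSON array of permissions, got "
            f"{type(data).__name__}: {data!r}"
        )
    return data
```

If that reading is right, the thing to fix is the request that returns "Not found", not the decoding itself.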

bpedersen2 commented 22 hours ago

And in OnCall I see the following error:

2024-10-21 11:30:02 source=engine:app google_trace_id=none logger=apps.auth_token.auth auth request user not found - missing valid X-Grafana-Context
Mon, Oct 21 2024 1:30:02 pm