languitar opened this issue 4 years ago
Is it completely random? What graph are you trying to render? How long does the query take?
I couldn't spot a pattern so far :/ It happened for different alerts defined on different dashboards. Queries are usually pretty simple.
Maybe this is of interest: we use Victoria Metrics as the backend. However, that was already the case when we were still on Grafana 6, where rendering worked without any issues.
I just had a look at the reported error messages and can confirm that various plots on different dashboards are affected:
❯ cat ~/Downloads/log.err | sed 's/.*\(http.*\)/\1/' | cut -d "\\" -f 1 | sort
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F09cVU_9Wk%2Fdatabase-consistency-alerting%3ForgId%3D1%26panelId%3D10%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F09cVU_9Wk%2Fdatabase-consistency-alerting%3ForgId%3D1%26panelId%3D10%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F09cVU_9Wk%2Fdatabase-consistency-alerting%3ForgId%3D1%26panelId%3D10%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F09cVU_9Wk%2Fdatabase-consistency-alerting%3ForgId%3D1%26panelId%3D10%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F09cVU_9Wk%2Fdatabase-consistency-alerting%3ForgId%3D1%26panelId%3D11%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F09cVU_9Wk%2Fdatabase-consistency-alerting%3ForgId%3D1%26panelId%3D11%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F7bOegv2Zz%2Fairflow%3ForgId%3D1%26panelId%3D4%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F7bOegv2Zz%2Fairflow%3ForgId%3D1%26panelId%3D4%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FCkNLSPrZz%2Fgdprlog-service%3ForgId%3D1%26panelId%3D4%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FCkNLSPrZz%2Fgdprlog-service%3ForgId%3D1%26panelId%3D4%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FL4WNBvtZz%2Flog-alerting%3ForgId%3D1%26panelId%3D2%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FQOfuckAZk%2Fapi-events%3ForgId%3D1%26panelId%3D10%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FQOfuckAZk%2Fapi-events%3ForgId%3D1%26panelId%3D10%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FQOfuckAZk%2Fapi-events%3ForgId%3D1%26panelId%3D10%26render%3D1&width=1000
http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2FQOfuckAZk%2Fapi-events%3ForgId%3D1%26panelId%3D10%26render%3D1&width=1000
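The same pipeline with a count per URL gives a more compact view (a sketch assuming the same log.err file as above):

cat ~/Downloads/log.err | sed 's/.*\(http.*\)/\1/' | cut -d "\\" -f 1 | sort | uniq -c | sort -rn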
We're getting a similar error with Grafana v7 and image-renderer:latest running as Docker containers in a k8s pod. Hitting the image render URL in the browser works, but it fails via curl.
Curl (note: the URL must be quoted, otherwise the shell treats each & as a command separator and silently drops everything after orgId=1):
curl --http2 -v -H "Authorization: Bearer <API_KEY>" --silent -o image.png "https://grafana.my.url/render/d-solo/4Uf2RVgZzA/overview?orgId=1&refresh=30s&from=1593547285834&to=1593558085834&panelId=3&width=1000&height=500&tz=America%2FLos_Angeles"
Resulting logs from the containers:
renderer {"url":"/render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=400&renderKey=7D19N8pAxkj0a7A5l9u2sxkjVlmR97Ge&timeout=60&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F4Uf2RVgZzA%2Fapi-gateway-overview%3ForgId%3D1%26render%3D1&width=800","stack":"TimeoutError: waiting for function failed: timeout 60000ms exceeded\n at new WaitTask (/usr/src/app/node_modules/puppeteer/lib/DOMWorld.js:549:28)\n at DOMWorld.waitForFunction (/usr/src/app/node_modules/puppeteer/lib/DOMWorld.js:454:12)\n at Frame.waitForFunction (/usr/src/app/node_modules/puppeteer/lib/FrameManager.js:657:28)\n at Page.waitForFunction (/usr/src/app/node_modules/puppeteer/lib/Page.js:1144:29)\n at Browser.<anonymous> (/usr/src/app/build/browser/browser.js:174:24)\n at Generator.next (<anonymous>)\n at fulfilled (/usr/src/app/build/browser/browser.js:5:58)\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (internal/process/task_queues.js:97:5)","level":"error","message":"Request failed"}
grafana t=2020-07-01T00:08:39+0000 lvl=eror msg="Failed to send request to remote rendering service." logger=rendering renderer=http error="Get \"http://localhost:8081/render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=400&renderKey=bikU7J4IfyIJrPbHsPFDM4zUMKOLY637&timeout=60&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F4Uf2RVgZzA%2Fapi-gateway-overview%3ForgId%3D1%26render%3D1&width=800\": context deadline exceeded"
grafana t=2020-07-01T00:08:39+0000 lvl=eror msg="Rendering failed." logger=context userId=0 orgId=1 uname= error="Failed to send request to remote rendering service. Get \"http://localhost:8081/render?deviceScaleFactor=1.000000&domain=localhost&encoding=&height=400&renderKey=bikU7J4IfyIJrPbHsPFDM4zUMKOLY637&timeout=60&timezone=&url=http%3A%2F%2Flocalhost%3A3000%2Fd-solo%2F4Uf2RVgZzA%2Fapi-gateway-overview%3ForgId%3D1%26render%3D1&width=800\": context deadline exceeded"
grafana t=2020-07-01T00:08:39+0000 lvl=eror msg="Request Completed" logger=context userId=0 orgId=1 uname= method=GET path=/render/d-solo/4Uf2RVgZzA/api-gateway-overview status=500 remote_addr=10.0.14.97 time_ms=62027 size=1723 referer=
For reference, it seems to fail here, but I don't understand why: https://github.com/grafana/grafana-image-renderer/blob/ce1f81438e5f69c7fd7c73ce08bab624c4c92e25/src/service/http-server.ts#L49
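One way to narrow down which side is timing out is to query the renderer directly, bypassing Grafana (a sketch assuming the sidecar layout from the logs above, where the renderer listens on localhost:8081; /render/version is the endpoint Grafana itself uses to check the service):

curl -s http://localhost:8081/render/version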
I'd suggest enabling debug/verbose logs; see https://github.com/grafana/grafana-image-renderer/blob/master/docs/remote_rendering_using_docker.md#environment-variables
If using environment variables, use:
LOG_LEVEL=debug
RENDERING_VERBOSE_LOGGING=true
RENDERING_DUMPIO=true
Alternatively, you can mount a configuration file into the container; for instructions, see https://github.com/grafana/grafana-image-renderer/blob/master/docs/remote_rendering_using_docker.md#configuration-file
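For example, a minimal sketch of running the container with these debug options enabled (the container name and port mapping are illustrative; 8081 is the port the service listens on in the logs above):

docker run -d --name grafana-image-renderer -p 8081:8081 \
  -e LOG_LEVEL=debug \
  -e RENDERING_VERBOSE_LOGGING=true \
  -e RENDERING_DUMPIO=true \
  grafana/grafana-image-renderer:latest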
Is this still an issue with the latest version of the plugin? If so, could you provide detailed logs by enabling the options mentioned above?
I'm seeing the same issue on grafana/grafana-image-renderer:latest (image renderer version 3.4.2), deployed via Grafana Helm chart version 6.29.5.
Environment variables:
HTTP_HOST="0.0.0.0"
HTTP_PORT="8081"
LOG_LEVEL="debug"
RENDERING_ARGS="--no-sandbox,--disable-gpu,--window-size=1280x758"
RENDERING_DUMPIO="true"
RENDERING_MODE="clustered"
RENDERING_VERBOSE_LOGGING="true"
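To sanity-check that these variables actually reached the renderer container, something like the following can be used (the deployment name is hypothetical and depends on the Helm release):

kubectl exec deploy/grafana-image-renderer -- env | grep -E 'LOG_LEVEL|RENDERING'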
Log output:
{"filePath":"/tmp/10156b63.png","level":"debug","message":"Taking screenshot"}
{"level":"error","message":"Request failed","stack":"Error: Request aborted\n at onaborted (/usr/src/app/node_modules/express/lib/response.js:1030:15)\n at Immediate._onImmediate (/usr/src/app/node_modules/express/lib/response.js:1072:9)\n at processImmediate (internal/timers.js:464:21)","url":"/render?deviceScaleFactor=1.000000&domain=grafana.monitoring&encoding=&height=500&renderKey=<REDACTED>&timeout=15&timezone=&url=<REDACTED>"}
TypeError: Cannot read property 'statusCode' of undefined
at /usr/src/app/build/service/http-server.js:135:42
at Layer.handle_error (/usr/src/app/node_modules/express/lib/router/layer.js:71:5)
at trim_prefix (/usr/src/app/node_modules/express/lib/router/index.js:321:13)
at /usr/src/app/node_modules/express/lib/router/index.js:284:7
at Function.process_params (/usr/src/app/node_modules/express/lib/router/index.js:341:12)
at next (/usr/src/app/node_modules/express/lib/router/index.js:275:10)
at next (/usr/src/app/node_modules/express/lib/router/route.js:127:14)
at /usr/src/app/build/service/http-server.js:44:21
at /usr/src/app/node_modules/express/lib/response.js:439:22
at onaborted (/usr/src/app/node_modules/express/lib/response.js:1032:5)
Same issue with Grafana 8.5.15 and renderer v3.4.2 (running in a separate pod in k8s).
What happened:
We have Grafana 7 running on EKS with the dockerized remote rendering service deployed for rendering alert images. Sometimes, instead of the correct plot, only a default test image is posted to Slack and error messages are emitted by Grafana and the renderer.
What you expected to happen:
Images are always rendered correctly
How to reproduce it (as minimally and precisely as possible):
Unfortunately, I couldn't find a way to reproduce this in a limited test case.
Anything else we need to know?:
The remote rendering container is deployed using the method I have described in this issue comment. Thus, renderer and Grafana share the same network namespace and can communicate directly using localhost. No caching involved. When rendering fails, the remote renderer emits the following log messages:
And Grafana reports the following:
Environment:
Grafana version: 7.0.3
Data source type & version: Prometheus + Victoria Metrics
OS Grafana is installed on: Docker container running inside EKS
User OS & Browser: not relevant for remote rendering
Grafana plugins:
Others: