Open TJM opened 3 years ago
I like the idea of having the Dashboard as part of the chart too!
Mind opening a pull request so I can take a look? I think hardcoding the min/max values is suboptimal, since it would require a source code change if Docker ever changes the quota. If you open a PR, I'll check whether we can use the RateLimit-Limit header from the HTTP response to determine the max value automatically.
We made a similar observation, although we didn't encounter the "falling off a cliff" phenomenon quite as often as you do. My first thought was that it might be caused by a change in the IP address used to query the limit, but our system should have a static IP address and we encountered the phenomenon too. So maybe it's just dockerhub?
Might also be that we have a bug in our script, but I think this is unlikely because it just takes the values from the HTTP header of the HEAD request and exposes them. We don't do any calculations here.
Another interesting phenomenon is that the HEAD request sometimes does not return any rate limit information at all. This leads to a drop to zero followed by the correct data point in the graph. (Happened twice between 14:00 and 16:00 in your graph.)
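To make the "just takes the values from the header" step concrete, here is a minimal sketch of how such parsing could look. It is not the project's actual script. It assumes the header values look like `100;w=21600` (a count plus a window in seconds) and that the remaining count arrives in a `RateLimit-Remaining` header; the function names are made up for illustration. Returning `None` when the headers are absent keeps "no data" distinct from a real zero.

```python
from typing import Mapping, Optional, Tuple


def parse_ratelimit_value(raw: str) -> Tuple[int, Optional[int]]:
    """Split a value such as '100;w=21600' into (count, window_seconds)."""
    count_part, _, params = raw.partition(";")
    window = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "w" and value.isdigit():
            window = int(value)
    return int(count_part.strip()), window


def read_limits(headers: Mapping[str, str]) -> Optional[dict]:
    """Return limit/remaining from response headers, or None if either is absent."""
    # Header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    limit_raw = lowered.get("ratelimit-limit")
    remaining_raw = lowered.get("ratelimit-remaining")
    if limit_raw is None or remaining_raw is None:
        return None  # headers missing: signal "no data" instead of 0
    limit, window = parse_ratelimit_value(limit_raw)
    remaining, _ = parse_ratelimit_value(remaining_raw)
    return {"limit": limit, "remaining": remaining, "window_seconds": window}
```

With this shape, the caller decides what a missing reading means for the exported metric instead of silently getting a zero.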
I am guessing that the odd zeros are probably just a bug where dockerhub's code times out while calculating/collecting, so it's returning a default value (0)... It might be interesting to try to catch the debug output when an odd zero occurs; maybe they are returning an error too, and we could catch/retry or just not "export" that odd value... We don't have to do anything, as we are simply reporting the value we received from docker, but it might make the graph smoother (if that is desirable) :)
For the "cliff" issue, I am guessing that they are pushing out a new version of their code that calculates image pulls and maybe don't have stateful storage configured?
I have not done the helmchart part yet, I am just releasing it with flux directly right now... which basically involves a yaml file like:
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    grafana_dashboard: "1"
  name: dockerhub-rate-limits
  namespace: monitoring
data:
  dockerhub-rate-limits.json: |-
    {JSON HERE}
I will attach the JSON directly (had to gzip it for github), so we can collaborate on the min/max... I don't see any issue with hardcoding the min to 0, but the max will have to be template driven (values.yaml) unless we can come up with a way to calculate it (without switching to percent).
Attachment: Docker Hub Rate Limits-1607535713313.json.gz
Strangely, I tried removing the MAX and it is fine now. I wonder if the problem was that I didn't have enough data yet for grafana to determine the max automatically?
I haven't found the time yet to look at the dashboard json, but it sounds reasonable that grafana can choose a fitting max value automatically? So I would suggest not specifying a max value and letting grafana work its magic?
Regarding the "odd zeros", I think there is no error code returned in the HTTP response; the actual values for the metrics are just missing. At least it was like that when I was looking into it a couple of weeks ago. I am not sure it's desirable to smooth out the curve. I think it's better to simply reflect the truth as reported by the dockerhub api and leave the interpretation of the data to the user looking at the chart.
Regarding the "cliff issue" I think your assumption makes a lot of sense.
Hmm, just thought of something: based on the results, it almost seems like they are returning 100 available (and 100 max)? From what I can tell in the code, the default values would be 0, 0. We could probably safely just not publish a 0,0 result. It would make more sense to keep the previous value or not return anything (leave a gap) than to return 0,0. In this case, though, it is returning 100,100 (I think).
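The "keep the previous value or leave a gap" idea could look roughly like the sketch below. It is only an illustration: `value_to_export` and the `keep_previous` flag are hypothetical names, not part of the exporter, and it assumes a reading is a `(limit, remaining)` pair where `None` means the headers were missing.

```python
from typing import Optional, Tuple

Reading = Tuple[int, int]  # (limit, remaining)


def value_to_export(
    current: Optional[Reading],
    previous: Optional[Reading],
    keep_previous: bool = False,
) -> Optional[Reading]:
    """Decide what to publish for this scrape.

    A usable reading is passed through unchanged. A missing or 0,0 reading
    is either replaced by the previous value or dropped (None = leave a gap).
    """
    if current is not None and current != (0, 0):
        return current
    return previous if keep_previous else None
```

Note that this would not catch the 100,100 case, since that looks like a perfectly valid reading on its own.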
Anyhow... since we can make this dashboard template driven, I am thinking of making the max value optional. That way, if someone wants to hardcode it, they can set it in values.yaml.
I experience the same behavior even with a simple bash script and curl:
Jan 17 12:38:02 influx-grafana dockerhub[12552]: DockerHub ratelimit: limit=100,remaining=66
Jan 17 12:38:04 influx-grafana dockerhub[12556]: Done.
Jan 17 12:39:03 influx-grafana dockerhub[12576]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:39:05 influx-grafana dockerhub[12580]: Done.
Jan 17 12:40:02 influx-grafana dockerhub[12643]: DockerHub ratelimit: limit=100,remaining=100
Jan 17 12:40:04 influx-grafana dockerhub[12658]: Done.
Jan 17 12:41:02 influx-grafana dockerhub[12741]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:41:04 influx-grafana dockerhub[12745]: Done.
Jan 17 12:42:02 influx-grafana dockerhub[12765]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:42:04 influx-grafana dockerhub[12769]: Done.
Jan 17 12:43:02 influx-grafana dockerhub[12789]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:43:04 influx-grafana dockerhub[12793]: Done.
Jan 17 12:44:03 influx-grafana dockerhub[12813]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:44:05 influx-grafana dockerhub[12817]: Done.
Jan 17 12:45:03 influx-grafana dockerhub[12837]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:45:05 influx-grafana dockerhub[12841]: Done.
Jan 17 12:46:02 influx-grafana dockerhub[12862]: DockerHub ratelimit: limit=100,remaining=100
Jan 17 12:46:04 influx-grafana dockerhub[12866]: Done.
Jan 17 12:47:03 influx-grafana dockerhub[12886]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:47:05 influx-grafana dockerhub[12890]: Done.
Jan 17 12:48:02 influx-grafana dockerhub[12910]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:48:04 influx-grafana dockerhub[12914]: Done.
Jan 17 12:49:02 influx-grafana dockerhub[12934]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:49:04 influx-grafana dockerhub[12938]: Done.
Jan 17 12:50:03 influx-grafana dockerhub[13008]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:50:05 influx-grafana dockerhub[13020]: Done.
Jan 17 12:51:02 influx-grafana dockerhub[13102]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:51:04 influx-grafana dockerhub[13106]: Done.
Jan 17 12:52:02 influx-grafana dockerhub[13126]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:52:04 influx-grafana dockerhub[13130]: Done.
Jan 17 12:53:03 influx-grafana dockerhub[13150]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:53:05 influx-grafana dockerhub[13154]: Done.
Jan 17 12:54:03 influx-grafana dockerhub[13174]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:54:05 influx-grafana dockerhub[13178]: Done.
Jan 17 12:55:03 influx-grafana dockerhub[13198]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:55:05 influx-grafana dockerhub[13202]: Done.
Jan 17 12:56:02 influx-grafana dockerhub[13223]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:56:04 influx-grafana dockerhub[13227]: Done.
Jan 17 12:57:02 influx-grafana dockerhub[13247]: DockerHub ratelimit: limit=100,remaining=100
Jan 17 12:57:04 influx-grafana dockerhub[13251]: Done.
Jan 17 12:58:02 influx-grafana dockerhub[13271]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:58:04 influx-grafana dockerhub[13275]: Done.
Jan 17 12:59:02 influx-grafana dockerhub[13295]: DockerHub ratelimit: limit=100,remaining=68
Jan 17 12:59:04 influx-grafana dockerhub[13299]: Done.
Jan 17 13:00:03 influx-grafana dockerhub[13331]: DockerHub ratelimit: limit=100,remaining=69
Jan 17 13:00:05 influx-grafana dockerhub[13363]: Done.
For sure, the odd 100 reading seems like it is "their" problem, not ours. The question is, do we just report the statistics as gathered, or "filter" the odd result out?
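If we did want to flag (rather than silently drop) those readings, one possible approach is sketched below. It treats a reading as "odd" when remaining equals the limit while the readings directly before and after it are both below the limit. The regex is written against the syslog format shown above; the function names are made up for this example.

```python
import re
from typing import List, Tuple

# Matches lines like: "... 12:40:02 ... limit=100,remaining=100"
LINE_RE = re.compile(r"(\d{2}:\d{2}:\d{2}) .*limit=(\d+),remaining=(\d+)")


def parse_readings(log_text: str) -> List[Tuple[str, int, int]]:
    """Extract (timestamp, limit, remaining) from each ratelimit log line."""
    readings = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:  # "Done." lines do not match and are skipped
            readings.append((match.group(1), int(match.group(2)), int(match.group(3))))
    return readings


def odd_full_readings(readings: List[Tuple[str, int, int]]) -> List[str]:
    """Return timestamps where remaining == limit but both neighbours are lower."""
    odd = []
    for i in range(1, len(readings) - 1):
        timestamp, limit, remaining = readings[i]
        before = readings[i - 1][2]
        after = readings[i + 1][2]
        if remaining == limit and before < limit and after < limit:
            odd.append(timestamp)
    return odd


SAMPLE = """\
Jan 17 12:39:03 influx-grafana dockerhub[12576]: DockerHub ratelimit: limit=100,remaining=67
Jan 17 12:39:05 influx-grafana dockerhub[12580]: Done.
Jan 17 12:40:02 influx-grafana dockerhub[12643]: DockerHub ratelimit: limit=100,remaining=100
Jan 17 12:41:02 influx-grafana dockerhub[12741]: DockerHub ratelimit: limit=100,remaining=67
"""

print(odd_full_readings(parse_readings(SAMPLE)))  # prints ['12:40:02']
```

Because it needs the following reading, this only works after the fact (for example on stored data or in a dashboard query), not at scrape time.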
HOWEVER, this issue was actually about adding the grafana dashboard ;)
I opened a separate issue for the "odd 100" problem. :-)
Is there something I can do to help go forward with the Grafana dashboard here?
The original blog post has a dashboard linked. I am thinking about adding that as an option to the helm chart. However, in order to "maintain" a reasonable gauge, I had to hardcode the min/max values. It doesn't look like there is a way to calculate or use the MAX based on a query, without changing over to "percent" (which you can hardcode).
I have a couple enhancements to mine that I would be happy to share as well (fixed thresholds on Pulls Remaining for example).
Also, has anyone else noticed that the graphs fall off like a cliff several times a day instead of following a "rolling" 6 hour period? Is that a bug in our collection or in "dockerhub"? (thoughts?)