Closed: qcattez closed this issue 2 days ago
👋 @qcattez, the behavior you are seeing is due to the usage of nilToZero: true
in your configuration. Removing it or explicitly setting it to false (the default is false) should fix your issue.
What's happening is that CloudWatch continues to return resources in ListMetrics for a variable length of time even when they have no data points. When the exporter calls GetMetricData and no data point is found in the time window, `nilToZero` dictates whether a zero or a nil is produced. Producing zero gives you continuity in your graph panels, while nil leaves gaps, much like CloudWatch.
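A sketch of where this would sit in a job definition (field placement assumed from the config discussed later in this thread, not a complete config):

```yaml
discovery:
  jobs:
    - type: AWS/Firehose
      nilToZero: false  # default: emit no sample (a gap) when GetMetricData returns no data point
      # nilToZero: true # emit 0 instead, keeping the series continuous
```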
Hi @kgeckhart 👋
Thanks for helping out ;)
I tried setting `nilToZero: false`, but it doesn't change the behavior I mentioned: there is still an extra sample with the same value (not zero).
Furthermore, as we can see in the screenshots and graphs, even when I had `nilToZero: true`, I didn't get continuity in my graph :/
Do you have another idea ?
Just commenting to get some help on this 🙏
You say you want continuity in your graphs; for that to happen you need consistent sampling, or Prometheus will mark the value as stale. Nils can be connected through panel settings if you are using Grafana. Continuity cannot be achieved if the sample is dropped, so which is more important to you?
I think there's a misunderstanding: my problem is about a wrong extra sample, not about continuity. I annotated the screenshots to make this more explicit.
Here are the samples from CloudWatch:
And here are the samples retrieved by YACE and presented in Grafana:
We see that the data is wrong, with a sample that shouldn't exist.
`nilToZero` has no impact on this behavior.
Can someone reproduce it? Is it expected given my configuration?
For anyone stumbling upon the same issue, I finally got my CloudWatch metrics without errors.
Updated fields in my values.yaml:

```yaml
extraArgs:
  scraping-interval: 60
serviceMonitor:
  enabled: true
  interval: 300s
config: |-
  apiVersion: v1alpha1
  sts-region: us-west-2
  discovery:
    exportedTagsOnMetrics:
      AWS/Firehose:
        - Environment
        - Platform
    jobs:
      - type: AWS/Firehose
        regions:
          - us-west-2
        delay: 600
        period: 300
        length: 300
        nilToZero: true
        statistics:
          - Sum
        metrics:
          - name: DeliveryToRedshift.Records
          - name: DeliveryToRedshift.Bytes
          - name: DeliveryToRedshift.Success
          - name: DeliveryToS3.Records
          - name: DeliveryToS3.Bytes
          - name: DeliveryToS3.Success
```
- A `scraping-interval` of `60` instead of `300` removed the wrong extra samples at the end.
- A `length` of `300` was enough.
- `addCloudwatchTimestamp: true` was introducing wrong samples in the middle of the time series.
- `nilToZero: true` now performs as expected.
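For contrast, a sketch of the settings that were producing the bad samples (only the fields that changed; values reconstructed from the comments in this thread, not a complete config):

```yaml
extraArgs:
  scraping-interval: 300  # scraping at the same cadence as the CloudWatch period duplicated the last sample
config: |-
  discovery:
    jobs:
      - type: AWS/Firehose
        addCloudwatchTimestamp: true  # introduced wrong samples in the middle of the series
```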
Is there an existing issue for this?
YACE version
v0.61.2
Config file
Current Behavior

When configured with a period of `5m` and a matching `scraping-interval`, the last sample of a metric is duplicated by YACE.
Here is the CloudWatch graph:
And here is the Grafana graph:
Querying the metric with the AWS CLI shows that no extra samples are present (the duplicated value is `143170.0` on the Grafana graph).
Note: 1 extra sample is added with a period of `5m`. When the period is `1m`, 4 extra samples are generated.

Expected Behavior

No extra samples are generated.
Steps To Reproduce

We deploy YACE with kustomize:
kustomization.yaml:
values.yaml:
To generate the manifests:
`kustomize build . --enable-helm`
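A minimal kustomization.yaml sketch of what `--enable-helm` expects; the chart name, repo URL, and release name here are assumptions for illustration, not taken from the original files:

```yaml
helmCharts:
  - name: yet-another-cloudwatch-exporter
    repo: https://nerdswords.github.io/helm-charts
    releaseName: yace
    valuesFile: values.yaml
```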
Don't mind the `serviceAccount.create: false`, we create it elsewhere with the IRSA annotation.

Anything else?
Hope someone can help 🙏