Open OliverKlette85 opened 2 years ago
Wow, that's a huge number of SQS queues 🎉
We need more debugging logs to find the error here. I will try to add them in the next seven days.
Alternatively, provide me with a SEPARATE AWS account with 6k SQS queues already created so I can debug this:
Cross account sharing via: arn:aws:iam::838758336246:user/debug-yace-489
Don't forget to add the permissions for this user as well:
"tag:GetResources",
"cloudwatch:GetMetricData",
"cloudwatch:GetMetricStatistics",
"cloudwatch:ListMetrics"
Will debug it in the next 7 days.
Hi Thomas, thanks for your quick reaction.
I created a role with the desired permissions and a trust policy for your user in our stg account:
arn:aws:iam::487596255802:role/yace_debug
This account actually has over 12k SQS queues in the eu-west-1 region :)
Please ping me if you need anything from my side.
{"level":"info","msg":"Couldn't describe resources for region eu-west-1: AccessDeniedException: User: arn:aws:sts::487596255802:assumed-role/yace_debug/1638822531663485000 is not authorized to perform: tag:GetResources because no identity-based policy allows the tag:GetResources action\n\tstatus code: 400, request id: f6e26e95-0a6d-4632-b2a6-052425ceeeff\n","time":"2021-12-06T21:28:52+01:00"}
Could you double-check on your side that everything is configured correctly? It seems I can switch successfully into the role, but the permissions are missing:
Don't forget to add the permissions for this user as well:
"tag:GetResources",
"cloudwatch:GetMetricData",
"cloudwatch:GetMetricStatistics",
"cloudwatch:ListMetrics"
Yeah I set these rights. Maybe the issue was caused by the condition I set on policy level. I moved it to the trust relationship. Could you please test again?
Okay, I see the issue. Now I have something to debug with. Had two small things in mind which were not the issue.
Need to dig deeper.
Greetings, Thomas :)
Thanks for the efforts! Please let me know if you need something from my side.
Greetings from Berlin :)
Greetings back from Aachen (currently).
FYI: I was not able to put much time into it (and have not found anything yet), and due to private matters I will only get back to it at the end of this week. It would be nice if you kept the role active so I can debug further. I still do not understand what happens there. I was expecting pagination bugs, but that does not seem to be the problem.
Thanks for the update. I will keep the role active.
Hi @thomaspeitz did you manage to have another look?
Hi! We're facing the same issue, we have around 350 queues and some of them are entirely ignored. Reverting back to old cloudwatch exporter fixes the issue. We're using version: v0.28.0-alpha
Hi, we are having the same issue, happy to see that it was reported already :)
Sorry, I was on vacation. Back again.
@OliverKlette85 if you still have the IAM role configured, I will take another look at this topic this week.
Yes it is still active.
That is worth gold to know: "Reverting back to old cloudwatch exporter fixes the issue. We're using version: v0.28.0-alpha" @endyrocket, thanks! It makes this easier to debug.
Awesome, thank you @OliverKlette85. I will probably work on that Thursday / Friday.
Currently my active work on the project is cut to 2h a week because the project generates no revenue. I will try my best to fix this; it is (at least) at the top of the list of things to get fixed.
I've been experiencing the same issue and I think I found a data point. All of my missing SQS queues don't have any tags applied to them. As soon as I applied one tag (anything), a few minutes later the metrics would start showing up in YACE.
I think the issue is that this API call to resourcegroupstaggingapi/get-resources
used to return all SQS-type resources, but now AWS is only returning those that have been tagged.
{"ResourceTypeFilters":["sqs"],"ResourcesPerPage":100}
Maybe there's another way to get the list of resources to query? Or just tag all of your SQS queues with something arbitrary
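One way to check this hypothesis on your own account is to compare what the SQS API lists with what the tagging API returns. The following is only a sketch, not how YACE does discovery: it assumes `boto3` is installed and credentials are configured, and the helper names (`missing_from_tagging_api`, `fetch_queue_urls`, `fetch_tagging_api_arns`) are made up for illustration.

```python
from typing import Iterable, List


def queue_name_from_arn(arn: str) -> str:
    """arn:aws:sqs:eu-west-1:123456789012:my-queue -> my-queue"""
    return arn.rsplit(":", 1)[-1]


def queue_name_from_url(url: str) -> str:
    """https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue -> my-queue"""
    return url.rstrip("/").rsplit("/", 1)[-1]


def missing_from_tagging_api(queue_urls: Iterable[str],
                             tagged_arns: Iterable[str]) -> List[str]:
    """Names of queues that SQS lists but the tagging API did not return."""
    seen = {queue_name_from_arn(arn) for arn in tagged_arns}
    names = (queue_name_from_url(url) for url in queue_urls)
    return sorted(name for name in names if name not in seen)


def fetch_queue_urls(region: str) -> List[str]:
    """All queue URLs in the region, straight from the SQS API."""
    import boto3  # imported lazily so the pure helpers work without it

    sqs = boto3.client("sqs", region_name=region)
    urls: List[str] = []
    for page in sqs.get_paginator("list_queues").paginate():
        urls.extend(page.get("QueueUrls", []))  # key is absent when empty
    return urls


def fetch_tagging_api_arns(region: str) -> List[str]:
    """SQS ARNs as seen by resourcegroupstaggingapi get-resources."""
    import boto3

    client = boto3.client("resourcegroupstaggingapi", region_name=region)
    pages = client.get_paginator("get_resources").paginate(
        ResourceTypeFilters=["sqs"], ResourcesPerPage=100
    )
    arns: List[str] = []
    for page in pages:
        arns.extend(m["ResourceARN"] for m in page["ResourceTagMappingList"])
    return arns


if __name__ == "__main__":
    region = "eu-west-1"  # assumption: adjust to your region
    missing = missing_from_tagging_api(
        fetch_queue_urls(region), fetch_tagging_api_arns(region)
    )
    print(f"{len(missing)} queues are not returned by the tagging API")
```

If the printed count roughly matches the number of queues without metrics, that would support the theory that untagged queues are being skipped.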
@mmanjos I can confirm that adding tags to the SQS queues solves this issue.
I had the same problem with my queues not being visible in the exported metrics. It turned out those queues had 0 tags on them. After adding an arbitrary tag, I was able to query the metrics.
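For anyone who wants to apply this workaround in bulk, here is a rough sketch that adds a placeholder tag to every queue that currently has none. It assumes a `boto3` SQS client and a caller with the `sqs:ListQueueTags` and `sqs:TagQueue` permissions. The function name `tag_untagged_queues` and the tag key/value are hypothetical; pick whatever fits your tagging scheme.

```python
from typing import Dict, Iterable, List


def tag_untagged_queues(sqs_client, queue_urls: Iterable[str],
                        tags: Dict[str, str]) -> List[str]:
    """Apply `tags` to every queue that has no tags yet.

    `sqs_client` is expected to behave like boto3.client("sqs").
    Returns the URLs of the queues that were tagged.
    """
    changed: List[str] = []
    for url in queue_urls:
        # The "Tags" key is missing from the response when a queue has none.
        existing = sqs_client.list_queue_tags(QueueUrl=url).get("Tags", {})
        if not existing:
            sqs_client.tag_queue(QueueUrl=url, Tags=tags)
            changed.append(url)
    return changed


if __name__ == "__main__":
    import boto3

    sqs = boto3.client("sqs", region_name="eu-west-1")  # assumption: region
    urls: List[str] = []
    for page in sqs.get_paginator("list_queues").paginate():
        urls.extend(page.get("QueueUrls", []))
    # Placeholder tag, purely illustrative.
    tagged = tag_untagged_queues(sqs, urls, {"monitoring": "yace"})
    print(f"tagged {len(tagged)} previously untagged queues")
```

Queues that already carry tags are left untouched, so the script can be re-run safely. As reported above, it may take a few minutes before the metrics show up.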
Is there an existing issue for this?
Current Behavior
We have around 9.5k SQS queues in the eu-west-1 region of one of our prod accounts, but the YACE exporter only provides metrics for around 5k of them.
I already tried to run several YACE instances in parallel:
Neither improved the situation. I also requested increases of the AWS quotas for GetMetricData (1000 per second) and ListMetrics (100 per second) requests, and according to AWS monitoring we are far from reaching them.
In the YACE debug log I couldn't find any entries which explain the missing metrics.
Expected Behavior
The exporter should provide metrics for all SQS queues (this worked with the official CloudWatch exporter).
Steps To Reproduce
config:
Anything else?
No response