Closed ri0day closed 2 years ago
Hi!
Good catch! Apparently, no one has ever tried to use an array and the lack of a test for an array never caught the logic issue. Check out https://github.com/influxdata/telegraf/pull/10850 as I put up a fix with tests.
Thanks for the quick fix. However, I found that a single dimension also doesn't work as expected: I set just one RDS instanceId in dimensions, but got metrics for all instances in the output.
I expected to get ["ConnectionUsage", "CpuUsage","DiskUsage","IOPSUsage","MemoryUsage"]
metrics only from instance rm-bp1zureya4l415lus.
[root@d5cs-jobs telegraf]# tail telegraf-aliyun.conf
[[inputs.aliyuncms]]
regions = ["cn-hangzhou"]
period = "5m"
delay = "1m"
interval = "5m"
project = "acs_rds_dashboard"
ratelimit = 200
[[inputs.aliyuncms.metrics]]
names = ["ConnectionUsage", "CpuUsage","DiskUsage","IOPSUsage","MemoryUsage"]
dimensions = '{"instanceId":"rm-bp1zureya4l415lus"}'
[root@d5cs-jobs telegraf]# ./telegraf --config telegraf-aliyun.conf --test --debug
2022-03-18T15:20:44Z I! Starting Telegraf 1.21.4
2022-03-18T15:20:44Z I! Loaded inputs: aliyuncms
2022-03-18T15:20:44Z I! Loaded aggregators:
2022-03-18T15:20:44Z I! Loaded processors:
2022-03-18T15:20:44Z W! Outputs are not used in testing mode!
2022-03-18T15:20:44Z I! Tags enabled: host=d5cs-jobs
2022-03-18T15:20:44Z D! [agent] Initializing plugins
2022-03-18T15:20:44Z E! [inputs.aliyuncms] Discovery tool is not activated: Didn't find root key "DBInstances" in discovery response
2022-03-18T15:20:44Z D! [agent] Starting service inputs
aliyuncms_acs_rds_dashboard,host=d5cs-jobs,instanceId=rm-bp16j0l01ze9916oz,userId=1692386295190525 connection_usage_average=0.585,connection_usage_maximum=0.588,connection_usage_minimum=0.583 1647616500000000000
aliyuncms_acs_rds_dashboard,host=d5cs-jobs,instanceId=rm-bp15ha3043v8030wj,userId=1692386295190525 connection_usage_average=7.523,connection_usage_maximum=7.524,connection_usage_minimum=7.524 1647616500000000000
aliyuncms_acs_rds_dashboard,host=d5cs-jobs,instanceId=rm-bp1q665hm9edarj7f,userId=1692386295190525 connection_usage_average=0.51,connection_usage_maximum=0.537,connection_usage_minimum=0.5 1647616500000000000
aliyuncms_acs_rds_dashboard,host=d5cs-jobs,instanceId=rm-bp14c70a2wh68v059,userId=1692386295190525 connection_usage_average=5.078,connection_usage_maximum=5.199,connection_usage_minimum=5.038 1647616500000000000
aliyuncms_acs_rds_dashboard,host=d5cs-jobs,instanceId=rm-bp1jmt70a9tfpyofw,userId=1692386295190525 connection_usage_average=0.625,connection_usage_maximum=0.625,connection_usage_minimum=0.625 1647616500000000000
......
I tested with your fixed build. I can confirm that the plugin's dimensions variable now accepts a quoted array string, but the dimensions still don't seem to work as expected, because the plugin still fetches metrics for all instances.
@ri0day,
Thanks for trying the fix and confirming it works!
dimension seem don't work as we expected, because the plugin still fetch all instances metrics .
Hmm, I looked around and only saw reference in this bug to how dimensions might get silently ignored. I have pushed another change that will print out the request and the dimensions variable to see how it is formatted. Once those artifacts build, can you provide that output?
Thanks!
For my own reference: I found the English docs, and it looks like the dimensions should be a JSON string:
{\"userId\":\"120886317861****\",\"region\":\"cn-huhehaote\",\"queue\":\"test-0128\"}
Getting that debug output would be good to confirm that we are actually sending the right type of data.
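To illustrate the two accepted shapes of the dimensions setting discussed above (a single JSON object, or a JSON array of objects), here is a minimal Go sketch. The helper name parseDimensions is hypothetical and this is not the plugin's actual code; it only shows one way to normalize both forms into a list.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseDimensions is a hypothetical helper, not the plugin's actual code.
// It accepts either a single JSON object or a JSON array of objects and
// always returns a slice of dimension maps.
func parseDimensions(raw string) ([]map[string]string, error) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		// No dimensions configured.
		return nil, nil
	}

	// Try the array form first: '[{"instanceId":"a"},{"instanceId":"b"}]'.
	var many []map[string]string
	if err := json.Unmarshal([]byte(raw), &many); err == nil {
		return many, nil
	}

	// Fall back to the single-object form: '{"instanceId":"a"}'.
	var one map[string]string
	if err := json.Unmarshal([]byte(raw), &one); err != nil {
		return nil, fmt.Errorf("dimensions is neither a JSON object nor an array of objects: %w", err)
	}
	return []map[string]string{one}, nil
}

func main() {
	inputs := []string{
		`{"instanceId":"p-example"}`,
		`[{"instanceId":"p-example"},{"instanceId":"q-example"}]`,
	}
	for _, raw := range inputs {
		dims, err := parseDimensions(raw)
		fmt.Println(len(dims), dims, err)
	}
}
```

Normalizing to a slice up front means the rest of the code only has to deal with one shape, which is the kind of case a test for the array form would have caught.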
Hi @powersj, I just tried your latest fix build (telegraf-1.22.0~06899624_linux_amd64.tar.gz). The dimensions print out as a memory address:
[root@d5cs-jobs bin]# ./telegraf --config /opt/telegraf/telegraf-aliyun.conf --test-wait 10 --debug
2022-03-19T01:14:11Z I! Starting Telegraf 1.22.0-06899624
2022-03-19T01:14:11Z I! Loaded inputs: aliyuncms
2022-03-19T01:14:11Z I! Loaded aggregators:
2022-03-19T01:14:11Z I! Loaded processors:
2022-03-19T01:14:11Z W! Outputs are not used in testing mode!
2022-03-19T01:14:11Z I! Tags enabled: host=d5cs-jobs
2022-03-19T01:14:11Z D! [agent] Initializing plugins
2022-03-19T01:14:12Z E! [inputs.aliyuncms] Discovery tool is not activated: Didn't find root key "DBInstances" in discovery response
2022-03-19T01:14:12Z D! [agent] Starting service inputs
Making the following request:
Making the following request:
Making the following request:
&{0xc00058c058 1647652092000 CpuUsage 300 10000 1647652392000 acs_rds_dashboard }
Request Dimensions:
&{0xc00048e3b0 1647652092000 IOPSUsage 300 10000 1647652392000 acs_rds_dashboard }
&{0xc00059c270 1647652092000 ConnectionUsage 300 10000 1647652392000 acs_rds_dashboard }
Request Dimensions:
Making the following request:
Making the following request:
&{0xc000011278 1647652092000 DiskUsage 300 10000 1647652392000 acs_rds_dashboard }
Request Dimensions:
Request Dimensions:
&{0xc00048e3b8 1647652092000 MemoryUsage 300 10000 1647652392000 acs_rds_dashboard }
Request Dimensions:
......
2022-03-19T01:14:22Z D! [agent] Stopping service inputs
2022-03-19T01:14:22Z D! [agent] Input channel closed
2022-03-19T01:14:22Z D! [agent] Stopped Successfully
2022-03-19T01:14:22Z E! [telegraf] Error running agent: input plugins recorded 1 errors
I had the PR print 2 things, first the request object, which is the memory-like results you see. The second thing was the request dimensions string itself:
Request Dimensions:
Request Dimensions:
This shows that no dimensions are specified in the request. Looking at the function and your logs more I noticed this message:
2022-03-19T01:14:12Z E! [inputs.aliyuncms] Discovery tool is not activated: Didn't find root key "DBInstances" in discovery response
When the discovery tool is not active, this sets s.dt to nil. When it is nil, the dimensions will not be configured. Is this a key you are specifying?
This is the config I used to test. There seems to be no discovery-related config here:
[[inputs.aliyuncms]]
regions = ["cn-hangzhou"]
period = "5m"
delay = "1m"
interval = "5m"
project = "acs_rds_dashboard"
ratelimit = 200
[[inputs.aliyuncms.metrics]]
names = ["ConnectionUsage", "CpuUsage","DiskUsage","IOPSUsage","MemoryUsage"]
dimensions = '[{"instanceId":"rm-bp1zureya4l415lus"},{"instanceId":"rm-bp13g5b435ex60of3"}]'
As for the discovery error:
2022-03-19T01:14:12Z E! [inputs.aliyuncms] Discovery tool is not activated: Didn't find root key "DBInstances" in discovery response
Maybe the plugin is expecting DBInstances in the response, but the Aliyun Cloud Monitor API response actually looks like the one below. Maybe the responseRootKey in discovery should be DBInstance instead of DBInstances:
{
"TotalRecordCount": 1,
"PageRecordCount": 1,
"RequestId": "8EED1083-3902-557A-9AF4-822BE5C9AF14",
"NextToken": "o7PORW53prZg8NUW9EJ7Yw",
"PageNumber": 1,
"Items": {
"DBInstance": [
{
"ResourceGroupId": "rg-acfm2jr35xnjh7i",
"DBInstanceNetType": "Intranet",
"DBInstanceType": "Primary",
"MutriORsignle": false,
"InstanceNetworkType": "VPC",
"DBInstanceId": "rm-bp1075l0623jbo084",
"ReadOnlyDBInstanceIds": {
"ReadOnlyDBInstanceId": []
},
"DBInstanceDescription": "CMBG-live-DB6",
"Engine": "MySQL",
"EngineVersion": "5.7",
"ZoneId": "cn-hangzhou-i",
"DBInstanceStatus": "Running",
"DBInstanceClass": "mysql.n2.large.2c",
"CreateTime": "2022-03-11T01:52:20Z",
"VSwitchId": "vsw-bp15efgph9rd6rl7xqgm5",
"TipsLevel": 0,
"PayType": "Prepaid",
"LockMode": "Unlock",
"DeletionProtection": false,
"DBInstanceStorageType": "cloud_essd",
"InsId": 1,
"VpcId": "vpc-232m6l510",
"ConnectionMode": "Standard",
"VpcCloudInstanceId": "rm-bp1075l0623jbo084-202203110952",
"RegionId": "cn-hangzhou",
"ConnectionString": "rm-bp1075l0623jbo084.mysql.rds.aliyuncs.com",
"ExpireTime": "2025-03-11T16:00:00Z"
}
]
}
}
Thanks for that! You are right, it does look like a different root key than, say, the acs_ecs_dashboard:
{
"Instances": {
"Instance": [
{
versus what you see with the acs_rds_dashboard:
{
"Items": {
"DBInstance": [
{
I pushed another couple of commits to update the SDK and try to look for "Items" as a root key as well. Can you give that a shot?
Hi @powersj, I just tried your latest build (telegraf-1.22.0~805d150b_linux_amd64.tar.gz). The discovery function seems to run forever; check the output screenshot.
output gif (about 50M). Can we schedule a pair programming session to debug this?
Hmm, thanks for the gif!
I am really not sure what direction to go with this. I am still confused why the response does not have the expected DBInstances value in the first place either. That expected value comes from Aliyun's own SDK. As such, I am wondering if it is worth filing a bug with them to see if the response format changed and if the SDK needs an update?
Different services have different API response root keys: the load balancer root key is LoadBalancers, the RDS root key is Items, and the ECS root key is Instances.
You can use https://next.api.alibabacloud.com/home to explore the API without writing code.
For example, CreateDescribeLoadBalancersRequest() returns:
{
.....
"LoadBalancers": {
"LoadBalancer": [ here comes objects, one per every instance]
}
}
but CreateDescribeDBInstancesRequest() returns:
{
......
"Items": {
"DBInstance": [
So maybe we cannot use this pattern to derive the root key:
parseRootKey = regexp.MustCompile(`Describe(.*)`)
We could define the root key here instead:
switch project {
case "acs_ecs_dashboard":
dscReq[region] = ecs.CreateDescribeInstancesRequest()
responseObjectIDKey = "InstanceId"
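The mismatch described above can be sketched in Go as follows. The regular expression is the one quoted in the thread; the action names are inferred from the request constructors mentioned above, and the override table and the rootKeyFor helper are hypothetical, shown only to illustrate why deriving the root key from the action name breaks for RDS and how an explicit per-action key could sit alongside it.

```go
package main

import (
	"fmt"
	"regexp"
)

// parseRootKey is the pattern quoted in the thread: the root key is
// assumed to be whatever follows "Describe" in the API action name.
var parseRootKey = regexp.MustCompile(`Describe(.*)`)

// rootKeyOverrides is a hypothetical table for actions whose response
// root key does not follow the naming pattern (RDS returns "Items").
var rootKeyOverrides = map[string]string{
	"DescribeDBInstances": "Items",
}

// rootKeyFor is a hypothetical helper: it prefers an explicit override
// and falls back to the name derived from the action.
func rootKeyFor(action string) (string, error) {
	if key, ok := rootKeyOverrides[action]; ok {
		return key, nil
	}
	m := parseRootKey.FindStringSubmatch(action)
	if len(m) != 2 || m[1] == "" {
		return "", fmt.Errorf("cannot derive root key from action %q", action)
	}
	return m[1], nil
}

func main() {
	actions := []string{
		"DescribeInstances",     // ECS
		"DescribeLoadBalancers", // load balancer
		"DescribeDBInstances",   // RDS
	}
	for _, action := range actions {
		key, err := rootKeyFor(action)
		fmt.Println(action, key, err)
	}
}
```

Without the override, the pattern yields DBInstances for the RDS action, which is exactly the key the error message reports as missing.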
Hi,
You are using the acs_rds_dashboard, right? In that case, I updated that case with Items:
case "acs_rds_dashboard":
dscReq[region] = rds.CreateDescribeDBInstancesRequest()
responseObjectIDKey = "Items"
Hi @powersj, I tried your latest build (telegraf-1.23.0~90a862ee_linux_amd64.tar.gz). It still outputs metrics for all instances, but I found that the aliyuncms API for RDS is totally different from the other services.
For example, the RDS response looks like this:
{"TotalRecordCount":116,"PageRecordCount":100,"RequestId":"AF7BFA71-652D-5982-BEFE-B2B0365F68A2","NextToken":"o7PORW5vm_Zg8NUW9EJ7Yw","PageNumber":1,"Items":{"DBInstance":[xxxxx]}}
but in our plugin we parse it with different keywords:
case "TotalCount":
pdResp.totalCount = int(val.(float64))
case "PageSize":
pdResp.pageSize = int(val.(float64))
case "PageNumber":
pdResp.pageNumber = int(val.(float64))
}
To get RDS discovery working, we need code like this:
case "TotalRecordCount":
pdResp.totalCount = int(val.(float64))
case "PageRecordCount":
pdResp.pageSize = int(val.(float64))
case "PageNumber":
pdResp.pageNumber = int(val.(float64))
}
Another difference: to get instance data from the Aliyun CMS response, RDS instance data must be retrieved from $Response.Items.DBInstance, whereas the other services expose instance data under $Response.$ServiceName+"s". For example:
loadbalancer service --> $Response.LoadBalancers.LoadBalancer
ecs service --> $Response.Instances.Instance
I managed to get barely working code just for acs_rds_dashboard. You can check it out:
aliyuncms.go
discovery.go
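As a rough illustration of the two differences described above (the paging field names and the location of the instance list), the following Go sketch parses one page of a discovery response for either style. The discoveryPage type and the parsePage and asInt helpers are hypothetical and are not taken from aliyuncms.go or discovery.go; the field names are the ones quoted in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// discoveryPage is a hypothetical, simplified view of one page of a
// discovery response; it is not the plugin's actual type.
type discoveryPage struct {
	totalCount int
	pageSize   int
	pageNumber int
	items      []map[string]interface{}
}

// asInt converts a decoded JSON number (float64) to int, or returns 0
// when the value is not a number.
func asInt(val interface{}) int {
	f, _ := val.(float64)
	return int(f)
}

// parsePage accepts both spellings of the paging fields and reads the
// instance list from the single array nested under rootKey, for example
// Items.DBInstance (RDS) or Instances.Instance (ECS).
func parsePage(body []byte, rootKey string) (*discoveryPage, error) {
	var raw map[string]interface{}
	if err := json.Unmarshal(body, &raw); err != nil {
		return nil, fmt.Errorf("decoding discovery response: %w", err)
	}
	if _, ok := raw[rootKey]; !ok {
		return nil, fmt.Errorf("didn't find root key %q in discovery response", rootKey)
	}

	page := &discoveryPage{}
	for key, val := range raw {
		switch key {
		case "TotalCount", "TotalRecordCount":
			page.totalCount = asInt(val)
		case "PageSize", "PageRecordCount":
			page.pageSize = asInt(val)
		case "PageNumber":
			page.pageNumber = asInt(val)
		case rootKey:
			wrapper, ok := val.(map[string]interface{})
			if !ok {
				return nil, fmt.Errorf("root key %q is not an object", rootKey)
			}
			// The wrapper holds one inner key whose value is the list.
			for _, inner := range wrapper {
				list, ok := inner.([]interface{})
				if !ok {
					continue
				}
				for _, el := range list {
					if obj, ok := el.(map[string]interface{}); ok {
						page.items = append(page.items, obj)
					}
				}
			}
		}
	}
	return page, nil
}

func main() {
	rds := []byte(`{"TotalRecordCount":1,"PageRecordCount":1,"PageNumber":1,` +
		`"Items":{"DBInstance":[{"DBInstanceId":"rm-example"}]}}`)
	page, err := parsePage(rds, "Items")
	fmt.Println(page, err)
}
```

Accepting both field spellings in one switch keeps the ECS-style responses working while adding the RDS-style ones, which is the compatibility concern raised in the reply that follows.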
@ri0day - huge thank you for diving in on this and working it out. I have updated the PR with your changes, with a few modifications to ensure that the previous behavior works as well. Could you give the PR a try?
Thanks!
@powersj I tested your latest build. I can confirm that rds and ecs are working as expected. Thank you!
Relevant telegraf.conf
Logs from Telegraf
[root@d5cs-jobs telegraf]# ./telegraf --config ./telegraf-aliyun.conf --test --debug
2022-03-18T04:29:18Z I! Starting Telegraf 1.21.4
2022-03-18T04:29:18Z I! Loaded inputs: aliyuncms
2022-03-18T04:29:18Z I! Loaded aggregators:
2022-03-18T04:29:18Z I! Loaded processors:
2022-03-18T04:29:18Z W! Outputs are not used in testing mode!
2022-03-18T04:29:18Z I! Tags enabled: host=d5cs-jobs
2022-03-18T04:29:18Z D! [agent] Initializing plugins
2022-03-18T04:29:18Z E! [telegraf] Error running agent: could not initialize input inputs.aliyuncms: Can't parse dimensions (it is neither obj, nor array) "[{\"instanceId\": \"rm-bp1zureya4l415lus\"},{\"instanceId\": \"rm-bp161628eanrceqz2\"}]" :
System info
Telegraf 1.21.4
Docker
No response
Steps to reproduce
./telegraf --config ./telegraf-aliyun.conf --test --debug
...
Expected behavior
According to the plugin example, we can pass a quoted array to the dimensions variable, like:
dimensions = '[{"instanceId": "p-example"},{"instanceId": "q-example"}]'
Actual behavior
marshal failed
Additional info
dimensions = '{"instanceId": "p-example"}'
A single dimension works properly.
dimensions = [{"instanceId": "rm-bp1zureya4l415lus"},{"instanceId": "rm-bp161628eanrceqz2"}]
also fails.