Closed llossinxw closed 1 year ago
@llossinxw can you please try a more recent version of telegraf e.g. v1.28.1?
Hi @srebhan I already tried with Telegraf v1.28.1-alpine. The error is still present but the logs are less verbose.
telegraf_openstack_1 | panic: runtime error: invalid memory address or nil pointer dereference
telegraf_openstack_1 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x273c2f3]
openstack_datasource_telegraf_openstack_1 exited with code 2
Thanks for trying, will investigate but I might need your help to hunt that one down...
@llossinxw I do have a theory... Does the crash directly happen at the first Gather() interval or only after some time? If the former, can you please add "orchestration" to the enabled_services option and let me know if this fixes the issue!?
@srebhan I guess that the crash is happening at the first Gather() since no metric is flushed towards the output plugin according to the logs.
telegraf_openstack.log
@llossinxw can you please test the binary available in PR #14011 once CI has finished all tests successfully? Please let me know if this fixes your issue.
Heads-up: You will probably see one or more warnings of the form
W! Disabling "stacks" service because orchestration is not available at the endpoint!
as the problem is that your endpoint needs to provide orchestration for a client to query stacks...
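For readers following along, the behaviour described above (drop a service and warn when the endpoint lacks what it needs) can be sketched roughly as below. This is not the code from the PR: `requiredCatalogType` and `pruneServices` are hypothetical names, and the mapping only contains the service/type pairs that appear in the warnings quoted in this thread.

```go
package main

import "fmt"

// requiredCatalogType maps a plugin service name to the catalog service type
// it depends on. Hypothetical illustration: only the pairs mentioned in the
// warnings in this thread are listed.
var requiredCatalogType = map[string]string{
	"stacks":          "orchestration",
	"cinder_services": "block-storage",
	"storage_pools":   "block-storage",
	"volumes":         "block-storage",
}

// pruneServices returns the enabled services whose required catalog type is
// available, plus one warning message per service that was dropped.
// Services without an entry in requiredCatalogType are always kept.
func pruneServices(enabled []string, available map[string]bool) ([]string, []string) {
	kept := make([]string, 0, len(enabled))
	var warnings []string
	for _, svc := range enabled {
		required, ok := requiredCatalogType[svc]
		if ok && !available[required] {
			warnings = append(warnings, fmt.Sprintf(
				"Disabling %q service because %s is not available at the endpoint!",
				svc, required))
			continue
		}
		kept = append(kept, svc)
	}
	return kept, warnings
}

func main() {
	// Assumed example catalog: neither orchestration nor block-storage present.
	available := map[string]bool{"compute": true, "network": true}
	kept, warnings := pruneServices([]string{"servers", "stacks", "volumes"}, available)
	for _, w := range warnings {
		fmt.Println("W!", w)
	}
	fmt.Println("still enabled:", kept)
}
```

Pruning once at startup, rather than checking on every Gather(), would keep the collection loop free of per-service availability checks.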
@srebhan I just tried the new binary. It works as expected, thank you! The logs are these
Just a question: I am running Telegraf inside a Docker container with Docker Swarm and I am pulling the official Telegraf image from Dockerhub. Will this updated openstack input plugin be part of Telegraf version 1.29? That is, will I have to update the image to telegraf:1.29 once it is released to use the updated input plugin? Sorry for the newbie question.
@srebhan hi, it seems to be broken somehow; with 1.28.3 I am getting:
2023-10-28T17:53:44Z W! [inputs.openstack] Disabling "cinder_services" service because block-storage is not available at the endpoint!
2023-10-28T17:53:44Z W! [inputs.openstack] Disabling "storage_pools" service because block-storage is not available at the endpoint!
2023-10-28T17:53:44Z W! [inputs.openstack] Disabling "volumes" service because block-storage is not available at the endpoint!
with:
enabled_services = ["agents", "aggregates", "cinder_services", "flavors", "hypervisors", "networks", "nova_services", "ports", "projects", "servers", "services", "storage_pools", "subnets", "volumes"]
and OpenStack 2023.1. Of course, I have every service enabled except Heat.
@srebhan hasBlockStorage = true is missing in https://github.com/influxdata/telegraf/blob/master/plugins/inputs/openstack/openstack.go#L200
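To illustrate why a missing assignment like this would produce the warnings above, here is a hypothetical sketch of deriving availability flags from the catalog's service types. It is not the plugin's actual code: `availability` and `detectAvailability` are made-up names, and accepting the "volume", "volumev2" and "volumev3" types alongside "block-storage" is an assumption about how some deployments name the Cinder entry.

```go
package main

import "fmt"

// availability records which optional APIs the catalog advertises.
// Hypothetical type for illustration only.
type availability struct {
	hasOrchestration bool
	hasBlockStorage  bool
}

// detectAvailability scans the catalog service types and sets one flag per
// optional API. If a case forgets to set its flag, the flag stays false and
// every service depending on it gets disabled, even though the API exists.
func detectAvailability(catalogTypes []string) availability {
	var a availability
	for _, t := range catalogTypes {
		switch t {
		case "orchestration":
			a.hasOrchestration = true
		case "block-storage", "volume", "volumev2", "volumev3": // aliases are an assumption
			a.hasBlockStorage = true
		}
	}
	return a
}

func main() {
	a := detectAvailability([]string{"compute", "network", "volumev3"})
	fmt.Printf("orchestration=%t block-storage=%t\n", a.hasOrchestration, a.hasBlockStorage)
}
```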
There is another problem, a typo here: https://github.com/influxdata/telegraf/blob/master/plugins/inputs/openstack/openstack.go#L788, it should be if o.services["servers"] {
EDIT: Another: the vcpus, disk_gb, ram_mb, project labels are not populated correctly in the openstack_server metric.
Someone should take a look at this plugin, because it's not working correctly.
@lukasmrtvy please file a new issue rather than commenting on a closed one.
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.21.2-alpine, Ubuntu 22.04.3 LTS
Docker
version: '3.3'
services:
  telegraf_openstack:
    image: telegraf:1.21.2-alpine
    hostname: telegraf_openstack
    extra_hosts:
Steps to reproduce
Set authentication_endpoint, domain, project, username, password in telegraf_openstack_panic_bug/config/telegraf_openstack/telegraf.conf
Run the docker-compose up command
Expected behavior
The Telegraf container should collect all the metrics from the enabled services by polling the OpenStack APIs, skipping any services that are not available.
Actual behavior
The Telegraf container stops after the first interval with a nil pointer dereference panic (segmentation violation), as shown in the logs included above.
Additional info
Different tests have been performed by enabling the possible enabled_services allowed values (i.e. "agents", "aggregates", "flavors", "hypervisors", "networks", "nova_services", "ports", "projects", "servers", "services", "stacks", "storage_pools", "subnets", "volumes") one by one.
The panic occurs when the enabled_services field in the Telegraf configuration file includes at least one of the "stacks", "storage_pools" or "volumes" values.
My guess is that the cause of this bug is that the Heat and Cinder OpenStack modules (in charge of returning the "stacks" and the "storage_pools" & "volumes" metrics, respectively) are not enabled in the OpenStack instance under collection.
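The failure mode guessed at above can be reproduced in isolation. The following is a minimal sketch, assuming the plugin holds a per-service client pointer that stays nil when the corresponding OpenStack module is absent; `collector`, `serviceClient` and both gather functions are hypothetical and do not mirror the plugin's real types.

```go
package main

import (
	"errors"
	"fmt"
)

// serviceClient stands in for an API client of one OpenStack service.
type serviceClient struct{ endpoint string }

// collector holds one client per optional service. volume stays nil when the
// endpoint offers no block-storage service (assumption for this sketch).
type collector struct {
	volume *serviceClient
}

var errNoClient = errors.New("block-storage client not initialised")

// gatherVolumesUnsafe dereferences the client without checking it and
// therefore panics with a nil pointer dereference when volume is nil.
func (c *collector) gatherVolumesUnsafe() string {
	return c.volume.endpoint
}

// gatherVolumes guards against the missing client and returns an error
// instead of crashing the whole process.
func (c *collector) gatherVolumes() (string, error) {
	if c.volume == nil {
		return "", errNoClient
	}
	return c.volume.endpoint, nil
}

// panics reports whether calling f triggers a panic.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	c := &collector{} // no Cinder available, so volume is nil
	fmt.Println("unsafe call panics:", panics(func() { _ = c.gatherVolumesUnsafe() }))
	_, err := c.gatherVolumes()
	fmt.Println("guarded call error:", err)
}
```

Returning an error from the guarded variant lets the caller log a warning and continue with the remaining services instead of exiting with code 2.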