Open MarcelSchaible opened 6 years ago
@MarcelSchaible Can you attach the snapteld log with debug logging turned on ("-l 1")?
Here is the debug log:
@MarcelSchaible What's your Golang version? Thanks.
@TaoTod: 1.10.3
@TaoTod: From my understanding, a configuration option is missing from our setup, which causes the plugin to crash. Do you have an idea which option is missing, and could you provide a working plugin config?
We are still investigating this issue and have no quick workaround so far. We will update the status once we have a solution. Many thanks for your patience.
@MarcelSchaible Have you tried the sample configuration in the documentation? Please make sure the platform supports IPMI and that the IPMI driver is included in the OS.
@dancyding: Yes, of course I tried the sample configuration first. The platform supports IPMI, and the kernel modules listed in the documentation are loaded (see below).
Do you have any idea where this error message comes from?
Error loading plugin:
unexpected EOF
=============================
/etc/snapteld.conf
...
# plugins section contains plugin config settings that will be applied for
# plugins across tasks.
plugins:
  collector:
    intel-dcm-platform:
      all:
        protocol: node-manager
        mode : legacy_inband
        channel : 0x06
        slave : 0x2C
...
[root@pcie7410-s15-c1 snap]# lsmod | grep ipmi
ipmi_poweroff 14506 0
ipmi_devintf 17572 4
ipmi_si 53582 3
ipmi_msghandler 46608 3 ipmi_devintf,ipmi_poweroff,ipmi_si
[root@pcie7410-s15-c1 snap]# ipmitool sensor | head
FAN1TRAY1 | 2075.000 | RPM | ok | 1200.000 | 1500.000 | 1800.000 | na | na | na
FAN2TRAY1 | 2025.000 | RPM | ok | 1200.000 | 1500.000 | 1800.000 | na | na | na
FAN1TRAY2 | 2075.000 | RPM | ok | 1200.000 | 1500.000 | 1800.000 | na | na | na
FAN2TRAY2 | 2025.000 | RPM | ok | 1200.000 | 1500.000 | 1800.000 | na | na | na
FAN1TRAY3 | 2050.000 | RPM | ok | 1200.000 | 1500.000 | 1800.000 | na | na | na
FAN2TRAY3 | 2100.000 | RPM | ok | 1200.000 | 1500.000 | 1800.000 | na | na | na
PSU_PRSNT0 | 0x0 | discrete | 0x0180| na | na | na | na | na | na
PSU_ACOK0 | 0x0 | discrete | 0x0180| na | na | na | na | na | na
PSU_PWROK0 | 0x0 | discrete | 0x0180| na | na | na | na | na | na
PSU0CUR | 1.000 | Amps | ok | na | na | na | 32.000 | 36.000 | 38.000
Update: When I explicitly load the DCM plugin as described in the documentation it seems to work:
config.json:
{
  "control" : {
    "plugins": {
      "collector": {
        "intel-dcm-platform": {
          "all": {
            "protocol": "node_manager",
            "mode": "legacy_inband",
            "channel": "0x06",
            "slave": "0x2C"
          }
        }
      }
    }
  }
}
And loading the plugin via command line works fine:
$ snapteld -l 1 -t 0 --config config.json
$ snaptel plugin load snap-plugin-collector-intel-dcm-platform
$ snaptel metric list
$ snaptel plugin list
DEBU[2018-06-25T15:45:34+02:00] API request _module=_mgmt-rest index=4 method=GET url=/v1/plugins
DEBU[2018-06-25T15:45:34+02:00] API response _module=_mgmt-rest index=4 method=GET status=OK status-code=200 url=/v1/plugins
NAME VERSION TYPE SIGNED STATUS LOADED TIME
intel-dcm-platform 1 collector false loaded Mon, 25 Jun 2018 15:31:49 CEST
[root@pcie7410-s15-c1 multi-user.target.wants]# snaptel metric list
DEBU[2018-06-25T15:45:42+02:00] API request _module=_mgmt-rest index=5 method=GET url=/v1/metrics
DEBU[2018-06-25T15:45:42+02:00] API response _module=_mgmt-rest index=5 method=GET status=OK status-code=200 url=/v1/metrics
NAMESPACE VERSIONS
/intel/dcm/health/battery 1
/intel/dcm/health/fan 1
/intel/dcm/health/memory 1
/intel/dcm/health/powersupply 1
/intel/dcm/health/processor 1
/intel/dcm/health/storage 1
/intel/dcm/health/temperature 1
/intel/dcm/health/voltage 1
/intel/dcm/inventory/bmc_mac 1
/intel/dcm/inventory/firmware_version 1
/intel/dcm/inventory/product_manufacturer 1
/intel/dcm/inventory/product_name 1
/intel/dcm/inventory/product_serial 1
Since we are using several plugins, we want to put the configuration of the DCM plugin in the global configuration of snapteld:
...
# plugins section contains plugin config settings that will be applied for
# plugins across tasks.
plugins:
  collector:
    intel-dcm-platform:
      all:
        path: /opt/snap/plugins
      versions:
        1:
          protocol: node-manager
          mode : legacy_inband
          channel : 0x06
          slave : 0x2C
...
This does not work at all. Even the snapteld daemon refuses to start.
$ systemctl start snap-telemetry
$ tail /var/log/snap/snapteld.log
time="2018-06-25T15:29:32+02:00" level=info msg="plugin unload called" _block=unload-plugin _module=control-plugin-mgr path=[snap-plugin-publisher-file]
time="2018-06-25T15:29:32+02:00" level=debug msg="Removing plugin" _module=control-plugin-mgr plugin-name=file plugin-path="/tmp/snap-plugin-246273045/snap-plugin-publisher-file" plugin-type=publisher plugin-version=2
time="2018-06-25T15:29:32+02:00" level=info msg="control stopped" _block=stop _module=control
time="2018-06-25T15:29:32+02:00" level=info msg="stopping module" _module=snapteld block=main snap-module=scheduler
time="2018-06-25T15:29:32+02:00" level=info msg="scheduler stopped" _block=stop-scheduler _module=scheduler
time="2018-06-25T15:29:32+02:00" level=info msg="stopping module" _module=snapteld block=main snap-module=REST
time="2018-06-25T15:29:32+02:00" level=info msg="REST stopped" _block=stop _module="_mgmt-rest"
time="2018-06-25T15:29:32+02:00" level=info msg="exiting on signal" _module=snapteld block=main signal=interrupt
Any idea or hint is appreciated.
Thanks
I have found the problem in the plugin:
func getChannel(config map[string]ctypes.ConfigValue) string {
	if channel, ok := config["channel"]; ok {
		return channel.(ctypes.ConfigValueStr).Value // <== expects a string
	}
	return "0x00" // Default channel addr
}
Since I am passing an unquoted hexadecimal number as the channel, the value is not a string, so the type assertion fails and the plugin panics:
# plugins section contains plugin config settings that will be applied for
# plugins across tasks.
plugins:
  collector:
    intel-dcm-platform:
      all:
        path: /opt/snap/plugins
      versions:
        1:
          protocol: node-manager
          mode : legacy_inband
          channel : 0x06
          slave : 0x2C
I think a little bit more checking and better error reporting would be helpful.
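As a rough illustration of what such checking could look like, here is a minimal sketch, not the plugin's actual code. `ConfigValue`, `ConfigValueStr` and `ConfigValueInt` below are simplified, hypothetical stand-ins for the snap `ctypes` types so that the example compiles on its own; `unsafeChannel`, `channelFromConfig` and `panics` are names invented for this sketch. Formatting an integer back to a hex string is just one possible choice; returning an error for every non-string value would be equally reasonable.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the snap ctypes config value types.
type ConfigValue interface{}

type ConfigValueStr struct{ Value string }

type ConfigValueInt struct{ Value int }

// unsafeChannel mirrors the pattern quoted in the thread: a bare type
// assertion panics when the stored value is not a ConfigValueStr.
func unsafeChannel(config map[string]ConfigValue) string {
	if channel, ok := config["channel"]; ok {
		return channel.(ConfigValueStr).Value
	}
	return "0x00"
}

// channelFromConfig checks the dynamic type and returns an error
// instead of panicking on an unexpected value.
func channelFromConfig(config map[string]ConfigValue) (string, error) {
	channel, ok := config["channel"]
	if !ok {
		return "0x00", nil // default channel address, as in the quoted code
	}
	switch v := channel.(type) {
	case ConfigValueStr:
		return v.Value, nil
	case ConfigValueInt:
		// Assumption: an unquoted 0x06 arrives as the integer 6.
		return fmt.Sprintf("0x%02X", v.Value), nil
	default:
		return "", fmt.Errorf("config option %q: expected string or integer, got %T", "channel", channel)
	}
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if r := recover(); r != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	cfg := map[string]ConfigValue{"channel": ConfigValueInt{Value: 0x06}}
	fmt.Println("bare assertion panics:", panics(func() { unsafeChannel(cfg) }))
	fmt.Println(channelFromConfig(cfg))
}
```

With a check like this, a wrongly typed option would surface as a readable error message rather than as a crashed plugin process.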
@MarcelSchaible Thanks for the information. In the previous design we only accepted string values for config options, but I agree we need to improve this with more checking and better error reporting. For now, I hope this issue can be worked around by specifying the value as a string in the configuration file.
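To illustrate why quoting matters, here is a small sketch. It does not use snapteld's own config loader, which is not shown in this thread; it uses Go's standard `encoding/json` as a stand-in, and `valueKind` is a name invented for the example. A quoted value decodes to a Go string, while an unquoted number decodes to a numeric type. YAML parsers typically treat an unquoted `0x06` as an integer in the same way, so the workaround in the YAML file would presumably be to write `channel: "0x06"` and `slave: "0x2C"` with quotes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// valueKind decodes a small JSON object and reports the Go type
// that the "channel" value ends up with after decoding.
func valueKind(doc string) (string, error) {
	var m map[string]interface{}
	if err := json.Unmarshal([]byte(doc), &m); err != nil {
		return "", err
	}
	return fmt.Sprintf("%T", m["channel"]), nil
}

func main() {
	docs := []string{
		`{"channel": "0x06"}`, // quoted: stays a string
		`{"channel": 6}`,      // unquoted number: becomes a numeric type
	}
	for _, doc := range docs {
		kind, err := valueKind(doc)
		fmt.Println(doc, "->", kind, err)
	}
}
```

This would also be consistent with the JSON config earlier in the thread working, since its values are quoted strings.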
Snap version (use snapctl -v): snaptel version 2.0.0
Environment:
Cloud provider or hardware configuration: ArteSyn MAXCORE 3000
OS (e.g. from /etc/os-release): NAME="CentOS Linux" VERSION="7 (Core)" ID="centos" ID_LIKE="rhel fedora" VERSION_ID="7" PRETTY_NAME="CentOS Linux 7 (Core)" ANSI_COLOR="0;31" CPE_NAME="cpe:/o:centos:centos:7" HOME_URL="https://www.centos.org/" BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"
Kernel (e.g. uname -a): Linux pcie7410-s15-c1 3.10.0-514.26.2.1.el7.x86_64 #1 SMP Tue Jan 30 08:20:53 MST 2018 x86_64 x86_64 x86_64 GNU/Linux
snaptel plugin list
NAME VERSION TYPE SIGNED STATUS LOADED TIME
df 6 collector false loaded Thu, 14 Jun 2018 14:34:22 CEST
psutil 8 collector false loaded Thu, 14 Jun 2018 14:34:23 CEST
smart-disk 9 collector false loaded Thu, 14 Jun 2018 14:34:24 CEST
file 2 publisher false loaded Thu, 14 Jun 2018 14:34:24 CEST
What happened:
The newly built version of the plugin on CentOS 7 fails to load with the following error message:
snaptel plugin load snap-plugin-collector-intel-dcm-platform
Error loading plugin: unexpected EOF
What you expected to happen:
Successful loading of the plugin.
Steps to reproduce it (as minimally and precisely as possible):
Anything else do we need to know (e.g. issue happens only occasionally):
The issue is reproducible.