intelsdi-x / snap-plugin-collector-intel-dcm-platform

Collects metrics (power, thermal, health, inventory) from different OEM and ODM vendors’ platforms through IPMI and IPMI OEM extensions, such as Node Manager, DCMI, or IPMI SDR.
Apache License 2.0

Plugin failed to load, because of wrong type of config value #15

Open MarcelSchaible opened 6 years ago

MarcelSchaible commented 6 years ago

Snap version (use snapctl -v): snaptel version 2.0.0

Environment:

CENTOS_MANTISBT_PROJECT="CentOS-7" CENTOS_MANTISBT_PROJECT_VERSION="7" REDHAT_SUPPORT_PRODUCT="centos" REDHAT_SUPPORT_PRODUCT_VERSION="7"

Linux pcie7410-s15-c1 3.10.0-514.26.2.1.el7.x86_64 #1 SMP Tue Jan 30 08:20:53 MST 2018 x86_64 x86_64 x86_64 GNU/Linux

snaptel plugin list

NAME             VERSION         TYPE            SIGNED          STATUS          LOADED TIME
df               6               collector       false           loaded          Thu, 14 Jun 2018 14:34:22 CEST
psutil           8               collector       false           loaded          Thu, 14 Jun 2018 14:34:23 CEST
smart-disk       9               collector       false           loaded          Thu, 14 Jun 2018 14:34:24 CEST
file             2               publisher       false           loaded          Thu, 14 Jun 2018 14:34:24 CEST

What happened:

The newly built version of the plugin on CentOS 7 fails to load with the following error message:

snaptel plugin load snap-plugin-collector-intel-dcm-platform

Error loading plugin: unexpected EOF

What you expected to happen:

Successful loading of the plugin.

Steps to reproduce it (as minimally and precisely as possible):

  1. Build plugin as described in the plugin documentation
  2. snaptel plugin load snap-plugin-collector-intel-dcm-platform

Anything else we need to know (e.g. issue happens only occasionally):

The issue is reproducible.

sandlbn commented 6 years ago

@MarcelSchaible Can you attach the snapteld log with debug logging turned on ("-l 1")?

MarcelSchaible commented 6 years ago

Here is the debug log:

snapteld.log

taotod commented 6 years ago

@MarcelSchaible What's your Golang version? Thanks.

MarcelSchaible commented 6 years ago

@TaoTod: 1.10.3

MarcelSchaible commented 6 years ago

@TaoTod: From my understanding, a configuration option is missing in our setup, which causes the plugin to crash. Do you have an idea which option is missing? Could you provide me with a working plugin config?

taotod commented 6 years ago

We are still investigating this issue and have no quick workaround so far. We will update the status once we have a solution. Many thanks for your patience.

dancyding commented 6 years ago

@MarcelSchaible Have you tried the sample configuration in the documentation? Please make sure the platform has IPMI support and that the IPMI driver is included in the OS.

MarcelSchaible commented 6 years ago

@dancyding: Yes, of course I tried the sample configuration first. The platform supports IPMI, and the kernel modules listed in the documentation are loaded (see below).

Do you have any idea where this error message comes from?

Error loading plugin:
unexpected EOF

=============================

/etc/snapteld.conf
...
  # plugins section contains plugin config settings that will be applied for
  # plugins across tasks.
  plugins:
    collector:
      intel-dcm-platform:
        all:
          protocol: node-manager
          mode    : legacy_inband
          channel : 0x06
          slave   : 0x2C
...

[root@pcie7410-s15-c1 snap]# lsmod | grep ipmi
ipmi_poweroff          14506  0
ipmi_devintf           17572  4
ipmi_si                53582  3
ipmi_msghandler        46608  3 ipmi_devintf,ipmi_poweroff,ipmi_si

[root@pcie7410-s15-c1 snap]# ipmitool sensor | head
FAN1TRAY1        | 2075.000   | RPM        | ok    | 1200.000  | 1500.000  | 1800.000  | na        | na        | na
FAN2TRAY1        | 2025.000   | RPM        | ok    | 1200.000  | 1500.000  | 1800.000  | na        | na        | na
FAN1TRAY2        | 2075.000   | RPM        | ok    | 1200.000  | 1500.000  | 1800.000  | na        | na        | na
FAN2TRAY2        | 2025.000   | RPM        | ok    | 1200.000  | 1500.000  | 1800.000  | na        | na        | na
FAN1TRAY3        | 2050.000   | RPM        | ok    | 1200.000  | 1500.000  | 1800.000  | na        | na        | na
FAN2TRAY3        | 2100.000   | RPM        | ok    | 1200.000  | 1500.000  | 1800.000  | na        | na        | na
PSU_PRSNT0       | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
PSU_ACOK0        | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
PSU_PWROK0       | 0x0        | discrete   | 0x0180| na        | na        | na        | na        | na        | na
PSU0CUR          | 1.000      | Amps       | ok    | na        | na        | na        | 32.000    | 36.000    | 38.000

MarcelSchaible commented 6 years ago

Update: When I explicitly load the DCM plugin as described in the documentation it seems to work:

config.json:

{
    "control" : {
        "plugins": {
            "collector": {
                "intel-dcm-platform": {
                    "all": {
                        "protocol": "node_manager",
                        "mode": "legacy_inband",
                        "channel": "0x06",
                        "slave": "0x2C"
                    }
                }
            }
        }
    }
}

And loading the plugin via command line works fine:

$ snapteld -l 1 -t 0 --config config.json
$ snaptel plugin load snap-plugin-collector-intel-dcm-platform
$ snaptel metric list
$ snaptel plugin list
DEBU[2018-06-25T15:45:34+02:00] API request                                   _module=_mgmt-rest index=4 method=GET url=/v1/plugins
DEBU[2018-06-25T15:45:34+02:00] API response                                  _module=_mgmt-rest index=4 method=GET status=OK status-code=200 url=/v1/plugins
NAME                     VERSION         TYPE            SIGNED          STATUS          LOADED TIME
intel-dcm-platform       1               collector       false           loaded          Mon, 25 Jun 2018 15:31:49 CEST
[root@pcie7410-s15-c1 multi-user.target.wants]# snaptel metric list
DEBU[2018-06-25T15:45:42+02:00] API request                                   _module=_mgmt-rest index=5 method=GET url=/v1/metrics
DEBU[2018-06-25T15:45:42+02:00] API response                                  _module=_mgmt-rest index=5 method=GET status=OK status-code=200 url=/v1/metrics
NAMESPACE                                        VERSIONS
/intel/dcm/health/battery                        1
/intel/dcm/health/fan                            1
/intel/dcm/health/memory                         1
/intel/dcm/health/powersupply                    1
/intel/dcm/health/processor                      1
/intel/dcm/health/storage                        1
/intel/dcm/health/temperature                    1
/intel/dcm/health/voltage                        1
/intel/dcm/inventory/bmc_mac                     1
/intel/dcm/inventory/firmware_version            1
/intel/dcm/inventory/product_manufacturer        1
/intel/dcm/inventory/product_name                1
/intel/dcm/inventory/product_serial              1

Since we are using several plugins, we want to put the configuration of the DCM plugin in the global configuration of snapteld:

...
  # plugins section contains plugin config settings that will be applied for
  # plugins across tasks.
  plugins:
    collector:
      intel-dcm-platform:
        all:
          path: /opt/snap/plugins
        versions:
          1:
            protocol: node-manager
             mode    : legacy_inband
             channel : 0x06
             slave   : 0x2C
...

This does not work at all. Even the snapteld daemon refuses to start.

$ systemctl start snap-telemetry

$ tail /var/log/snap/snapteld.log

time="2018-06-25T15:29:32+02:00" level=info msg="plugin unload called" _block=unload-plugin _module=control-plugin-mgr path=[snap-plugin-publisher-file]
time="2018-06-25T15:29:32+02:00" level=debug msg="Removing plugin" _module=control-plugin-mgr plugin-name=file plugin-path="/tmp/snap-plugin-246273045/snap-plugin-publisher-file" plugin-type=publisher plugin-version=2
time="2018-06-25T15:29:32+02:00" level=info msg="control stopped" _block=stop _module=control
time="2018-06-25T15:29:32+02:00" level=info msg="stopping module" _module=snapteld block=main snap-module=scheduler
time="2018-06-25T15:29:32+02:00" level=info msg="scheduler stopped" _block=stop-scheduler _module=scheduler
time="2018-06-25T15:29:32+02:00" level=info msg="stopping module" _module=snapteld block=main snap-module=REST
time="2018-06-25T15:29:32+02:00" level=info msg="REST stopped" _block=stop _module="_mgmt-rest"
time="2018-06-25T15:29:32+02:00" level=info msg="exiting on signal" _module=snapteld block=main signal=interrupt

Any idea or hint is appreciated.

Thanks

MarcelSchaible commented 6 years ago

I have found the problem in the plugin:

func getChannel(config map[string]ctypes.ConfigValue) string {
    if channel, ok := config["channel"]; ok {
        return channel.(ctypes.ConfigValueStr).Value // <== expects a string; the assertion panics on any other type
    }
    return "0x00" //Default channel addr
}

Since I am passing an unquoted hexadecimal number as the channel, the config value is not a string, so the type assertion fails and the plugin panics:

  # plugins section contains plugin config settings that will be applied for
  # plugins across tasks.
  plugins:
    collector:
      intel-dcm-platform:
        all:
          path: /opt/snap/plugins
        versions:
          1:
            protocol: node-manager
             mode    : legacy_inband
             channel : 0x06
             slave   : 0x2C

I think a little bit more checking and better error reporting would be helpful.
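
As a rough illustration of what such checking could look like, here is a minimal sketch that replaces the bare type assertion with a type switch and returns an error instead of panicking. It is not the plugin's actual code: `ConfigValue`, `ConfigValueStr`, and `ConfigValueInt` below are local stand-ins that only mimic the shape of snap's `ctypes` types so the example compiles on its own, and formatting an integer as hex is just one possible policy.

```go
package main

import "fmt"

// ConfigValue and the two structs below are hypothetical stand-ins for
// snap's ctypes.ConfigValue types, defined locally so this sketch is
// self-contained. They are assumptions, not the real snap API.
type ConfigValue interface{ Type() string }

type ConfigValueStr struct{ Value string }
type ConfigValueInt struct{ Value int }

func (ConfigValueStr) Type() string { return "string" }
func (ConfigValueInt) Type() string { return "integer" }

// getChannel returns the channel address as a hex string. The type switch
// never panics: strings pass through, integers are formatted as hex, and
// any other type is reported as an error.
func getChannel(config map[string]ConfigValue) (string, error) {
	channel, ok := config["channel"]
	if !ok {
		return "0x00", nil // default channel address, as in the original
	}
	switch v := channel.(type) {
	case ConfigValueStr:
		return v.Value, nil
	case ConfigValueInt:
		return fmt.Sprintf("0x%02X", v.Value), nil
	default:
		return "", fmt.Errorf("config value %q must be a string or integer, got %T", "channel", v)
	}
}

func main() {
	// An unquoted 0x06 in YAML would typically arrive as an integer.
	cfg := map[string]ConfigValue{"channel": ConfigValueInt{Value: 0x06}}
	ch, err := getChannel(cfg)
	fmt.Println(ch, err)
}
```

With an error return, the caller can log which key had the wrong type, rather than the daemon only seeing the plugin process die with "unexpected EOF".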

dancyding commented 6 years ago

@MarcelSchaible Thanks for the information. By design, we have so far accepted only string values for the config. I agree that we need to improve this with more checking and better error reporting. For now, the issue can hopefully be worked around by specifying the values as strings in the configuration file.
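
A sketch of that workaround in YAML form, assuming the global config accepts the same structure as the JSON config that loaded successfully earlier in this thread. The quotes force the hexadecimal values to be read as strings; this fragment is a direct transliteration of that JSON and has not been verified against snapteld here.

```yaml
control:
  plugins:
    collector:
      intel-dcm-platform:
        all:
          protocol: node_manager
          mode: legacy_inband
          channel: "0x06"   # quoted so it is parsed as a string, not an integer
          slave: "0x2C"     # quoted for the same reason
```

Keeping the indentation consistent under each key matters as well, since YAML derives the nesting from it.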