prometheus-community / ipmi_exporter

Remote IPMI exporter for Prometheus
MIT License

Problems with synchronisation between ipmi_remote.yml and config.yml #148

Closed (Hydrapozza closed 1 year ago)

Hydrapozza commented 1 year ago

Hello, I want to run the IPMI exporter as a container. Here are my configs:

ipmi_remote.yml

modules:
        default:
                # These settings are used if no module is specified, the
                # specified module doesn't exist, or of course if
                # module=default is specified.
                user: "root"
                pass: "myPass"
                # The below settings correspond to driver-type, privilege-level, and
                # session-timeout respectively, see `man 5 freeipmi.conf` (and e.g.
                # `man 8 ipmi-sensors` for a list of driver types).
                driver: "LAN_2_0"
                privilege: "admin"
                # The session timeout is in milliseconds. Note that a scrape can take up
                # to (session-timeout * #-of-collectors) milliseconds, so set the scrape
                # timeout in Prometheus accordingly.
                # Must be larger than the retransmission timeout, which defaults to 1000.
                timeout: 10000
                # Available collectors are bmc, ipmi, chassis, dcmi, sel, and sm-lan-mode
                # If _not_ specified, bmc, ipmi, chassis, and dcmi are used
                collectors:
                #- bmc
                #- ipmi
                #- chassis
                # Got any sensors you don't care about? Add them here.
                exclude_sensor_ids:
                #- 2
                collector_cmd:
                        ipmi: sudo
                        bmc: sudo
                        chassis: sudo
                custom_args:
                        ipmi:
                        - "ipmimonitoring"
                        bmc:
                        - "ipmi-bmc"
                        chassis:
                        - "ipmi-chassis"

prometheus.yml

  - job_name: 'ipmi'
    params:
      module: ['default']
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: /metrics
    scheme: http
    file_sd_configs:
      - files:
        - '/prometheus/ARC0_targets.yml'

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: ipmi-exporter:9290

docker-compose.yml

ipmi-exporter:
    build:
      context: .
    image: ipmi_exporter2.0
    networks:
      - monitoring
    volumes:
      - /home/osadmin/monitaur/ipmi_exporter/ipmi_remote.yml:/config.yml:ro
        #- /home/osadmin/monitaur/refish_exporter/redfish_exporter.yml:/etc/prometheus/redfish_exporter.yml
    ports:
      - "9290:9290"

ARC0_targets.yml

- targets: [ '10.104.86.45' ]
  labels:
    hostname: ARC0CPU005
    pop: ARC0
    role: Compute
- targets: [ '10.104.86.46' ]
  labels:
    hostname: ARC0CPU006
    pop: ARC0
    role: Compute

The problem seems to be in the prometheus.yml configuration. When I use /metrics, Prometheus doesn't report any error, but the exporter logs:

monitaur-ipmi-exporter-1     | ts=2023-03-13T14:17:02.281Z caller=collector_ipmi.go:151 level=error msg="Failed to collect sensor data" target=[local] error="error running ipmimonitoring: exit status 1: could not find inband device\n"
monitaur-ipmi-exporter-1     | ts=2023-03-13T14:17:02.284Z caller=collector_dcmi.go:53 level=error msg="Failed to collect DCMI data" target=[local] error="error running ipmi-dcmi: exit status 1: could not find inband device\n"
monitaur-ipmi-exporter-1     | ts=2023-03-13T14:17:02.287Z caller=collector_bmc.go:53 level=error msg="Failed to collect BMC data" target=[local] error="error running bmc-info: exit status 1: could not find inband device\n"
monitaur-ipmi-exporter-1     | ts=2023-03-13T14:17:02.289Z caller=collector_chassis.go:53 level=error msg="Failed to collect chassis data" target=[local] error="error running ipmi-chassis: exit status 1: could not find inband device\n"

The first problem I've noticed is "target=[local]", which (I guess) means the exporter isn't using the target list I gave it. The error "could not find inband device" is also odd, because from inside the ipmi-exporter container I can run a typical command such as "ipmi-chassis -D lanplus -h 10.104.86.45 -u root -p 'myPass' --get-chassis-status" without any problem. (If I'm correct, LAN_2_0 is equivalent to lanplus.)

Then I tried changing /metrics to /ipmi, but Prometheus gets an HTTP 400 error when scraping, and the exporter logs nothing except the default startup messages:

monitaur-ipmi-exporter-1     | ts=2023-03-13T14:34:04.483Z caller=main.go:103 level=info msg="Starting ipmi_exporter" version="(version=1.6.1, branch=master, revision=8fdc078f6c7ccd4ce443e8e5711d34149c81f3fe)"
monitaur-ipmi-exporter-1     | ts=2023-03-13T14:34:04.483Z caller=tls_config.go:232 level=info msg="Listening on" address=[::]:9290
monitaur-ipmi-exporter-1     | ts=2023-03-13T14:34:04.483Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=[::]:9290

From inside the ipmi container, I'm also able to use FreeIPMI commands.

When I run "./ipmi_exporter --config.file=ipmi_remote.yml --log.level=info" on the host and point it at a specific target with "http://localhost:9290/ipmi?target=10.104.86.41", it asks for the host's password on the CLI and then returns the correct values:

# HELP ipmi_chassis_power_state Current power state (1=on, 0=off).
# TYPE ipmi_chassis_power_state gauge
ipmi_chassis_power_state 1
# HELP ipmi_current_amperes Current reading in Amperes.
# TYPE ipmi_current_amperes gauge
ipmi_current_amperes{id="57",name="Current 1"} 0.8
ipmi_current_amperes{id="58",name="Current 2"} 1
# HELP ipmi_current_state Reported state of a current sensor (0=nominal, 1=warning, 2=critical).
# TYPE ipmi_current_state gauge
ipmi_current_state{id="57",name="Current 1"} 0
ipmi_current_state{id="58",name="Current 2"} 0
# HELP ipmi_fan_speed_rpm Fan speed in rotations per minute.
# TYPE ipmi_fan_speed_rpm gauge
ipmi_fan_speed_rpm{id="158",name="Fan4A"} 8040
ipmi_fan_speed_rpm{id="159",name="Fan4B"} 6120
ipmi_fan_speed_rpm{id="160",name="Fan5A"} 7200
ipmi_fan_speed_rpm{id="161",name="Fan5B"} 5160
ipmi_fan_speed_rpm{id="162",name="Fan6A"} 9720
ipmi_fan_speed_rpm{id="163",name="Fan6B"} 7920
ipmi_fan_speed_rpm{id="164",name="Fan7A"} 9840
ipmi_fan_speed_rpm{id="165",name="Fan7B"} 7560
ipmi_fan_speed_rpm{id="166",name="Fan8A"} 9840
ipmi_fan_speed_rpm{id="167",name="Fan8B"} 7680
ipmi_fan_speed_rpm{id="35",name="Fan1A"} 8040
ipmi_fan_speed_rpm{id="36",name="Fan1B"} 6120
...

If someone could explain what's wrong with my configuration and how I should correct it, that would be very nice :)

bitfehler commented 1 year ago

You want the /ipmi endpoint. I think you've essentially gotten it right, except that it's asking for what I suspect to be the sudo password? Since you're using a container, I'd say you could just run the exporter as root and get rid of the sudo stuff? If not, you'll need to set up passwordless sudo in the container.
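To illustrate the endpoint change, here is a sketch of the scrape job from the original post with only metrics_path switched to /ipmi. The top-level scrape_configs key is assumed, since the post shows only the job itself; everything else is copied from the post. The "target=[local]" in the earlier logs is consistent with /metrics collecting from the local machine rather than from the target passed via relabeling.

```yaml
scrape_configs:            # assumed parent key; the post shows only the job
  - job_name: 'ipmi'
    params:
      module: ['default']
    scrape_interval: 1m
    scrape_timeout: 30s
    metrics_path: /ipmi    # was /metrics, which scraped the local machine
    scheme: http
    file_sd_configs:
      - files:
          - '/prometheus/ARC0_targets.yml'
    relabel_configs:
      # Pass the BMC address from the target file as ?target=<address>
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the BMC address as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape request to the exporter container
      - target_label: __address__
        replacement: ipmi-exporter:9290
```

With this job, each scrape should be equivalent to requesting http://ipmi-exporter:9290/ipmi?module=default&target=10.104.86.45 for the first entry in ARC0_targets.yml.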

Hydrapozza commented 1 year ago

Thank you for your answer @bitfehler !

I'm already running the container as root. Something weird I've noticed is that, after starting the exporter, I can query a target URL such as http://localhost:9290/ipmi?target=10.104.86.45, but it only returns:

Unknown module "default"

It's as if the exporter isn't reading the ipmi_remote.yml passed in through the docker-compose.yml volume:

    volumes:
      - /home/osadmin/monitaur/ipmi_exporter/ipmi_remote.yml:/config.yml:ro

I suspect this because I get the same error when I run ./ipmi_exporter without specifying --config.file=ipmi_remote.yml.

Also, the exporter doesn't print a log line like this one on startup: time="2021-08-18T09:31:06Z" level=info msg="Loaded config file /config.yml" source="config.go:234"

bitfehler commented 1 year ago

Indeed. So it seems that the Dockerfile and the docker-compose.yml got out of sync. The container itself no longer specifies a config file, so this has to be done in the compose file. Can you add something like this:

    command: /bin/ipmi_exporter --config.file /config.yml

See also the commit I just pushed to fix this.

Hydrapozza commented 1 year ago

I agree with you, it must be a sync issue.

After modifying the docker-compose.yml as in the commit you just pushed, it returns: monitaur-ipmi-exporter-1 | ipmi_exporter: error: unexpected /bin/ipmi_exporter, try --help

bitfehler commented 1 year ago

My bad, sorry. The command arguments actually get appended to the container's entrypoint, so there's no need to put the binary in there (fix):

    command: --config.file /config.yml
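
For context, this is roughly how the service from the original post would look with that line added. It is a sketch only: the top-level services key is an assumption (the post shows just the service entry), and the monitoring network is assumed to be defined elsewhere in the same file.

```yaml
services:                  # assumed parent key; not shown in the original post
  ipmi-exporter:
    build:
      context: .
    image: ipmi_exporter2.0
    # Appended to the image's entrypoint, so only the flags go here
    command: --config.file /config.yml
    networks:
      - monitoring
    volumes:
      # Host config mounted read-only at the path given to --config.file
      - /home/osadmin/monitaur/ipmi_exporter/ipmi_remote.yml:/config.yml:ro
    ports:
      - "9290:9290"
```

The path after --config.file has to match the container side of the volume mapping (/config.yml here), not the file name on the host.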

Hydrapozza commented 1 year ago

I think it works now, but I'm getting another error unrelated to the previous one: monitaur-ipmi-exporter-1 | ts=2023-03-14T11:06:56.438Z caller=collector_chassis.go:53 level=error msg="Failed to collect chassis data" target=10.104.86.33 error="error running sudo: exec: \"sudo\": executable file not found in $PATH: "

I'm working on it, thank you @bitfehler !
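
One possible way to address this last error, in line with the earlier suggestion to run as root and drop the sudo settings, is to remove collector_cmd and custom_args from the module so the exporter calls the FreeIPMI tools directly. The following is a sketch under the assumption that the container runs as root and that the FreeIPMI binaries are on its PATH; the thread does not confirm that this was the fix eventually used.

```yaml
modules:
  default:
    user: "root"
    pass: "myPass"
    driver: "LAN_2_0"
    privilege: "admin"
    # Session timeout in milliseconds, as in the original config
    timeout: 10000
    # collector_cmd and custom_args are omitted on purpose: without them
    # the exporter runs the FreeIPMI tools itself instead of "sudo",
    # which is not installed in the container.
```

If sudo is actually needed, the alternative mentioned earlier in the thread still applies: install sudo in the image and set it up to work without a password.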