sflow / host-sflow

host-sflow agent
http://sflow.net

hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device #46

Open mehrdade opened 3 years ago

mehrdade commented 3 years ago

I'm using hsflowd to export performance metrics using the sFlow protocol, but I'm seeing the error "hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device" in /var/log/syslog. hsflowd is working properly, but the error is logged roughly every 23 minutes.

cat /etc/hsflowd.conf
  sflow {
    collector { ip = 127.0.0.1 UDPPort=6343 }
    sampling=100
    sampling.10G=100
    pcap { speed = 1- }
    tcp {}
  }
cat /etc/hsflowd.auto
  rev_start=1
  hostname=test
  sampling=100
  header=128
  datagram=1400
  polling=30
  sampling.10G=100
  agentIP=xxx
  agent=bond0
  ds_index=1
  collector=127.0.0.1 6343
  rev_end=1
journalctl -f -u hsflowd.service
  Oct 30 09:03:38 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 09:03:38 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 09:03:38 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 09:26:24 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 09:26:24 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 09:26:24 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 09:49:08 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 09:49:08 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 09:49:08 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 10:11:54 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 10:11:54 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 10:11:54 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 10:34:38 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  Oct 30 10:34:38 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
  ...

Any idea?
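For anyone wanting to check the interval on their own system, here is a minimal sketch that prints the gap in seconds between consecutive distinct timestamps. The function name log_gaps is made up for illustration, and the sketch assumes the default journalctl/syslog line format ("Mon DD HH:MM:SS ...") with all lines from the same day. The sample input uses the timestamps from the log above.

```shell
# Print the gap, in seconds, between consecutive distinct timestamps read
# from stdin. log_gaps is a hypothetical helper, not part of hsflowd.
# Assumes "Mon DD HH:MM:SS ..." lines that all fall on the same day.
log_gaps() {
  awk '$3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
    split($3, t, ":")
    now = t[1] * 3600 + t[2] * 60 + t[3]
    # Repeated lines with the same timestamp are counted once.
    if (seen && now != prev) print now - prev
    prev = now
    seen = 1
  }'
}

# Sample run using the timestamps reported above.
log_gaps <<'EOF'
Oct 30 09:03:38 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
Oct 30 09:26:24 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
Oct 30 09:49:08 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
Oct 30 10:11:54 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
Oct 30 10:34:38 test hsflowd[2344]: SFF8036 ethtool ioctl failed: No such device
EOF
# prints 1366, 1364, 1366, 1364 (one per line), i.e. about 22 min 45 s

# Against a live journal (not run here):
#   journalctl -u hsflowd.service --no-pager | log_gaps
```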

g00g1 commented 2 years ago

This error is generated either here or here.

From those sources, this appears to be the code that gathers fibre interface stats. I do not know why the problem occurs, though. My system has an X520 NIC, which is SFP+, and I see the same log behaviour.

sflow commented 2 years ago

It seems that ethtool must be reporting the presence of that kind of optical module. If you run with debug logging you may see the message from here:

https://github.com/sflow/host-sflow/blob/dc692defd23473a16a388d9123cc4a292889ccef/src/Linux/readInterfaces.c#L475

To run with debug output:

% sudo systemctl stop hsflowd
% sudo hsflowd -dd

(Add more ‘d’s for more debug detail)

You could probably run ethtool at the command line to see the same info (not sure what the ethtool option is). Let me know if you see something we can fix in hsflowd.

g00g1 commented 2 years ago
hsflowd[1272314]: ETHTOOL_GMODULEINFO enp43s0f1 succeeded eeprom_len = 512 eeprom_type=2
hsflowd[1272314]: dbg1: setAdaptorSpeed(ETHTOOL_GLINKSETTINGS2): enp43s0f1 ifSpeed == 10000000000 (changed=YES)
hsflowd[1272314]: detected 64-bit counters - turn off faster polling
hsflowd[1272314]: dbg1: agentAddressChanged=NO
hsflowd[1272314]: dbg1: syncOutputFile
hsflowd[1272314]: SFF8036 ethtool ioctl failed: No such device
hsflowd[1272314]: dbg1: device enp43s0f0 Get SIOCGIFADDR failed : Cannot assign requested address
hsflowd[1272314]: dbg1: setAdaptorSpeed(ETHTOOL_GLINKSETTINGS1): enp43s0f0 ifSpeed == 0 (changed=NO)
hsflowd[1272314]: dbg1: setAdaptorSpeed(ETHTOOL_GLINKSETTINGS2): enp43s0f1 ifSpeed == 10000000000 (changed=NO)

Yes, it does. And it really is an optical module, but I do not understand why the ioctl fails with a "No such device" error.

sflow commented 2 years ago

What do you get for this Linux command?

% sudo ethtool -m enp43s0f1

I wonder if the problem is just that hsflowd needs to retain root permissions for these requests? You could try running with:

% sudo hsflowd -dd -p

The -p option prevents hsflowd from dropping root privileges.

(You could also run ethtool without sudo to see if it fails)

g00g1 commented 2 years ago
~ sudo ethtool -m enp43s0f1
    Identifier                                : 0x03 (SFP)
    Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
    Connector                                 : 0x07 (LC)
    Transceiver codes                         : 0x10 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
    Transceiver type                          : 10G Ethernet: 10G Base-SR
    Encoding                                  : 0x06 (64B/66B)
    BR, Nominal                               : 10300MBd
    Rate identifier                           : 0x00 (unspecified)
    Length (SMF,km)                           : 0km
    Length (SMF)                              : 0m
    Length (50um)                             : 80m
    Length (62.5um)                           : 30m
    Length (Copper)                           : 0m
    Length (OM3)                              : 300m
    Laser wavelength                          : 850nm
    Vendor name                               : FINISAR CORP.
    Vendor OUI                                : 00:90:65
    Vendor PN                                 : FTLX8571D3BCL
    Vendor rev                                : A
    Option values                             : 0x00 0x1a
    Option                                    : RX_LOS implemented
    Option                                    : TX_FAULT implemented
    Option                                    : TX_DISABLE implemented
    BR margin, max                            : 0%
    BR margin, min                            : 0%
    Vendor SN                                 : ALD09R1
    Date code                                 : 110920
    Optical diagnostics support               : Yes
    Laser bias current                        : 7.936 mA
    Laser output power                        : 0.6249 mW / -2.04 dBm
    Receiver signal average optical power     : 0.4929 mW / -3.07 dBm
    Module temperature                        : 42.05 degrees C / 107.70 degrees F
    Module voltage                            : 3.3301 V
    Alarm/warning flags implemented           : Yes
    Laser bias current high alarm             : Off
    Laser bias current low alarm              : Off
    Laser bias current high warning           : Off
    Laser bias current low warning            : Off
    Laser output power high alarm             : Off
    Laser output power low alarm              : Off
    Laser output power high warning           : Off
    Laser output power low warning            : Off
    Module temperature high alarm             : Off
    Module temperature low alarm              : Off
    Module temperature high warning           : Off
    Module temperature low warning            : Off
    Module voltage high alarm                 : Off
    Module voltage low alarm                  : Off
    Module voltage high warning               : Off
    Module voltage low warning                : Off
    Laser rx power high alarm                 : Off
    Laser rx power low alarm                  : Off
    Laser rx power high warning               : Off
    Laser rx power low warning                : Off
    Laser bias current high alarm threshold   : 11.800 mA
    Laser bias current low alarm threshold    : 4.000 mA
    Laser bias current high warning threshold : 10.800 mA
    Laser bias current low warning threshold  : 5.000 mA
    Laser output power high alarm threshold   : 0.8318 mW / -0.80 dBm
    Laser output power low alarm threshold    : 0.2512 mW / -6.00 dBm
    Laser output power high warning threshold : 0.6607 mW / -1.80 dBm
    Laser output power low warning threshold  : 0.3162 mW / -5.00 dBm
    Module temperature high alarm threshold   : 78.00 degrees C / 172.40 degrees F
    Module temperature low alarm threshold    : -13.00 degrees C / 8.60 degrees F
    Module temperature high warning threshold : 73.00 degrees C / 163.40 degrees F
    Module temperature low warning threshold  : -8.00 degrees C / 17.60 degrees F
    Module voltage high alarm threshold       : 3.7000 V
    Module voltage low alarm threshold        : 2.9000 V
    Module voltage high warning threshold     : 3.6000 V
    Module voltage low warning threshold      : 3.0000 V
    Laser rx power high alarm threshold       : 1.0000 mW / 0.00 dBm
    Laser rx power low alarm threshold        : 0.0100 mW / -20.00 dBm
    Laser rx power high warning threshold     : 0.7943 mW / -1.00 dBm
    Laser rx power low warning threshold      : 0.0158 mW / -18.01 dBm

You are right: when running ethtool as the nobody user (which is how hsflowd runs), I get the following error: Cannot get module EEPROM information: Operation not permitted

sflow commented 2 years ago

So the immediate workaround is probably to edit /lib/systemd/system/hsflowd.service and add "-p" to the ExecStart line.
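A minimal sketch of that edit, for anyone who prefers not to do it by hand. The function name add_p_flag is hypothetical, and the sketch assumes GNU sed and a unit file with a single-line ExecStart. Taking the path as an argument lets you try it on a copy first. Note that files under /lib/systemd/system can be overwritten by a package upgrade, so the change may need reapplying.

```shell
# Append "-p" to the ExecStart line of a systemd unit file, unless the flag
# is already present. add_p_flag is a hypothetical helper, not part of hsflowd.
# Assumes GNU sed (-i) and a single-line ExecStart.
add_p_flag() {
  unit="$1"
  # Nothing to do if ExecStart already carries a standalone -p.
  if grep -Eq '^ExecStart=.*[[:space:]]-p([[:space:]]|$)' "$unit"; then
    return 0
  fi
  # Replace any trailing whitespace on the ExecStart line with " -p".
  sed -i '/^ExecStart=/ s/[[:space:]]*$/ -p/' "$unit"
}

# Typical use (needs root; path as given in the comment above):
#   add_p_flag /lib/systemd/system/hsflowd.service
#   systemctl daemon-reload
#   systemctl restart hsflowd
```

The daemon-reload step is needed so that systemd rereads the changed unit file before the restart.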

Please confirm that this works, and that you start to see those optical counters in the sflow output (e.g. run sflowtool and grep for "sfp_*").

Looking ahead, we could either (1) keep the socket file descriptor that was opened the first time readInterfaces() was called (when root privileges have not been dropped yet) and use it again for all subsequent ethtool ioctl calls, or (2) call retainRootRequest() the first time we detect the presence of an optical module, so that permissions are not relinquished.

Neither of those is particularly tempting, but first we should check the ethtool sources to see whether (1) is likely to work, and which capability is actually needed for the optical module ioctl.

g00g1 commented 2 years ago

After running sflowtool | grep --line-buffered sfp for 37 minutes, I saw no mention of sfp in the output, but the ioctl errors seem to have gone from the logs. Apart from that, sflowtool without grep works fine and samples appear in the console.