oshi / oshi

Native Operating System and Hardware Information
https://oshi.ooo
MIT License
4.78k stars 879 forks source link

HardwareAbstractionLayer.getProcessor() method takes 20 minutes to run #416

Closed lpapazow closed 6 years ago

lpapazow commented 7 years ago

Hi, I am trying to contribute to openhab repo and one of the bindings uses your library.

The long query occurs when I run this code snippet: System.out.println("About to call hal.getProcessor method at: " + LocalDateTime.now()); cpu = hal.getProcessor(); System.out.println("Called hal.getProcessor method at: " + LocalDateTime.now());

This is the log that the code produces:

About to call hal.getProcessor method at: 2017-09-28T11:00:02.150 11:00:02.478 [main] DEBUG o.h.common.AbstractCentralProcessor - Oracle MXBean detected. 11:00:02.626 [main] DEBUG oshi.util.platform.windows.WmiUtil - Connected to ROOT\CIMV2 WMI namespace 11:00:02.627 [main] DEBUG oshi.util.platform.windows.WmiUtil - Query: SELECT ProcessorID FROM Win32_Processor 11:00:02.756 [main] DEBUG oshi.util.platform.windows.WmiUtil - Connected to ROOT\CIMV2 WMI namespace 11:00:02.756 [main] DEBUG oshi.util.platform.windows.WmiUtil - Query: SELECT Name,PercentIdleTime,PercentPrivilegedTime,PercentUserTime,PercentInterruptTime,PercentDPCTime FROM Win32_PerfRawData_Counters_ProcessorInformation WHERE NOT Name LIKE "%_Total" 11:16:47.825 [main] DEBUG oshi.util.platform.windows.WmiUtil - Connected to ROOT\CIMV2 WMI namespace 11:16:47.825 [main] DEBUG oshi.util.platform.windows.WmiUtil - Query: SELECT PercentInterruptTime,PercentDPCTime FROM Win32_PerfRawData_Counters_ProcessorInformation WHERE Name="_Total" 11:20:47.833 [main] DEBUG o.h.p.w.WindowsCentralProcessor - Initialized Processor Called hal.getProcessor method at: 2017-09-28T11:20:47.834

My OS is Windows 10, my CPU is Intel(R) Core(TM) i7-6600U.

dbwiddis commented 7 years ago

Well, WMI queries sometimes take a longer time for the first query, but this is insane.

Have you tried these queries directly from powershell or the command line?

We've had other small issues with WMI delays and I'm wondering if we should just add a timeout to the queries...

lpapazow commented 7 years ago

I tried to execute the following query: PS C:\Users\lyubomir.papazov> $query = "SELECT Name,PercentIdleTime,PercentPrivilegedTime,PercentUserTime,PercentInterru ptTime,PercentDPCTime FROM Win32_PerfRawData_Counters_ProcessorInformation" PS C:\Users\lyubomir.papazov> Get-WmiObject -Query $query

After approximately 10 minutes it produced this error:

Get-WmiObject : Invalid query "SELECT Name,PercentIdleTime,PercentPrivilegedTime,PercentUserTime,PercentInterruptTime,P ercentDPCTime FROM Win32_PerfRawData_Counters_ProcessorInformation" At line:1 char:1 + Get-WmiObject -Query $query + ~~~~~~~

I am not very familiar with WMI but I feel like the query should be correct since executing the query: $query = "SELECT Day FROM Win32_CurrentTime" with Get-WmiObject works fine.

hazendaz commented 7 years ago

Not sure if related but the latest build from master is also extremely slow. It's hanging on operating system test and system info test. It's been well over 5 minutes. Its basically stuck. I'm running in windows 10.

markozajc commented 7 years ago

When I first launched the test program and saw this post while I was waiting for hal.getProcessor(), I thought it's over. It took 55732 miliseconds, which is kinda good (judging by this issue's name), but is there a faster way to get CPU info? I've been using SIGAR before (and now I can't anymore because uses JNI), and it did this kind of stuff in an instant.

dbwiddis commented 7 years ago

I can't help but wonder if something changed in a recent Windows 10 build that caused all these issues in WMI.

Prior to switching to WMI I had been using the Performance Data Helper classes, which I think are the ultimate source that WMI gets their data from. The change was made in 2.6.2 over a year ago (see #199).

I can go back to the PDH method, but it'd be nice to test 2.6.1 (PDH) vs. 2.6.2 (WMI) to isolate that as the problem before I invest the effort.

dbwiddis commented 7 years ago

Other users report this may be fixed in version 3.4.4, possibly due to some change in JNA 4.5.0. Can you test to see if this is resolved?

lpapazow commented 7 years ago

I updated oshi-core to version 3.4.4 and jna to version 4.5.0, but unfortunately, the problem is still not resolved.

YoshiEnVerde commented 7 years ago

I ran the query on PS to check its validity, and it works. Sadly, I'm on Win7, so my run times (less than a minute) aren't useful for this.

However, if even parsing the query to find out it's invalid took over 10 minutes, the problem is most probably on your machine, and not on the OSHI code.

Could you try to run the query once more from PS?

vvmar commented 7 years ago

I can confirm slow access on Windows 10/Lenovo laptop. Here is a stacktrace ----- Time for compter system check: 174 milliseconds2017-10-23 21:27:37.440 DEBUG --- [ main] SystemInfoTest : Checking Processor...

Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz 2 physical CPU(s) 4 logical CPU(s) Identifier: Intel64 Family 6 Model 58 Stepping 9 ProcessorID: BFEBFBFF000306A9 ----- Time for processor check: 480670 milliseconds 2017-10-23 21:35:38.120 DEBUG --- [ main] SystemInfoTest : Checking Memory... Memory: 3.5 GiB/15.8 GiB Swap used: 0 bytes/22 GiB ----- Time for memory check: 480082 milliseconds 2017-10-23 21:43:38.221 DEBUG --- [ main] SystemInfoTest : Checking CPU... Uptime: 6 days, 03:38:15 CPU, IOWait, and IRQ ticks @ 0 sec:[318640390, 0, 55624219, 1231142484, 0, 0, 0, 0] CPU, IOWait, and IRQ ticks @ 1 sec:[318915781, 0, 55669797, 1231785578, 0, 0, 0, 0] User: 28.6% Nice: 0.0% System: 4.7% Idle: 66.7% IOwait: 0.0% IRQ: 0.0% SoftIRQ: 0.0% Steal: 0.0% CPU load: 33.3% (counting ticks) CPU load: 34.3% (OS MXBean) CPU load averages: N/A N/A N/A CPU load per processor: 0.0% 0.0% 0.0% 0.0% 2017-10-23 21:59:39.330 DEBUG --- [ main] SystemInfoTest : Checking Processes... ----- Time for CPU check: 961098 milliseconds Processes: 282, Threads: 3458 PID %CPU %MEM VSZ RSS Name 0 100.0 0.0 64 KiB 8 KiB System Idle Process 18036 17.7 12.9 5.2 GiB 2.0 GiB firefox.exe 5480 12.4 1.4 2.0 TiB 224.3 MiB chrome.exe 13904 5.6 0.1 2.0 TiB 19.0 MiB audiodg.exe 4 2.9 0.0 10.5 MiB 6.7 MiB System ----- Time for processes check: 66760 milliseconds 2017-10-23 22:00:46.102 DEBUG --- [ main] SystemInfoTest : Checking Sensors... Sensors: 2017-10-23 22:00:46.123 ERROR --- [ main] oshi.util.platform.windows.WmiUtil : Could not connect to namespace root\OpenHardwareMonitor. Error code = 0x8004100e CPU Temperature: 0.0DegC 2017-10-23 22:04:46.247 ERROR --- [ main] oshi.util.platform.windows.WmiUtil : Could not connect to namespace root\OpenHardwareMonitor. Error code = 0x8004100e Fan Speeds: [0] 2017-10-23 22:04:46.285 ERROR --- [ main] oshi.util.platform.windows.WmiUtil : Could not connect to namespace root\OpenHardwareMonitor. Error code = 0x8004100e CPU Voltage: 0.0V 2017-10-23 22:04:46.328 DEBUG --- [ main] SystemInfoTest : Checking Power sources... ----- Time for sensors check: 240216 milliseconds Power: 1:31 remaining System Battery @ 91.4% 2017-10-23 22:04:46.345 DEBUG --- [ main] SystemInfoTest : Checking Disks... ----- Time for power sources check: 15 milliseconds Disks: \.\PHYSICALDRIVE0: (model: TS256GSSD320 (Standard disk drives) - S/N: A3227008315070450158) size: 256.1 GB, reads: ? (?), writes: ? (?), xfer: ? ms |-- Disk #0, Partition #0: GPT: Unknown (GPT: Unknown) Maj:Min=0:0, size: 1.0 GB |-- Disk #0, Partition #1: GPT: System (GPT: System) Maj:Min=0:1, size: 272.6 MB |-- Disk #0, Partition #2: GPT: Basic Data (GPT: Basic Data) Maj:Min=0:2, size: 253.4 GB @ C:\ |-- Disk #0, Partition #3: GPT: Unknown (GPT: Unknown) Maj:Min=0:3, size: 831.5 MB |-- Disk #0, Partition #4: GPT: Unknown (GPT: Unknown) Maj:Min=0:4, size: 367.0 MB \.\PHYSICALDRIVE1: (model: SanDisk SDSSDXP480G (Standard disk drives) - S/N: 134726400259) size: 480.1 GB, reads: ? (?), writes: ? (?), xfer: ? ms |-- Disk #1, Partition #0: GPT: Basic Data (GPT: Basic Data) Maj:Min=1:0, size: 480.0 GB @ D:\ ----- Time for disks check: 240254 milliseconds 2017-10-23 22:08:46.610 DEBUG --- [ main] SystemInfoTest : Checking File System... File System: File Descriptors: 0/0 ----- Time for file system check: 3 milliseconds 2017-10-23 22:08:46.623 DEBUG --- [ main] SystemInfoTest : Checking Network interfaces... Network interfaces: Name: wlan2 (Microsoft Hosted Network Virtual Adapter) MAC Address: 26:77:03:d2:fd:98 MTU: 1500, Speed: 0 bps IPv4: [] IPv6: [fe80:0:0:0:f5f4:32ba:17eb:b166] Traffic: received ?/?; transmitted ?/? Name: wlan3 (Intel(R) Centrino(R) Ultimate-N 6300 AGN) MAC Address: 24:77:03:d2:fd:98 MTU: 1500, Speed: 31.6 Mbps IPv4: [192.168.1.3] IPv6: [fe80:0:0:0:68f1:783a:2bdf:b64e] Traffic: received 2883770 packets/246.6 MiB (0 err); transmitted 1526816 packets/33.4 MiB (0 err) Name: eth2 (Intel(R) 82579LM Gigabit Network Connection) MAC Address: 3c:97:0e:4a:45:45 MTU: 1500, Speed: 0 bps IPv4: [] IPv6: [fe80:0:0:0:8557:a87f:8214:316a] Traffic: received ?/?; transmitted ?/? Name: wlan4 (Microsoft Wi-Fi Direct Virtual Adapter) MAC Address: 24:77:03:d2:fd:99 MTU: 1500, Speed: 0 bps IPv4: [] IPv6: [fe80:0:0:0:7458:f3d:1327:3f9b] Traffic: received ?/?; transmitted ?/? Name: eth6 (VirtualBox Host-Only Ethernet Adapter #2) MAC Address: 0a:00:27:00:00:12 MTU: 1500, Speed: 194.7 Mbps IPv4: [192.168.56.1] IPv6: [fe80:0:0:0:d119:9e:ca6a:59d0] Traffic: received ?/?; transmitted ?/? Name: net4 (Microsoft Teredo Tunneling Adapter) MAC Address: 00:00:00:00:00:00:00:e0 MTU: 1472, Speed: 100 Kbps IPv4: [] IPv6: [fe80:0:0:0:0:100:7f:fffe] Traffic: received ?/?; transmitted ?/? 2017-10-23 22:08:47.040 DEBUG --- [ main] SystemInfoTest : Checking Network parameters... ----- Time for network interfaces check: 407 milliseconds Network parameters: Host name: grusilag3 Domain name: grusilag3 DNS servers: [192.168.1.1] IPv4 Gateway: 192.168.1.1 IPv6 Gateway: ----- Time for network parameters check: 228 milliseconds 2017-10-23 22:08:47.269 DEBUG --- [ main] SystemInfoTest : Checking Displays... Displays: Display 0: ----- Time for displays check: 31 milliseconds Manuf. ID=LEN, Product ID=40a1, Analog, Serial=00000000, ManufDate=1/2010, EDID v1.3 31 x 17 cm (12.2 x 6.7 in) Preferred Timing: Clock 102MHz, Active Pixels 2560x2880 Preferred Timing: Clock 101MHz, Active Pixels 2560x2880 Manufacturer Data: 0000000F00A90932A909321909004CA34B54 Unspecified Text: LTN140KT03401 2017-10-23 22:08:47.310 DEBUG --- [ main] SystemInfoTest : Checking USB Devices... USB Devices: Intel(R) 7 Series/C216 Chipset Family USB Enhanced Host Controller - 1E2D (Intel) |-- USB Root Hub ((Standard USB Host Controller)) |-- Generic USB Hub ((Generic USB Hub)) |-- ThinkPad Bluetooth 4.0 (Broadcom) |-- TouchChip Fingerprint Coprocessor (WBF advanced mode) (AuthenTec) Intel(R) 7 Series/C216 Chipset Family USB Enhanced Host Controller - 1E26 (Intel) |-- USB Root Hub ((Standard USB Host Controller)) |-- Generic USB Hub ((Generic USB Hub)) Intel(R) USB 3.0 eXtensible Host Controller - 1.0 (Microsoft) (Generic USB xHCI Host Controller) |-- USB Root Hub (USB 3.0) ((Standard USB HUBs)) |-- USB Composite Device ((Standard USB Host Controller)) |-- Microsoft LifeChat LX-3000 ((Generic USB Audio)) |-- Headset Earphone (Microsoft LifeChat LX-3000 ) (Microsoft) |-- Headset Microphone (Microsoft LifeChat LX-3000 ) (Microsoft) |-- USB Input Device ((Standard system devices)) |-- HID-compliant consumer control device (Microsoft) ----- Time for USB devices check: 902 milliseconds

vvmar commented 7 years ago

sigar 1.6.7 (old version) works pretty snappy on java 8 but crashes on java 9 both running on windows 64 bit. Haven't checked on linux yet.

vvmar commented 7 years ago

It's worth to mention that the first ohsi run on windows (maybe even first few runs) didn't experience such slowdowns. So I didn't get a chance to add timing checks. I am running stable version 3.4.4

A side note. Sigar is running fine on oracle 64bit java on linux.

dbwiddis commented 7 years ago

I just ran it on my own Windows 10 system:

2017-10-25 14:55:36.494 [INFO] SystemInfoTest - Initializing System...
Microsoft Windows 10 build 14393
<snip>
2017-10-25 14:55:54.636 [INFO] SystemInfoTest - Checking USB Devices...

18 seconds total start to finish. Compared to:

----- Time for processor check: 480670 milliseconds
----- Time for memory check: 480082 milliseconds
----- Time for CPU check: 961098 milliseconds
----- Time for processes check: 66760 milliseconds
----- Time for sensors check: 240216 milliseconds
----- Time for power sources check: 15 milliseconds
----- Time for disks check: 240254 milliseconds
----- Time for file system check: 3 milliseconds
----- Time for network interfaces check: 407 milliseconds
----- Time for network parameters check: 228 milliseconds
----- Time for displays check: 31 milliseconds
----- Time for USB devices check: 902 milliseconds

There's clearly a problem with WMI on this system. It's not any single query: all of OSHI's procedures which call WMI are timing out: see the 240 seconds, 480 seconds, and 960 seconds based on how many queries are in the call and how many times they repeat (CPU check repeats the tick queries twice with a 1000ms delay between).

The only way we can fix this in OSHI is to completely remove dependence on WMI. However, I have to think there's some sort of configuration issue going on here... firewall? Accessing over a network?

vvmar commented 7 years ago

I don't think firewall is an issue. Firewall issues are usually binary: either you get or don't get access . In any case, first few runs weren't that slow. System configuration hasn't changed since unless access to WMI makes it spin.

dbwiddis commented 7 years ago

You might try running the WMI Diagnostics Utility

vvmar commented 7 years ago

Didn't produce any results. Maybe it takes awhile as well. However, when I killed all "WMI Provider Host" processes, I got the following times before it got stuck again.

----- Time for compter system check: 286 milliseconds ----- Time for processor check: 4779 milliseconds Swap used: 5.6 GiB/26.4 GiB----- Time for memory check: 157 milliseconds ----- Time for CPU check: 1318 milliseconds ----- Time for processes check: 173459 milliseconds <--- It got stuck here with some 20% cpu usage ----- Time for sensors check: 1736 milliseconds ----- Time for power sources check: 13 milliseconds ----- Time for disks check: 917 milliseconds ----- Time for file system check: 2 milliseconds ----- Time for network interfaces check: 1474 milliseconds ----- Time for network parameters check: 961 milliseconds ----- Time for displays check: 34 milliseconds ----- Time for USB devices check: 2435 milliseconds

vvmar commented 7 years ago

The test run leaves 3 "WMI Provider Host" processes running. When I kill them off and comment check for processes, things don't look so gloom ----- Time for memory check: 224 milliseconds ----- Time for CPU check: 1178 milliseconds ----- Time for sensors check: 196 milliseconds ----- Time for power sources check: 13 milliseconds ----- Time for disks check: 325 milliseconds ----- Time for file system check: 2 milliseconds ----- Time for network interfaces check: 509 milliseconds ----- Time for network parameters check: 269 milliseconds ----- Time for displays check: 35 milliseconds ----- Time for USB devices check: 1196 milliseconds

vvmar commented 7 years ago

So as long as I keep "processes" check commented out, I get repeatable (give or take) performance ----- Time for compter system check: 226 milliseconds ----- Time for processor check: 1089 milliseconds 2 physical CPU(s) ----- Time for memory check: 84 milliseconds ----- Time for CPU check: 1153 milliseconds ----- Time for sensors check: 148 milliseconds ----- Time for power sources check: 11 milliseconds ----- Time for disks check: 261 milliseconds ----- Time for file system check: 2 milliseconds ----- Time for network interfaces check: 392 milliseconds IPv6 Gateway: ----- Time for network parameters check: 281 milliseconds ----- Time for displays check: 27 milliseconds ----- Time for USB devices check: 977 milliseconds

That should provide some info for a possible workaround for Window 10 users . Hopefully, linux numbers will be smaller and overall operation more stable.

YoshiEnVerde commented 7 years ago

Anybody tried checking the WMI logs to see if anything popped up there? It seems there's been a lot of issues with WMI in Win 10...

dbwiddis commented 7 years ago

Hmmm.... I wonder if the error is in using the methods in the process class to get the user info. Any chance you can try editing out the PROCESS_GETOWNER and PROCESS_GETOWNERSID lines from WindowsOperatingSystem? (or perhaps comment out their effects and return an empty string from WmiUtil)

vvmar commented 7 years ago

back to taking a long time for a processor check Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz ----- Time for processor check: 480768 milliseconds Unsure what the cause is. But this obviously makes it not very usable on windows 10 at the moment.

dbwiddis commented 7 years ago

Can you please try wmi queries outside of OSHI to see if this is the problem? It is working fine for me on Windows 10. There may be some setup you have that's different.

vvmar commented 7 years ago

Would you mind listing all queries you want me to run?

dbwiddis commented 7 years ago

Turn on debug logging for Oshi. Run the queries that take a long time in OSHI. :)

It is possible that there is a resource leak that's causing the problem... if you can help identify any threads lingering around that would be helpful. My suspicion may be something with the methods: OSHI's process listing calls the GetOwner and GetOwnerSid procedures. If you can narrow it down to a bug associated with those, that'd be helpful.

vvmar commented 7 years ago

Both wbemtest tool and the code take a long time to run this SELECT Name,PercentIdleTime,PercentPrivilegedTime,PercentUserTime,PercentInterruptTime,PercentDPCTime FROM Win32_PerfRawData_Counters_ProcessorInformation WHERE NOT Name LIKE "%_Total"

If calls get queued up then there could be one rouge process holding up the queue. Killing all processes makes it run faster again.

Didn't see any side notes on usage in the manual http://wutils.com/wmi/root/cimv2/win32_perfrawdata_counters_processorinformation/

vvmar commented 7 years ago

wbemtest tool just returned results marking it as invalid query. I have tons of processes running on the system and this query seems like a scan. I don't have enough bandwidth to dive deeper into this at the moment. Hope this is enough info to test additional scenarious.

vvmar commented 7 years ago

Launching a process for a quick systems check might not be a scalable approach in any case given that for a typical monitoring job one is expected to run those queries as frequently as one or more per second.

dbwiddis commented 6 years ago

Consolidating similar issues in #472