ytrstu / i7z

Automatically exported from code.google.com/p/i7z
GNU General Public License v2.0

Program crashes with rdmsr:pread: Input/output error #67

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Step 1. What steps will reproduce the problem?

Running i7z on my X7550 HP DL580 G7

--[62] Processor number 62
--[62] Socket number/Hyperthreaded Sibling number  3,30
--[62] Core id number 10
--[62] Display core in i7z Tool: No

--[63] Processor number 63
--[63] Socket number/Hyperthreaded Sibling number  3,31
--[63] Core id number 11
--[63] Display core in i7z Tool: No

Socket-0 [num of cpus 8 physical 8 logical 16] 0,1,2,3,4,5,6,7,
Socket-1 [num of cpus 8 physical 8 logical 16] 8,9,10,11,12,13,14,15,
GUI has been Turned ON
Logging is OFF
i7z DEBUG: Dual Socket Detected
i7z DEBUG: In i7z Dual_Socket()
Cpu speed from cpuinfo 1997.00Mhz
True Frequency (without accounting Turbo) 1997 MHz
rdmsr:pread: Input/output error
Quitting i7z

Step 2. What version of the product are you using (the download version or
the svn version, and which one)?

Download, i7z-0.27.2

Step 3. If this is an enhancement or suggestion, skip this step (except if
its related to a particular OS or architecture). On what operating system?
BTW is it a 32-bit or a 64-bit OS/Kernel?

Not sure if it's related, so I'll report--

Debian Squeeze
Linux srv04 2.6.32-5-amd64 #1 SMP Mon Feb 25 00:26:11 UTC 2013 x86_64 GNU/Linux

Step 4. Do provide information if bug/enhancement is related to a
particular OS/Processor. Please provide any additional information below
(helpful will be platform information like number and types of cpus,
motherboard and a copy of /proc/cpuinfo)

Quad X7550 CPU
cpu family      : 6
model           : 46
model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
stepping        : 6
cpu MHz         : 1997.542
cache size      : 18432 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf 
pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 
x2apic popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3995.08
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 46
model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
stepping        : 6
cpu MHz         : 1997.542
cache size      : 18432 KB
physical id     : 0
siblings        : 16
core id         : 1
cpu cores       : 8
apicid          : 2
initial apicid  : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf 
pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 
x2apic popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3995.45
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 46
model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
stepping        : 6
cpu MHz         : 1997.542
cache size      : 18432 KB
physical id     : 0
siblings        : 16
core id         : 2
cpu cores       : 8
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf 
pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 
x2apic popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3995.45
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management:

<<snipped>>

processor       : 62
vendor_id       : GenuineIntel
cpu family      : 6
model           : 46
model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
stepping        : 6
cpu MHz         : 1997.542
cache size      : 18432 KB
physical id     : 3
siblings        : 16
core id         : 10
cpu cores       : 8
apicid          : 117
initial apicid  : 117
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf 
pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 
x2apic popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3995.50
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management:

processor       : 63
vendor_id       : GenuineIntel
cpu family      : 6
model           : 46
model name      : Intel(R) Xeon(R) CPU           X7550  @ 2.00GHz
stepping        : 6
cpu MHz         : 1997.542
cache size      : 18432 KB
physical id     : 3
siblings        : 16
core id         : 11
cpu cores       : 8
apicid          : 119
initial apicid  : 119
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm 
constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf 
pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 
x2apic popcnt lahf_lm ida tpr_shadow vnmi flexpriority ept vpid
bogomips        : 3995.54
clflush size    : 64
cache_alignment : 64
address sizes   : 44 bits physical, 48 bits virtual
power management:
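
The startup log above reports "Dual Socket Detected" and lists only Socket-0 and Socket-1, while this /proc/cpuinfo dump shows physical ids up to 3. As a cross-check that does not depend on i7z, the sockets can be counted directly from the cpuinfo text. This is a minimal sketch: the function name `count_sockets` and the trimmed sample text are illustrative assumptions, not part of i7z.

```python
from collections import defaultdict


def count_sockets(cpuinfo_text):
    """Map each 'physical id' to the set of 'core id' values seen under it."""
    sockets = defaultdict(set)
    physical_id = None
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            physical_id = None  # a blank line ends one processor block
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "physical id":
            physical_id = int(value)
        elif key == "core id" and physical_id is not None:
            sockets[physical_id].add(int(value))
    return dict(sockets)


# Hypothetical sample in /proc/cpuinfo format, trimmed to the fields the
# parser needs. It is not the reporter's full dump.
sample = """\
processor       : 0
physical id     : 0
core id         : 0

processor       : 62
physical id     : 3
core id         : 10

processor       : 63
physical id     : 3
core id         : 11
"""

sockets = count_sockets(sample)
print(len(sockets), "socket ids seen:", sorted(sockets))
```

On a live system the same function could be fed `open("/proc/cpuinfo").read()`; on a four-socket machine it would be expected to return four physical ids.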

Step 5. If the program crashes. Can you also paste the
output of ls -lt /dev/cpu/*/msr?

crw------- 1 root root 202, 26 Mar 16 11:44 /dev/cpu/26/msr
crw------- 1 root root 202, 27 Mar 16 11:44 /dev/cpu/27/msr
crw------- 1 root root 202, 28 Mar 16 11:44 /dev/cpu/28/msr
crw------- 1 root root 202, 29 Mar 16 11:44 /dev/cpu/29/msr
crw------- 1 root root 202, 30 Mar 16 11:44 /dev/cpu/30/msr
crw------- 1 root root 202, 31 Mar 16 11:44 /dev/cpu/31/msr
crw------- 1 root root 202, 32 Mar 16 11:44 /dev/cpu/32/msr
crw------- 1 root root 202, 33 Mar 16 11:44 /dev/cpu/33/msr
crw------- 1 root root 202, 34 Mar 16 11:44 /dev/cpu/34/msr
crw------- 1 root root 202, 35 Mar 16 11:44 /dev/cpu/35/msr
crw------- 1 root root 202, 36 Mar 16 11:44 /dev/cpu/36/msr
crw------- 1 root root 202, 37 Mar 16 11:44 /dev/cpu/37/msr
crw------- 1 root root 202, 38 Mar 16 11:44 /dev/cpu/38/msr
crw------- 1 root root 202, 39 Mar 16 11:44 /dev/cpu/39/msr
crw------- 1 root root 202, 40 Mar 16 11:44 /dev/cpu/40/msr
crw------- 1 root root 202, 41 Mar 16 11:44 /dev/cpu/41/msr
crw------- 1 root root 202, 42 Mar 16 11:44 /dev/cpu/42/msr
crw------- 1 root root 202, 43 Mar 16 11:44 /dev/cpu/43/msr
crw------- 1 root root 202, 44 Mar 16 11:44 /dev/cpu/44/msr
crw------- 1 root root 202, 45 Mar 16 11:44 /dev/cpu/45/msr
crw------- 1 root root 202, 46 Mar 16 11:44 /dev/cpu/46/msr
crw------- 1 root root 202, 47 Mar 16 11:44 /dev/cpu/47/msr
crw------- 1 root root 202, 48 Mar 16 11:44 /dev/cpu/48/msr
crw------- 1 root root 202, 49 Mar 16 11:44 /dev/cpu/49/msr
crw------- 1 root root 202, 50 Mar 16 11:44 /dev/cpu/50/msr
crw------- 1 root root 202, 51 Mar 16 11:44 /dev/cpu/51/msr
crw------- 1 root root 202, 52 Mar 16 11:44 /dev/cpu/52/msr
crw------- 1 root root 202, 53 Mar 16 11:44 /dev/cpu/53/msr
crw------- 1 root root 202, 54 Mar 16 11:44 /dev/cpu/54/msr
crw------- 1 root root 202, 55 Mar 16 11:44 /dev/cpu/55/msr
crw------- 1 root root 202, 56 Mar 16 11:44 /dev/cpu/56/msr
crw------- 1 root root 202, 57 Mar 16 11:44 /dev/cpu/57/msr
crw------- 1 root root 202, 58 Mar 16 11:44 /dev/cpu/58/msr
crw------- 1 root root 202, 59 Mar 16 11:44 /dev/cpu/59/msr
crw------- 1 root root 202, 60 Mar 16 11:44 /dev/cpu/60/msr
crw------- 1 root root 202, 61 Mar 16 11:44 /dev/cpu/61/msr
crw------- 1 root root 202, 62 Mar 16 11:44 /dev/cpu/62/msr
crw------- 1 root root 202, 63 Mar 16 11:44 /dev/cpu/63/msr
crw------- 1 root root 202,  0 Mar 16 11:44 /dev/cpu/0/msr
crw------- 1 root root 202, 10 Mar 16 11:44 /dev/cpu/10/msr
crw------- 1 root root 202, 11 Mar 16 11:44 /dev/cpu/11/msr
crw------- 1 root root 202, 12 Mar 16 11:44 /dev/cpu/12/msr
crw------- 1 root root 202, 13 Mar 16 11:44 /dev/cpu/13/msr
crw------- 1 root root 202, 14 Mar 16 11:44 /dev/cpu/14/msr
crw------- 1 root root 202, 15 Mar 16 11:44 /dev/cpu/15/msr
crw------- 1 root root 202, 16 Mar 16 11:44 /dev/cpu/16/msr
crw------- 1 root root 202, 17 Mar 16 11:44 /dev/cpu/17/msr
crw------- 1 root root 202, 18 Mar 16 11:44 /dev/cpu/18/msr
crw------- 1 root root 202, 19 Mar 16 11:44 /dev/cpu/19/msr
crw------- 1 root root 202,  1 Mar 16 11:44 /dev/cpu/1/msr
crw------- 1 root root 202, 20 Mar 16 11:44 /dev/cpu/20/msr
crw------- 1 root root 202, 21 Mar 16 11:44 /dev/cpu/21/msr
crw------- 1 root root 202, 22 Mar 16 11:44 /dev/cpu/22/msr
crw------- 1 root root 202, 23 Mar 16 11:44 /dev/cpu/23/msr
crw------- 1 root root 202, 24 Mar 16 11:44 /dev/cpu/24/msr
crw------- 1 root root 202, 25 Mar 16 11:44 /dev/cpu/25/msr
crw------- 1 root root 202,  2 Mar 16 11:44 /dev/cpu/2/msr
crw------- 1 root root 202,  3 Mar 16 11:44 /dev/cpu/3/msr
crw------- 1 root root 202,  4 Mar 16 11:44 /dev/cpu/4/msr
crw------- 1 root root 202,  5 Mar 16 11:44 /dev/cpu/5/msr
crw------- 1 root root 202,  6 Mar 16 11:44 /dev/cpu/6/msr
crw------- 1 root root 202,  7 Mar 16 11:44 /dev/cpu/7/msr
crw------- 1 root root 202,  8 Mar 16 11:44 /dev/cpu/8/msr
crw------- 1 root root 202,  9 Mar 16 11:44 /dev/cpu/9/msr

Step 6. If the program crashed OR if this is for a new i7 chip. Can you
also run the following commands? They read various registers,
and if those registers cannot be read the tool fails ungracefully. You will
have to sudo each command and probably also do sudo modprobe msr (before
running the programs)

rdmsr 0x19c
rdmsr 0x1a2
rdmsr 0x38d
rdmsr 778
rdmsr 779
rdmsr 1020
rdmsr 1021

Results

rdmsr 0x19c
88240000
rdmsr 0x1a2
5b0800
rdmsr 0x38d
330
rdmsr 778
255148deeb
rdmsr 779
22fc3419cf
rdmsr 1020
0
rdmsr 1021
0
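
For readers decoding these results: all seven reads succeeded, and 1020 and 1021 returned 0 rather than an error. The decimal addresses 778, 779, 1020 and 1021 are 0x30a, 0x30b, 0x3fc and 0x3fd. The first two values can be turned into a core temperature, assuming the bit layout in the Intel SDM applies to this CPU (IA32_THERM_STATUS at 0x19c with the digital readout in bits 22:16 and a valid flag in bit 31, and MSR_TEMPERATURE_TARGET at 0x1a2 with TjMax in bits 23:16). The helper names below are illustrative only.

```python
def digital_readout(therm_status):
    """Bits 22:16 of IA32_THERM_STATUS (0x19c): degrees C below TjMax."""
    return (therm_status >> 16) & 0x7F


def reading_valid(therm_status):
    """Bit 31 of IA32_THERM_STATUS: set when the readout is valid."""
    return bool((therm_status >> 31) & 1)


def tjmax(temperature_target):
    """Bits 23:16 of MSR_TEMPERATURE_TARGET (0x1a2): TjMax in degrees C."""
    return (temperature_target >> 16) & 0xFF


therm_status = 0x88240000        # rdmsr 0x19c, from the results above
temperature_target = 0x5B0800    # rdmsr 0x1a2, from the results above

# Core temperature is TjMax minus the distance-to-TjMax readout.
core_temp_c = tjmax(temperature_target) - digital_readout(therm_status)
print("valid:", reading_valid(therm_status))
print("TjMax:", tjmax(temperature_target), "C")
print("core temperature:", core_temp_c, "C")
```

Under those assumptions the reported values decode to a TjMax of 91 C and a readout of 36 C below it, so about 55 C on the core that was read.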

Original issue reported on code.google.com by pau...@gmail.com on 16 Mar 2013 at 7:01

GoogleCodeExporter commented 9 years ago
cool computer :)

can you try out the svn/git source? (i think i put some hooks in the svn 
source to handle registers that are not readable.) if the code still 
crashes, could you run it in gdb and paste the stack?
sudo gdb i7z 

i am guessing that some of the residency values (msr 1020, 1021) are not 
available, and the code crashes when it tries to read them.
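
The "rdmsr:pread: Input/output error" message comes from a failed pread() on /dev/cpu/N/msr: the Linux msr driver exposes each register at a file offset equal to its address and returns EIO when the CPU rejects the read. Below is a minimal sketch of a read that reports an unreadable register instead of quitting. It is written in Python for brevity and is not i7z's actual code; the function name `read_msr` and the `dev_template` parameter are assumptions made for illustration.

```python
import errno
import os
import struct


def read_msr(cpu, reg, dev_template="/dev/cpu/{cpu}/msr"):
    """Return the 64-bit value of MSR `reg` on `cpu`, or None if unreadable.

    The msr driver maps register address to file offset, so one 8-byte
    pread() at offset `reg` fetches the register.
    """
    fd = os.open(dev_template.format(cpu=cpu), os.O_RDONLY)
    try:
        data = os.pread(fd, 8, reg)
    except OSError as exc:
        if exc.errno == errno.EIO:
            return None  # register not readable on this CPU: skip it
        raise            # permissions or other problems still surface
    finally:
        os.close(fd)
    if len(data) != 8:
        return None      # short read: treat as unavailable
    return struct.unpack("<Q", data)[0]
```

A caller could loop over the registers it needs, for example `read_msr(0, 1020)` and `read_msr(0, 1021)`, and hide the matching column when the result is None. This needs root and the msr module loaded.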

thanks

Original comment by abhirana on 20 Mar 2013 at 1:52

GoogleCodeExporter commented 9 years ago
First, thanks for the compliment. It would be even better if we could get the 
CPUs to hit their turbo boost of 2.4GHz instead of stalling out at 2.13GHz. 
Also, thanks for the quick reply.

I grabbed the svn version, compiled and installed it, and it's still failing 
with the same message:
Socket-0 [num of cpus 8 physical 8 logical 16] 0,1,2,3,4,5,6,7,
Socket-1 [num of cpus 8 physical 8 logical 16] 8,9,10,11,12,13,14,15,
GUI has been Turned ON
Logging (freq) is OFF
Temperature logging is OFF
Cstate logging is OFF
i7z DEBUG: Dual Socket Detected
i7z DEBUG: In i7z Dual_Socket()

Cpu speed from cpuinfo 1997.00Mhz
True Frequency (without accounting Turbo) 1997 MHz
rdmsr:pread: Input/output error
                               Quitting i7z

When I fire up gdb, it reports that it can't find any debugging symbols:
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/i7z...(no debugging symbols found)...done.

If I run it inside gdb, it gives me the same output, but adds a little extra 
message:
Program exited with code 0177.

Maybe I'm missing another command that you'd like me to execute? Do I need to 
enable a flag to compile it with symbols?

Thanks,
-paul

Original comment by pau...@gmail.com on 20 Mar 2013 at 4:45

GoogleCodeExporter commented 9 years ago
hi paul

thanks.

if you use the -g flag instead of -O3 in the Makefile
https://code.google.com/p/i7z/source/browse/trunk/Makefile#10
CFLAGS ?= -g 

and then re-run the program, you should get more output from gdb.

tangentially: have you tried cpufreq-aperf? usually it should show the turbo 
scaling but without any core residency values.

Original comment by abhirana on 20 Mar 2013 at 5:11

GoogleCodeExporter commented 9 years ago
abhirana,

I recompiled with the -g flag and re-ran the program, but I still don't get 
anything from gdb. It always says "No stack", even after I issue a run command:
....

rdmsr:pread: Input/output error
                               Quitting i7z

Program exited with code 0177.
(gdb) bt
No stack.

I can get cpufreq-aperf to run, and it shows the processors running at a 
boosted speed:
000     2115760                 00 sec 004 ms   00 sec 995 ms   00
001     2115760                 00 sec 001 ms   00 sec 998 ms   00
002     2115760                 00 sec 001 ms   00 sec 998 ms   00
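
For context on where a figure like 2115760 (kHz) comes from: tools of this kind usually derive the average frequency from the APERF and MPERF counters, scaling the base frequency by the ratio of their deltas over the sampling interval. Here is a rough sketch of that arithmetic. The counter deltas are hypothetical and were not measured on this machine, and the function name is illustrative.

```python
def average_khz(base_khz, aperf_delta, mperf_delta):
    """Average frequency over an interval: base * (delta APERF / delta MPERF)."""
    if mperf_delta == 0:
        raise ValueError("MPERF did not advance during the interval")
    return base_khz * aperf_delta / mperf_delta


# Hypothetical counter deltas chosen so that APERF advanced 16/15 as fast
# as MPERF, i.e. one multiplier step above a 15x base clock.
base_khz = 2_000_000
freq = average_khz(base_khz, aperf_delta=16_000_000, mperf_delta=15_000_000)
print(f"{freq:.0f} kHz")
```

A ratio just above 1.0 means the core ran slightly faster than its base clock while it was active, which is consistent with the small boost reported in the output above.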

And turbostat reports a higher clock speed (2.13GHz; the CPU is rated at 
2.0GHz and supports turbo boost up to 2.4GHz):

 avg   2.13   2.00   0.11  99.89   0.00   0.00   0.00   0.00   0.00
   0   2.13   2.00   0.12  99.88   0.00   0.00   0.00   0.00   0.00

And if I stress the CPUs, I can see the speed drop.

I'm really confused as to why I can't get them to hit the max turbo boost 
speed of 2.4GHz.

thanks for your help,
-paul

Original comment by pau...@gmail.com on 22 Mar 2013 at 4:54

GoogleCodeExporter commented 9 years ago
hey paul

good that the other tools are working for you. i'll fix i7z later.

so i have two questions.

can you find the exact multipliers possible for your cpu? i see online that it 
can go up to 2.4ghz, but most processors have two or more turbo speeds 
depending on how many cores are active (C0 or C1). i am not able to find the 
exact multipliers, and it may be that the number of cores that must be 
inactive is very high, which keeps your machine from ever reaching full turbo.

if you run the following, each command gives the maximum turbo multiplier 
with X cores active, starting with 1 core active:
#maximum turbo multiplier with 1 core active
rdmsr 0x1ad --decimal --bitfield 7:0
#maximum turbo multiplier with 2 cores active
rdmsr 0x1ad --decimal --bitfield 15:8
#maximum turbo multiplier with 3 cores active
rdmsr 0x1ad --decimal --bitfield 23:16
#maximum turbo multiplier with 4 cores active
rdmsr 0x1ad --decimal --bitfield 31:24
#maximum turbo multiplier with 5 cores active
rdmsr 0x1ad --decimal --bitfield 39:32
#maximum turbo multiplier with 6 cores active
rdmsr 0x1ad --decimal --bitfield 47:40
also, in the turbostat output, can you check that the other cores are going 
into sleep states deeper than C1? if they are not, that may be why the cpu is 
not reaching the higher turbo speeds.

Original comment by abhirana on 22 Mar 2013 at 5:19