netb2c / byte-unixbench

Automatically exported from code.google.com/p/byte-unixbench

unreasonable multiple Whetstone results #15

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. ./Run arithmetic
2. ./Run -c 4 arithmetic
3. ./Run -c 8 arithmetic
4. ./Run -c 12 arithmetic
5. ./Run -c 16 arithmetic

What is the expected output? What do you see instead?

The system's CPU has only 4 cores, so -c 4 through -c 16 should each give a
result slightly less than 4x the single-process result.
Instead, the results are:
4: 3380.9       
8: 6756.2   
12: 10130.1 
16: 13504.2

What version of the product are you using? On what operating system?
unixbench 5.1.3
3.2.0-4-amd64 #1 SMP Debian 3.2.63-2 x86_64 GNU/Linux
one Intel XEON E3-1225

Please provide any additional information below.
I also got the same kind of result on an Apple MBP with a Core i7 3615 (4C8T):

1: 959.8    
4: 3902.9   
8: 7139.1
12: 10673.6
16: 14252.6

Original issue reported on code.google.com by kensenj...@gmail.com on 5 Dec 2014 at 2:38
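
To put numbers on the MBP results, a minimal Python sketch (not part of UnixBench) can divide each score by the single-copy score. The helper name `speedups` is made up for illustration; the figures are the ones listed above.

```python
# Scores copied from the MBP run above (copies -> reported score).
scores = {1: 959.8, 4: 3902.9, 8: 7139.1, 12: 10673.6, 16: 14252.6}

def speedups(results):
    """Return {copies: score / single-copy score}."""
    base = results[1]
    return {n: s / base for n, s in results.items()}

for copies, ratio in speedups(scores).items():
    print(f"{copies:2d} copies: {ratio:5.2f}x the single-copy score")
```

On a 4C8T machine, a ratio that keeps rising well past 8 suggests the score is tracking the number of copies rather than the hardware.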

GoogleCodeExporter commented 8 years ago
sorry, the title should be "unreasonable multiple-core Whetstone results"

Original comment by kensenj...@gmail.com on 5 Dec 2014 at 2:40

GoogleCodeExporter commented 8 years ago
I assume this has to do with how the measurement is done. I just raised the 
same issue. Changing the flags passed to the compiler fixed the problem for me. 
Could you kindly try the same?

Instead of -DUNIX, use -DGTODay so that it uses wall-clock time instead of 
process CPU time.

Thanks.

Original comment by r.puvich...@gmail.com on 30 Mar 2015 at 1:11
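
The difference between the two kinds of timer can be seen outside UnixBench. Here is a rough Python sketch, using `time.process_time()` as a stand-in for process CPU time and `time.perf_counter()` as a stand-in for wall-clock time. Neither is what whets.c calls, and `measure` is a name invented here.

```python
import time

def measure(work):
    """Run work() and return (cpu_seconds, wall_seconds) it took."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    work()
    return time.process_time() - cpu0, time.perf_counter() - wall0

# Sleeping stands in for "waiting for a free core": wall time passes,
# but the process accrues almost no CPU time.
cpu, wall = measure(lambda: time.sleep(0.2))
print(f"cpu={cpu:.3f}s wall={wall:.3f}s")
```

A benchmark that divides the work done by the CPU figure would report the same rate no matter how long the process sat waiting, which would be consistent with scores that keep growing with the copy count.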

GoogleCodeExporter commented 8 years ago
Tested on OS X, 4C8T (not idle). It looks like it still uses process CPU time.
1: 1004.1
2: 1951.8
4: 3887.8
8: 6794.8

Original comment by kensenj...@gmail.com on 30 Mar 2015 at 2:41

GoogleCodeExporter commented 8 years ago
I tried now and I see it is working fine.

-DUNIX:

Benchmark Run: Mon Mar 30 2015 21:48:04 - 21:48:19
4 CPUs in system; running 1 parallel copy of tests

Double-Precision Whetstone                     4203.4 MWIPS (9.7 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0       4203.4    764.3
                                                                   ========
System Benchmarks Index Score (Partial Only)                          764.3

------------------------------------------------------------------------
Benchmark Run: Mon Mar 30 2015 21:48:19 - 21:48:36
4 CPUs in system; running 4 parallel copies of tests

Double-Precision Whetstone                    13725.4 MWIPS (9.9 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      13725.4   2495.5
                                                                   ========
System Benchmarks Index Score (Partial Only)                         2495.5

------------------------------------------------------------------------
Benchmark Run: Mon Mar 30 2015 21:48:36 - 21:49:05
4 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    27433.0 MWIPS (9.9 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      27433.0   4987.8
                                                                   ========
System Benchmarks Index Score (Partial Only)                         4987.8

-DGTODay:

Benchmark Run: Mon Mar 30 2015 21:51:08 - 21:51:24
4 CPUs in system; running 1 parallel copy of tests

Double-Precision Whetstone                     4293.7 MWIPS (10.0 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0       4293.7    780.7
                                                                   ========
System Benchmarks Index Score (Partial Only)                          780.7

------------------------------------------------------------------------
Benchmark Run: Mon Mar 30 2015 21:51:24 - 21:51:40
4 CPUs in system; running 4 parallel copies of tests

Double-Precision Whetstone                    13406.6 MWIPS (9.9 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      13406.6   2437.6
                                                                   ========
System Benchmarks Index Score (Partial Only)                         2437.6

------------------------------------------------------------------------
Benchmark Run: Mon Mar 30 2015 21:51:40 - 21:51:55
4 CPUs in system; running 8 parallel copies of tests

Double-Precision Whetstone                    14167.5 MWIPS (9.3 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Double-Precision Whetstone                       55.0      14167.5   2575.9
                                                                   ========
System Benchmarks Index Score (Partial Only)                         2575.9

Original comment by r.puvich...@gmail.com on 30 Mar 2015 at 4:23
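
The two sets of runs can be compared by looking at how much the score grows from 4 to 8 copies. This is a small Python sketch using only the MWIPS figures quoted above; the `gain` helper is hypothetical.

```python
# MWIPS figures copied from the runs above ("4 CPUs in system").
runs = {
    "-DUNIX":   {1: 4203.4, 4: 13725.4, 8: 27433.0},
    "-DGTODay": {1: 4293.7, 4: 13406.6, 8: 14167.5},
}

def gain(results, lo, hi):
    """Score at `hi` copies divided by score at `lo` copies."""
    return results[hi] / results[lo]

for flag, results in runs.items():
    print(f"{flag:9s} 4->8 copies: {gain(results, 4, 8):.2f}x")
```

With -DUNIX the score roughly doubles when the copy count doubles past the CPU count, while with -DGTODay it stays nearly flat.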

GoogleCodeExporter commented 8 years ago
Since you mentioned 8 threads, could you try with 16 copies and with both 
timing options to see what the numbers look like?

Original comment by r.puvich...@gmail.com on 30 Mar 2015 at 4:23

GoogleCodeExporter commented 8 years ago
Forgot to mention one more thing: my laptop is 2C4T, so 4 threads should be 
the maximum for any throughput scaling.

Original comment by r.puvich...@gmail.com on 30 Mar 2015 at 4:25
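
That expectation can be written down as an idealized model. The Python sketch below is an assumption-laden illustration, not UnixBench code: `expected_total` is a made-up name, the single-copy score of 1000.0 is arbitrary, and it ignores scheduler overhead and the fact that a hardware thread is not a full core.

```python
def expected_total(single, copies, hw_threads, timer):
    """Idealized aggregate score for `copies` parallel copies.

    timer="cpu":  each copy is rated per CPU second, so waiting is invisible
                  and the total grows with every extra copy.
    timer="wall": copies beyond `hw_threads` only share the same hardware,
                  so the total stops growing.
    """
    if timer == "cpu":
        return single * copies
    if timer == "wall":
        return single * min(copies, hw_threads)
    raise ValueError(f"unknown timer: {timer!r}")

# copies, total with CPU-time rating, total with wall-clock rating
for copies in (1, 4, 8, 16):
    print(copies,
          expected_total(1000.0, copies, 4, "cpu"),
          expected_total(1000.0, copies, 4, "wall"))
```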

GoogleCodeExporter commented 8 years ago
This works for me. I made tests using Arch Linux on:

Intel(R) Xeon(R) CPU W3530 (4 cores, HT)
ARM Cortex A8 (one core)
ARM Cortex A7 (dual core)

I don't have a Mac to test on. Maybe you could try -DMAC? You can see the 
options in src/whets.c under "Timer options".

Thanks.

Original comment by jefferso...@gmail.com on 30 Mar 2015 at 4:30

GoogleCodeExporter commented 8 years ago
Mac OS should also support gettimeofday(). Could you kindly try increasing the 
number of copies beyond the number of CPU threads and run the same test?

Original comment by r.puvich...@gmail.com on 30 Mar 2015 at 5:01

GoogleCodeExporter commented 8 years ago
Tried -DMAC and -DMAC_TMgr.

-DMAC_TMgr won't even compile ("Timer.h" not found); maybe that's for older PPC 
Macs.

Result of -DMAC on 4C8T OS X (the problem does not seem to be solved, and a new 
problem was introduced.)

Original comment by kensenj...@gmail.com on 31 Mar 2015 at 4:50

Attachments:

GoogleCodeExporter commented 8 years ago
I also tested -DGTODay on 
Linux ... 3.16.0-4-686-pae #1 SMP Debian 3.16.7-ckt4-3 (2015-02-03) i686 
GNU/Linux
(already patched with 'volatile unsigned long iter;')
and got this error:
1 x Arithoh  1 2 3Can't take log of -2.93709e+07 at ./Run line 935.

Is the timing done by the Perl script 'Run'?

Original comment by kensenj...@gmail.com on 31 Mar 2015 at 4:55
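
The message itself is what Perl prints when `log` is given a value that is not positive, so some value reaching that line was negative. The Python sketch below shows how that can surface, assuming (not verified against `Run`) that scores are combined through logarithms, as in a geometric mean. `geometric_mean` is a hypothetical helper, not a function from UnixBench.

```python
import math

def geometric_mean(scores):
    """Geometric mean via logarithms; every score must be positive."""
    bad = [s for s in scores if s <= 0]
    if bad:
        raise ValueError(f"cannot take log of non-positive score(s): {bad}")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

print(geometric_mean([100.0, 400.0]))
try:
    # The negative value from the error message above.
    geometric_mean([100.0, -2.93709e7])
except ValueError as err:
    print("rejected:", err)
```

Under that assumption, the error would point at a negative score produced upstream rather than at the logarithm step itself.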

GoogleCodeExporter commented 8 years ago
Could you kindly clarify the following? I assume the initial issue reported was 
with Whetstone. Did you try -DGTODay with Whetstone? What were the results with 
multiple copies (> 8 copies)?

arith.c uses a different method for measuring time. From your previous results, 
arithmetic is working properly.

Original comment by r.puvich...@gmail.com on 31 Mar 2015 at 5:46

GoogleCodeExporter commented 8 years ago
Hmm, looks like it's solved.

Though the score for >4 copies is much higher than I expected.

Thanks!

Original comment by kensenj...@gmail.com on 31 Mar 2015 at 9:45

Attachments:

GoogleCodeExporter commented 8 years ago
-DGTODay also works well on Linux.

Original comment by kensenj...@gmail.com on 1 Apr 2015 at 4:42