namtrac35 / pyrit

Automatically exported from code.google.com/p/pyrit

pyrit-calpp v2 - testing required #148

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'm attaching a new version of the CAL++ computing core. It's about 8-9% faster
than the one in svn.
I have to admit that this is a slightly strange solution and I'm not sure how
it will behave on other ATI cards, so it needs a bit of testing :).

The standard pyrit core works in the following way:
- prepares data
- transfers it to the gpu
- gpu runs
- transfers data from the gpu
- postprocesses data
Simple timing analysis shows that this approach shouldn't waste more than
1-2% of the time (the time when the gpu is not working).

The new calpp v2 core tries to mask transfer and data processing behind gpu
computations (we start the computation and then, while the gpu is busy, we
process data on the cpu). It should be only 1-2% faster. But it isn't :).
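
A rough way to see where the 1-2% estimate comes from is the timing model sketched below. The function names and the numbers in the example are made up for illustration (they are not measurements from pyrit), and the model deliberately ignores driver behavior.

```python
# Simplified per-batch timing model. All stage times are hypothetical and
# use the same arbitrary unit; driver/card effects are not modelled.

def serial_batch_time(prepare, upload, compute, download, postprocess):
    """Standard core: every stage waits for the previous one."""
    return prepare + upload + compute + download + postprocess


def overlapped_batch_time(prepare, upload, compute, download, postprocess):
    """Overlapped core in steady state: cpu-side work is hidden behind
    the gpu computation, so the slower of the two sides sets the pace."""
    cpu_side = prepare + upload + download + postprocess
    return max(compute, cpu_side)


def gpu_idle_fraction(prepare, upload, compute, download, postprocess):
    """Share of the serial batch time during which the gpu does nothing."""
    total = serial_batch_time(prepare, upload, compute, download, postprocess)
    return (total - compute) / total


if __name__ == "__main__":
    stages = dict(prepare=0.5, upload=0.5, compute=98.0,
                  download=0.5, postprocess=0.5)
    print("serial:    ", serial_batch_time(**stages))
    print("overlapped:", overlapped_batch_time(**stages))
    print("gpu idle:  ", gpu_idle_fraction(**stages))
```

With these assumed numbers the gpu idles for about 2% of the serial batch, so hiding the cpu-side work can recover at most that much in this model. Any larger gain has to come from effects the model leaves out.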

First of all, the simple analysis doesn't take driver/card behavior into
account. ATI's drivers are really sensitive to cpu performance and are
probably hiding some actions.

Now to the interesting part :). The v2 core doesn't do explicit data
transfer at all. Trying to send data to the gpu while it was working resulted
in some performance degradation. So now all the data is in host memory and
the gpu reads it directly during computations. Fortunately, in the case of
pyrit the amount of computation is so huge that the transfer from host memory
can be masked by computations from other gpu threads.
This may not be true for 5xxx cards, where the kernel is smaller and faster.
Also, memory<->gpu transfer speed may have some impact.

So please test the v2 core and post your results here.
You should also try changing line 453 in pyrit/cpyrit/cpyrit.py from
'ncpus-=1' to 'ncpus-=2' ( or 'ncpus-=4' ) and test the impact of more
available cpu cores on performance.
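
To make the effect of that change concrete, here is a small sketch of the bookkeeping. It assumes that the decrement is applied once per gpu device, so that each gpu keeps some cpu cores free of CPU-Core worker threads; the function name and parameters are hypothetical and are not taken from cpyrit.py.

```python
def cpu_worker_count(total_cpus, gpu_count, reserved_per_gpu=1):
    """Number of CPU-Core worker threads left after reserving cores for gpus.

    reserved_per_gpu plays the role of the 1 in 'ncpus-=1' (assumption:
    the decrement runs once for every gpu device that is registered).
    """
    return max(0, total_cpus - gpu_count * reserved_per_gpu)


if __name__ == "__main__":
    for reserved in (1, 2, 4):
        print(reserved, cpu_worker_count(total_cpus=8, gpu_count=2,
                                         reserved_per_gpu=reserved))
```

Under this assumption, a larger reservation trades CPU-Core throughput for more cpu time available to the threads that feed the gpus.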

The v2 core works with the svn version of the CAL++ library (it has been attached).

Original issue reported on code.google.com by hazema...@gmail.com on 16 Apr 2010 at 3:44

Attachments:

GoogleCodeExporter commented 9 years ago
pyrit.lover: Yep, all the cores ( except v2 ) report peak performance. But even
for v2 your method of testing is the most accurate.
odlan3: There is not much transfer involved, so for pyrit it's not such a big
issue. The whole transfer takes ~1% of the time and gpu->cpu is 2/7 of this.
You can read more about the slow pci transfer problem here -
http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=130923&enterthread=y

Original comment by hazema...@gmail.com on 22 Apr 2010 at 7:57

GoogleCodeExporter commented 9 years ago
As a small bonus to all who helped with testing, I'm attaching the v3 core.
This one has an improved kernel and uses the latest svn version of the CAL++ library.

Original comment by hazema...@gmail.com on 23 Apr 2010 at 12:47

Attachments:

GoogleCodeExporter commented 9 years ago
Here's the results from my 4850s (oc'd core to 650, mem is stock) on v2c:

Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (38905.1 PMKs/s)... | 

Computed 40537.35 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 19610.6 PMKs/s (RTT 2.7)
#2: 'CAL++ Device #2 'ATI RV770'': 19215.3 PMKs/s (RTT 2.9)

Looks like a keeper to me.

@pyrit.lover: I responded to your overclock questions in the google group as
lukas requested in comment 94. Don't know if you saw it.

Original comment by robert.b...@gmail.com on 23 Apr 2010 at 12:55

GoogleCodeExporter commented 9 years ago
My 4850s' (650 core, 993 mem) results using calpp-svn-1 and cpyrit_calpp-v3:

Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (39441.6 PMKs/s)... \ 

Computed 41032.62 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 20140.0 PMKs/s (RTT 2.7)
#2: 'CAL++ Device #2 'ATI RV770'': 19965.0 PMKs/s (RTT 2.8)

Looks like you managed to squeeze some more juice out of it. Great job!

Original comment by robert.b...@gmail.com on 23 Apr 2010 at 4:10

GoogleCodeExporter commented 9 years ago
@hazeman11: Great job...I am impressed
pyrit benchmark
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (142081.1 PMKs/s)... - 

Computed 145429.78 PMKs/s total.
#1: 'CAL++ Device #1 'ATI CYPRESS'': 71047.7 PMKs/s (RTT 1.0)
#2: 'CAL++ Device #2 'ATI CYPRESS'': 71352.4 PMKs/s (RTT 1.0)

Original comment by odl...@gmail.com on 23 Apr 2010 at 7:47

GoogleCodeExporter commented 9 years ago
I'm happy that you like it :). This is my way of thanking you all for helping
with tests :). I really appreciate it.
I'm only sorry that the speedup on 4xxx was so small. It could be ~10% but the
CAL IL compiler ( in the ATI driver ) has issues. When the pressure on
instruction scheduling was reduced, it started to use smaller ALU clauses to
save registers, which is totally the wrong direction for this type of kernel :(.

PS. Could someone with a 5xxx card attach the output of 'pyrit list_cores' with
line 152 in _cpyrit_calpp.cpp uncommented?

Original comment by hazema...@gmail.com on 23 Apr 2010 at 12:33

GoogleCodeExporter commented 9 years ago
If there is no need to recompile the code, edit the line 152 from
//self->dev_prog.disassemble(std::cout);
to
self->dev_prog.disassemble(std::cout);
pyrit list_cores
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

The following cores seem available...
#1:  'CAL++ Device #1 'ATI CYPRESS''
#2:  'CAL++ Device #2 'ATI CYPRESS''
#3:  'CPU-Core (SSE2)'
#4:  'CPU-Core (SSE2)'
#5:  'CPU-Core (SSE2)'
#6:  'CPU-Core (SSE2)'
And with
//self->dev_prog.disassemble(std::cout);
pyrit list_cores
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

The following cores seem available...
#1:  'CAL++ Device #1 'ATI CYPRESS''
#2:  'CAL++ Device #2 'ATI CYPRESS''
#3:  'CPU-Core (SSE2)'
#4:  'CPU-Core (SSE2)'
#5:  'CPU-Core (SSE2)'
#6:  'CPU-Core (SSE2)'

Original comment by odl...@gmail.com on 23 Apr 2010 at 1:04

GoogleCodeExporter commented 9 years ago
There is a need to recompile. And trust me, the output will be loooong :) ( so
please post it as an attachment )

Original comment by hazema...@gmail.com on 23 Apr 2010 at 1:08

GoogleCodeExporter commented 9 years ago
Sorry..:-) ...next time i will post output in attachment

Original comment by odl...@gmail.com on 23 Apr 2010 at 1:17

GoogleCodeExporter commented 9 years ago
Ferrari at the beginning, then helicopter, then Concorde, now rocket!

pyrit benchmark says:
HD5770 41891 PMK/s
HD5870 82552 PMK/s

Doing a "real" test I got 125072 PMK, which means another +19%. It means +92%
between "before calpp" and "after calpp". Not bad: double the PMK without
doubling the hardware.

What next week? Will you make my system into the ESS Enterprise with v4? :-D

Anyway, following up on my comment #83 and your answers #85 and #87: I did
another test and YES, if I do some other simple activity in another xterm (ls,
mv, etc.) the PMK rate slows down. It is not only a problem of the estimated
PMK being displayed wrongly; the task really takes more time to complete (I
verified this using the "time" command). My impression is that if one core is
disturbed by other activity, it is not able to go back to serving PMKs and
stays unused until that task finishes and another "clean" task starts.

The "disturbed" task slowed down to 91000 PMK; the undisturbed task did 125000
PMK.
Note that I disturbed the task when it had already done about 50% of the job.
If I disturb it earlier, the final PMK should be lower than 91000.

Of course the easy solution is not to disturb the pc while it runs pyrit, but I
wonder where the problem could lie.

Original comment by pyrit.lo...@gmail.com on 23 Apr 2010 at 6:37

GoogleCodeExporter commented 9 years ago
pyrit.lover: To verify what the problem really is, you could run the debug
version ( comment 71 ) and post the output here ( undisturbed & disturbed ).

PS. Could someone with a 5xxx card attach the output of 'pyrit list_cores' with
line 152 in _cpyrit_calpp.cpp uncommented ( for version v3 ) ?

Original comment by hazema...@gmail.com on 23 Apr 2010 at 7:12

Attachments:

GoogleCodeExporter commented 9 years ago
Of course, run the debug version from the last comment :).

Original comment by hazema...@gmail.com on 23 Apr 2010 at 7:16

GoogleCodeExporter commented 9 years ago
pyrit.lover: I'll make v3-debug sometime tomorrow - maybe it's better to wait
for it, because this v2-2-debug is really something between v2-1 and v2-2, so
I'm not sure if the problem will occur.

Original comment by hazema...@gmail.com on 23 Apr 2010 at 7:58

GoogleCodeExporter commented 9 years ago
I'm attaching the v3-debug version.

I would really like someone to post the ISA code on 5xxx cards from the v3
version ( output of 'pyrit list_cores' with line 152 in _cpyrit_calpp.cpp
uncommented ). Without looking at this I can't start working on the v4 version :).

And btw, benchmarking results are usually smaller than the true value ( v2 & v3
cores ). The pyrit benchmarking engine has a problem with feeding the gpus at
the beginning and the end of the benchmark. For standard cores this isn't a
problem, as they measure peak performance anyway, but v2 & v3 estimate
sustained speed.
New cores could measure peak values, but it would require changes to the cores
and I'm not really sure if it's worth it - maybe it would be better to improve
the pyrit benchmarking engine.
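
The difference between the two kinds of measurement can be illustrated with a small sketch. The per-interval samples below are invented for the example, and the functions only show the arithmetic; this is not how pyrit or the cores actually collect their numbers.

```python
def sustained_rate(samples):
    """Total work divided by total time over the whole run.

    samples is a list of (pmks_computed, seconds) tuples, one per interval.
    """
    total_work = sum(work for work, _ in samples)
    total_time = sum(seconds for _, seconds in samples)
    return total_work / total_time


def peak_rate(samples):
    """Best rate seen in any single interval."""
    return max(work / seconds for work, seconds in samples)


if __name__ == "__main__":
    # Hypothetical run: the gpu is underfed in the first and last interval.
    run = [(5000, 1.0), (20000, 1.0), (20000, 1.0), (5000, 1.0)]
    print("sustained:", sustained_rate(run))
    print("peak:     ", peak_rate(run))
```

In this made-up run a core reporting peak values would show 20000 PMKs/s, while one estimating sustained speed would show 12500 PMKs/s, even though the hardware is the same.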

Original comment by hazema...@gmail.com on 25 Apr 2010 at 5:19

Attachments:

GoogleCodeExporter commented 9 years ago
As far as I understood, I must:

A. clean /usr/local/lib/python2.6/site-packages.
B. recompile and install calpp-svn-1.tar.gz  (from comment 102)
C. recompile and install cpyrit_calpp-v3-debug.tar.gz  (from comment 114)
D. run "pyrit list_cores" and post results.
E. run "pyrit benchmark" and post results.

I ask this to avoid wrong and useless activities.

If I understood wrong, please post the activities to be done point by point, as
A, B, C, etc.

Original comment by pyrit.lo...@gmail.com on 25 Apr 2010 at 5:46

GoogleCodeExporter commented 9 years ago
A. installing calpp-svn-1 - is required for >=v3 ( it needs to be installed only once )

For posting ISA code ( this is for v3 version )
1. change line 152 in _cpyrit_calpp.cpp by removing '//' at the front of the line
2. recompile & install
3. run 'pyrit list_cores' - in addition to cores it will output kernel ISA code
4. post output :)

For testing problem in comment 83 ( this is for v3-debug version )
1. compile & install v3-debug
2. run one batch without slowing down gpu - save output
3. run one batch with slowing gpu - save output
4. post both outputs :)

Original comment by hazema...@gmail.com on 25 Apr 2010 at 6:02

GoogleCodeExporter commented 9 years ago
A. line 152 ( self->dev_prog.disassemble(std::cout); ) uncommented.
B. python2.6 setup.py build ; python2.6 setup.py install
C. pyrit list_cores > report.txt
Please see the attached report.txt

Problem from comment 83:
1. compile & install cpyrit_calpp-v3-debug.tar.gz
Error: /usr/bin/ld: cannot find -lboost_date_time-mt

Original comment by pyrit.lo...@gmail.com on 25 Apr 2010 at 6:56

Attachments:

GoogleCodeExporter commented 9 years ago
report.txt has 0 bytes ?

Original comment by hazema...@gmail.com on 25 Apr 2010 at 6:57

GoogleCodeExporter commented 9 years ago
Error: /usr/bin/ld: cannot find -lboost_date_time-mt - you need to install the
libboost date_time component

Original comment by hazema...@gmail.com on 25 Apr 2010 at 6:58

GoogleCodeExporter commented 9 years ago
Comment 42 has more info about date_time.

Original comment by hazema...@gmail.com on 25 Apr 2010 at 6:59

GoogleCodeExporter commented 9 years ago
Sorry, the correct report.txt is here.
By the way, about Error: /usr/bin/ld: cannot find -lboost_date_time-mt and
comment 42: I was still not able to solve it.

Original comment by pyrit.lo...@gmail.com on 25 Apr 2010 at 7:08

Attachments:

GoogleCodeExporter commented 9 years ago
When you have installed the libboost date_time component, you should have
libboost_date_time files in your /usr/lib or /usr/local/lib directory - on my
system it looks this way:
-rw-r--r--  1 root root   144246 2009-03-27 03:28 libboost_date_time-mt.a
-rw-r--r--  1 root root   239526 2009-03-27 03:28 libboost_date_time-mt-d.a
lrwxrwxrwx  1 root root       33 2009-06-25 16:18 libboost_date_time-mt-d.so
-rw-r--r--  1 root root   110688 2009-03-27 03:28 libboost_date_time-mt-d.so.1.37.0
lrwxrwxrwx  1 root root       31 2009-06-25 16:18 libboost_date_time-mt.so
-rw-r--r--  1 root root    72192 2009-03-27 03:28 libboost_date_time-mt.so.1.37.0

Now in file setup.py ( line 76 ) there is
libraries = ['ssl', 'aticalrt', 'aticalcl', 'boost_date_time-mt'],

so the 'boost_date_time-mt' part of the file name goes into the setup.py file.
When you have the date_time component installed, this name might be slightly
different ( sometimes '-' instead of '_' , or no '-mt' part ). You need to
change line 76 so the name there corresponds to the library file name.
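
The mapping from library file name to the entry in the libraries list can be sketched as follows. This is only an illustration of the naming rule described above (strip the 'lib' prefix and the '.so'/'.a' suffix with any version numbers); the helper names are hypothetical and not part of pyrit or setup.py.

```python
import glob
import os
import re


def linker_name(filename):
    """Turn a library file name into the name used in a 'libraries' list.

    'libboost_date_time-mt.so.1.37.0' -> 'boost_date_time-mt'
    """
    match = re.match(r"^lib(?P<name>.+?)\.(?:so|a)(?:\.[\d.]+)?$", filename)
    if match is None:
        raise ValueError("not a library file name: %r" % (filename,))
    return match.group("name")


def candidate_names(directories=("/usr/lib", "/usr/local/lib"),
                    stem="boost_date_time"):
    """List the distinct linker names found for a given library stem."""
    names = set()
    for directory in directories:
        for path in glob.glob(os.path.join(directory, "lib%s*" % stem)):
            try:
                names.add(linker_name(os.path.basename(path)))
            except ValueError:
                pass  # skip files that do not look like libraries
    return sorted(names)


if __name__ == "__main__":
    print(candidate_names())
```

Whatever name this prints on a given system (for example with or without the '-mt' part) is the one that line 76 would need to contain.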

I can't help you with installing the date_time component - this differs a lot
depending on the distribution you use.

Original comment by hazema...@gmail.com on 25 Apr 2010 at 7:17

GoogleCodeExporter commented 9 years ago
Yes! It was enough to delete "-mt" in line 76!

The problem is that now only CPUs are listed by "pyrit list_cores".
Running "pyrit benchmark" (both disturbed and not disturbed) reports the same
PMK for each core. Of course, if the GPU is not used, disturbing the PC from
another xterm does not matter.

What did I do wrong?

Original comment by pyrit.lo...@gmail.com on 25 Apr 2010 at 8:01

GoogleCodeExporter commented 9 years ago
The problem is that cpyrit_calpp has some problem at link time ( during
start ).
I would guess that you didn't run ldconfig ( as root, after installing libboost
date_time ).
The ldd command can show you what is being linked at run time.
Go to /usr/local/lib/python2.6/dist-packages/cpyrit/ and run
ldd _cpyrit_calpp.so
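
If the ldd output is long, the unresolved entries can be picked out with a few lines of Python. This is a minimal sketch that assumes the usual glibc ldd format, where a missing library is printed as 'name => not found'; the function names are made up for the example.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_module(path):
    """Run ldd on a shared object and list its unresolved dependencies."""
    output = subprocess.check_output(["ldd", path], universal_newlines=True)
    return missing_libraries(output)


if __name__ == "__main__":
    # The path is the one mentioned above; adjust it to the local install.
    print(check_module(
        "/usr/local/lib/python2.6/dist-packages/cpyrit/_cpyrit_calpp.so"))
```

An empty list would suggest the run-time linker can resolve everything, in which case the cause of the missing CAL++ cores would have to be looked for elsewhere.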

Original comment by hazema...@gmail.com on 25 Apr 2010 at 9:32

GoogleCodeExporter commented 9 years ago
I need someone with a 5xxx card to run the next test :).

This procedure is for version v3
1. Overwrite file cpyrit_calpp_kernel.cpp with the attached file.
2. change line 152 in _cpyrit_calpp.cpp by removing '//' at the front of the line
3. rebuild & install
4. Post output of 'pyrit list_cores'

PS. The command from point 4 will end with an error, but it will generate the
kernel ISA first.

Original comment by hazema...@gmail.com on 25 Apr 2010 at 10:30

Attachments:

GoogleCodeExporter commented 9 years ago
Test results for new Catalyst 10.4 Linux_64 drivers using calpp-svn-1 and
cpyrit_calpp-v3:

Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (39339.6 PMKs/s)... - 

Computed 41530.27 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 20184.7 PMKs/s (RTT 2.7)
#2: 'CAL++ Device #2 'ATI RV770'': 20161.6 PMKs/s (RTT 2.7)

Not a whole lot of difference. Decided to keep my OC modest at 650 core and
stock memory.

Original comment by robert.b...@gmail.com on 28 Apr 2010 at 11:38

GoogleCodeExporter commented 9 years ago
I've done the test from comment 125. It looks like the v3 version achieved
maximum instruction packing for 5xxx cards.

I'm officially closing this issue :). Thank you all for your help.

As soon as Lukas accepts the LowLatencyCore patch, the new version v3-2 will be
uploaded to svn.

PS. For those interested, v3-2 should be 1-2% faster than v3 :)

Original comment by hazema...@gmail.com on 30 Apr 2010 at 1:28

GoogleCodeExporter commented 9 years ago
This version was a big increase over the stock pyrit and cal benchmarks; results
for x2 4870 are below. However, I am still seeing fewer PMKs for the dual cards
than I would see if I used them with Crossfire. They usually benchmark at 22000.

#1: 'CAL++ Device #1 'ATI RV770'': 15795.8 PMKs/s (RTT 1.3)
#2: 'CAL++ Device #2 'ATI RV770'': 17746.9 PMKs/s (RTT 1.2)
#3: 'CPU-Core (SSE2)': 656.8 PMKs/s (RTT 3.0)
#4: 'CPU-Core (SSE2)': 661.2 PMKs/s (RTT 2.9)
#5: 'CPU-Core (SSE2)': 662.3 PMKs/s (RTT 3.0)
#6: 'CPU-Core (SSE2)': 666.7 PMKs/s (RTT 2.6)
#7: 'CPU-Core (SSE2)': 655.4 PMKs/s (RTT 2.7)
#8: 'CPU-Core (SSE2)': 651.4 PMKs/s (RTT 2.8)
#9: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

Original comment by jezze...@gmail.com on 11 Jul 2010 at 10:25

GoogleCodeExporter commented 9 years ago
The problem is with HT (Hyper-Threading) in your system. When HT is enabled, 2
threads need to share 1 core, so the ATI driver thread needs to share a core
with a CPU-Core computing thread. This leads to gpu starvation and thus to
reduced performance.

Original comment by hazema...@gmail.com on 11 Jul 2010 at 11:00

GoogleCodeExporter commented 9 years ago
You can reduce the number of CPU-Core threads by hand or disable HT.

Use 'pyrit benchmark_long' for much more accurate performance estimation.

Original comment by hazema...@gmail.com on 11 Jul 2010 at 11:05

GoogleCodeExporter commented 9 years ago
Sorry, this is an old thread, but I just got it working (OpenCL actually just
locks my computer)

ATI 4670, no overclocking yet

3876.24 PMKs/s

Original comment by james0p0...@googlemail.com on 2 Dec 2010 at 10:12