roobixx / pyrit

Automatically exported from code.google.com/p/pyrit

pyrit-calpp v2 - testing required #148

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'm attaching a new version of the CAL++ computing core. It's about 8-9%
faster than the one in svn.
I have to admit that this is a slightly strange solution and I'm not sure
how it will behave on other ATI cards, so it needs a bit of testing :).

The standard pyrit core works in the following way:
- prepares data
- transfers it to the gpu
- the gpu runs
- transfers data back from the gpu
- postprocesses data
A simple timing analysis shows that this approach shouldn't waste more than
1-2% of the time ( the time when the gpu is not working ).

The new calpp v2 core tries to mask transfer/data processing with gpu
computations ( so we start the computation and then, while the gpu is busy,
we process data on the cpu ). By that analysis it should be only 1-2%
faster. But it isn't, the gain is bigger :).
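
The difference between the two schedules can be sketched with a toy timing model. The per-batch timings below are made-up numbers, chosen only so that the cpu-side work is about 1.5% of the gpu time; they are not measurements from pyrit or CAL++.

```python
# Toy timing model: sequential core vs. a core that overlaps cpu work
# with gpu computation. All timings are hypothetical, in milliseconds.

def sequential_time(prep, upload, gpu, download, post, batches):
    """Every stage waits for the previous one, so the gpu idles during cpu work."""
    return batches * (prep + upload + gpu + download + post)

def overlapped_time(prep, upload, gpu, download, post, batches):
    """Cpu-side work for neighbouring batches runs while the gpu is busy.

    Only the first prepare/upload and the last download/postprocess are
    exposed; in between, each batch costs whichever side is slower.
    """
    cpu_side = prep + upload + download + post
    return prep + upload + batches * max(gpu, cpu_side) + download + post

# Hypothetical per-batch timings (cpu-side total = 1.5 ms, gpu = 100 ms).
timings = dict(prep=0.4, upload=0.5, gpu=100.0, download=0.2, post=0.4)

seq = sequential_time(batches=1000, **timings)
ovl = overlapped_time(batches=1000, **timings)
gain = seq / ovl - 1
print(f"sequential: {seq:.1f} ms, overlapped: {ovl:.1f} ms, gain: {gain:.2%}")
```

With these numbers the model gives a gain of about 1.5%, in line with the simple estimate. Anything beyond that has to come from effects the model leaves out.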

First of all, the simple analysis doesn't take driver/card behavior into
account. ATI's drivers are really sensitive to cpu performance and are
probably hiding some actions.

Now to the interesting part :). The v2 core doesn't do explicit data
transfer at all. Trying to send data to the gpu while it was working
resulted in some performance degradation. So now all the data stays in host
memory and the gpu reads it directly during computations. Fortunately, in
the case of pyrit the amount of computation is so large that the transfer
from host memory can be masked by computations from other gpu threads.
This may not be true for 5xxx cards, where the kernel is smaller and faster.
The memory<->gpu transfer speed may also have some impact.
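
A rough way to picture the masking argument: a fetch from host memory is hidden as long as the other resident gpu threads have enough computation to fill the wait. The function and all numbers below are hypothetical and only illustrate the condition; they say nothing about real 4xxx or 5xxx timings.

```python
# Hypothetical latency-hiding check, in arbitrary time units.

def fetch_is_hidden(fetch_latency, compute_per_thread, resident_threads):
    """True if the other threads' computation covers one thread's fetch wait."""
    return (resident_threads - 1) * compute_per_thread >= fetch_latency

# A long kernel leaves plenty of work to run while one thread waits.
long_kernel = fetch_is_hidden(fetch_latency=1000, compute_per_thread=400,
                              resident_threads=4)

# A shorter, faster kernel with the same fetch latency may not.
short_kernel = fetch_is_hidden(fetch_latency=1000, compute_per_thread=100,
                               resident_threads=4)

print(long_kernel, short_kernel)
```

The second case is the concern for faster cards: the fetch latency stays the same while the computation available to cover it shrinks.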

So please test the v2 core and post your results here.
You should also try changing line 453 in pyrit/cpyrit/cpyrit.py from
'ncpus-=1' to 'ncpus-=2' ( or 'ncpus-=4' ) and test the impact of more
available cpu cores on performance.
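
The effect of that edit can be sketched as follows. This is a hypothetical illustration, not the code from cpyrit.py: it assumes the decrement is applied once per gpu device (the post does not say so), and `cpu_worker_count` and `reserved_per_gpu` are names made up for this example.

```python
# Hypothetical sketch: how many cpu worker cores remain after each gpu
# reserves some cpu cores for itself. Not taken from cpyrit.py.

def cpu_worker_count(total_cpus, gpu_count, reserved_per_gpu=1):
    ncpus = total_cpus
    for _ in range(gpu_count):
        ncpus -= reserved_per_gpu   # stands in for 'ncpus-=1' / 'ncpus-=2'
    return max(ncpus, 0)            # never report a negative worker count

for reserved in (1, 2, 4):
    print(reserved, cpu_worker_count(total_cpus=4, gpu_count=2,
                                     reserved_per_gpu=reserved))
```

A larger reservation leaves fewer cpu workers computing, but more idle cpu time for the driver and for feeding the gpu.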

The v2 core works with the svn version of the CAL++ library ( it has been attached ).

Original issue reported on code.google.com by hazema...@gmail.com on 16 Apr 2010 at 3:44

GoogleCodeExporter commented 9 years ago
pyrit.lover: Yep, all the cores ( except v2 ) report peak performance. But
even for v2 your method of testing is the most accurate.
odlan3: There is not much transfer involved, so for pyrit it's not such a
big issue. The whole transfer takes ~1% of the time and gpu->cpu is 2/7 of
this. You can read more about the slow pci transfer problem here -
http://forums.amd.com/devforum/messageview.cfm?catid=328&threadid=130923&enterthread=y

Original comment by hazema...@gmail.com on 22 Apr 2010 at 7:57
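
Putting the two figures from the comment above together gives a quick estimate of each direction's share of run time. It takes "~1%" as exactly 1% and assumes that the remaining 5/7 of the transfer is cpu->gpu, which the comment does not state.

```python
from fractions import Fraction

transfer_share = Fraction(1, 100)              # whole transfer: ~1% of run time
gpu_to_cpu = transfer_share * Fraction(2, 7)   # gpu->cpu is 2/7 of the transfer
cpu_to_gpu = transfer_share - gpu_to_cpu       # assumption: the rest is cpu->gpu

print(f"gpu->cpu: {float(gpu_to_cpu):.2%} of run time")
print(f"cpu->gpu: {float(cpu_to_gpu):.2%} of run time")
```

Under those assumptions the gpu->cpu direction is below 0.3% of the total run time.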

GoogleCodeExporter commented 9 years ago
As a small bonus to all who helped with testing, I'm attaching the v3 core.
This one has an improved kernel and uses the latest svn version of the CAL++
library.

Original comment by hazema...@gmail.com on 23 Apr 2010 at 12:47

GoogleCodeExporter commented 9 years ago
Here are the results from my 4850s (core oc'd to 650, mem is stock) on v2c:

Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (38905.1 PMKs/s)... | 

Computed 40537.35 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 19610.6 PMKs/s (RTT 2.7)
#2: 'CAL++ Device #2 'ATI RV770'': 19215.3 PMKs/s (RTT 2.9)

Looks like a keeper to me.

@pyrit.lover: I responded to your overclock questions in the google group,
as lukas requested in comment 94. Don't know if you saw it.

Original comment by robert.b...@gmail.com on 23 Apr 2010 at 12:55

GoogleCodeExporter commented 9 years ago
My 4850s' (650 core, 993 mem) results using calpp-svn-1 and cpyrit_calpp-v3:

Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (39441.6 PMKs/s)... \ 

Computed 41032.62 PMKs/s total.
#1: 'CAL++ Device #1 'ATI RV770'': 20140.0 PMKs/s (RTT 2.7)
#2: 'CAL++ Device #2 'ATI RV770'': 19965.0 PMKs/s (RTT 2.8)

Looks like you managed to squeeze some more juice out of it. Great job!

Original comment by robert.b...@gmail.com on 23 Apr 2010 at 4:10
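
For reference, the relative gain between the two 4850 runs posted above can be worked out from their "Computed ... PMKs/s total" lines. These are single benchmark runs, so this is only a rough indication and not a controlled comparison.

```python
v2c_total = 40537.35   # PMKs/s total from the v2c run
v3_total = 41032.62    # PMKs/s total from the calpp-svn-1 / cpyrit_calpp-v3 run

speedup = v3_total / v2c_total - 1
print(f"v3 over v2c: {speedup:.2%}")
```

That comes out at a little over 1%.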

GoogleCodeExporter commented 9 years ago
@hazeman11: Great job...I am impressed
pyrit benchmark
Pyrit 0.3.1-dev (C) 2008-2010 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (142081.1 PMKs/s)... - 

Computed 145429.78 PMKs/s total.
#1: 'CAL++ Device #1 'ATI CYPRESS'': 71047.7 PMKs/s (RTT 1.0)
#2: 'CAL++ Device #2 'ATI CYPRESS'': 71352.4 PMKs/s (RTT 1.0)

Original comment by odl...@gmail.com on 23 Apr 2010 at 7:47

GoogleCodeExporter commented 9 years ago
I'm happy that you like it :). This is my way of thanking you all for
helping with the tests :). I really appreciate it.
I'm only sorry that the speedup on 4xxx was so small. It could have been
~10%, but the CAL IL compiler ( in the ATI driver ) has issues. When the
pressure on instruction scheduling was reduced, it started to use smaller
ALU clauses to save registers - which is totally the wrong direction for
this type of kernel :(.

PS. Could someone with a 5xxx card attach the output of 'pyrit list_cores'
with line 152 in _cpyrit_calpp.cpp uncommented ?

Original comment by hazema...@gmail.com on 23 Apr 2010 at 12:33