GoogleCodeExporter opened this issue 8 years ago
I did two tests.
A. calpp 0.87 with the old pyrit-calpp-v2b-2 (taken from issue #148): the
PMK/s were correct, 10,000,000 passwords in 97 seconds.
B. I deleted /usr/include/cal and downgraded to calpp 0.86.3 + pyrit r276, but
the problem persists: 10,000,000 passwords in 122 seconds.
I ask other people with ATI cards to install calpp 0.87 + pyrit r276, run some
real tests (not only benchmark or benchmark_long), and report whether they see
the same issue.
Original comment by pyrit.lo...@gmail.com
on 2 Aug 2010 at 7:34
Could you test v2b-4 with (false, false)? If it works correctly, then we are
hitting some bottleneck in the code that feeds the core.
The main difference between v2b-4 (false, false) and the svn version is the size
of the data sent to the GPU (roughly 3x bigger). There might be some problem in
pyrit which causes a slowdown when working with such big data blocks.
You can disable the CPU cores - if that "solves" the problem, then it is most
probably an issue with big data blocks.
You can limit the max block size by changing the value in cpyrit_calpp.cpp (line 495).
v2b-4 used a value of ~80000 (it should be divisible by 4096).
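Since the value must be divisible by 4096, a quick way to pick the valid value closest to a target like ~80000 (a small illustrative sketch, not pyrit code):

```python
def round_to_multiple(value, base=4096):
    """Return the multiple of `base` closest to `value`."""
    return base * round(value / base)

# ~80000 rounds to 81920 (20 * 4096); the next multiple below is 77824.
print(round_to_multiple(80000))  # -> 81920
```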
Original comment by hazema...@gmail.com
on 2 Aug 2010 at 10:25
Also, for v2, set div_size=1.
Original comment by hazema...@gmail.com
on 2 Aug 2010 at 10:30
I will do the tests and report later today.
Original comment by pyrit.lo...@gmail.com
on 3 Aug 2010 at 12:29
hazeman11, here are the results.
Removed calpp 0.86.3 (deleted the /usr/local/include/cal directory).
Removed pyrit (deleted everything inside /usr/local/lib/python2.6/dist-packages/).
Installed calpp 0.87.
Installed pyrit r276.
pyrit benchmark:
#1 82000
#2 41000
#3 700
#4 700
#5 0
Then I ran the real test 3 times; the PMK rates were 69-73.
Removed pyrit r276.
Installed pyrit v2-4 (sorry, I don't know where to find v2b-4).
pyrit benchmark:
#1 69000
#2 34000
Then I ran the real test 3 times; the PMK rates were 99-100 (more stable results).
Removed pyrit v2-4.
Installed pyrit r276.
Configured limit_ncpus = 1.
Then I ran the real test 6 times; the PMK rates were 120-120-78-75-69-80 (the
first 2 tests were good, but then the rate decreased).
Configured limit_ncpus = 2.
Then I ran the real test 4 times; the PMK rates were 90-79-78-81.
OK, I am confused now.
Any suggestions?
Original comment by pyrit.lo...@gmail.com
on 3 Aug 2010 at 5:38
I've uploaded a debug modification to svn.
To use it, you need to uncomment line 32. You also need to modify line 76 in
setup.py:
libraries=['crypto','aticalrt','aticalcl','boost_date_time-mt'],
(on your system, boost_date_time-mt could have a slightly different name; check
in /usr/lib/ for libboost_date_timeXXXXXX. Boost date-time must be installed.)
After the modification, the calpp core will print data about the time lost to
data preprocessing - in other words, the time when the GPU is idle and does nothing.
You can try playing with lines 556...559 in _cpyrit_calpp.cpp:
change
div_size=1;
avg_size=xxxx;
max_size=xxxx;
xxxx should be a multiple of 4096. It's possible that pyrit has performance
problems creating the huge data sets required to keep a 5870 busy for 3 seconds.
On my CPU (2.5 GHz), pyrit can't feed a 5850 even during the benchmark - huge
amounts of time are lost.
Unfortunately, preprocessing performance is a big problem for pyrit. Pyrit is
written in Python, which has issues with multithreading. Any preprocessing done
in Python (some parts are done in C in the computing cores) is executed
sequentially (so even if you have 8 cores, only 1 is used for preprocessing).
I think this might be the problem we are seeing here.
Also, some of the unstable runs you see could be caused by driver problems - I
have seen this happen on ATI GPUs.
Original comment by hazema...@gmail.com
on 8 Sep 2010 at 1:29
Hi hazeman11.
My PC has a hard disk issue; after I solve it, I will follow your suggestions
and then come back to report.
Original comment by pyrit.lo...@gmail.com
on 10 Sep 2010 at 11:37
hazeman11, note that I am now working with r279.
I did as you suggested, and I got A LOT of the following messages:
"No fast enough data preparation for GPU: lost time XXX ms"
"No fast enough data preparation for GPU: Estimated lost time YYY ms"
where XXX ranged from 5 to 22 and YYY from 3 to 507. (NOTE: sometimes YYY was a
negative number, e.g. -1, -3, -207.)
Then I commented line 76 out again and played with lines 556...559 in _cpyrit_calpp.cpp.
Here are the results:
div_size=1; avg_size=4096;  max_size=4096;  ---> 103k PMK/s
div_size=1; avg_size=4096;  max_size=8192;  ---> 114k PMK/s
div_size=1; avg_size=8192;  max_size=8192;  ---> 114k PMK/s
div_size=1; avg_size=8192;  max_size=16384; ---> 118k PMK/s
div_size=1; avg_size=16384; max_size=16384; ---> 117k PMK/s
div_size=1; avg_size=16384; max_size=32768; ---> 90k PMK/s
All in all, I got back to at least 118k, but the 122k I had before is still missing.
Maybe there are other parameters I can modify?
Original comment by pyrit.lo...@gmail.com
on 18 Sep 2010 at 11:42
I'll explain first what this output means.
"No fast enough data preparation for GPU: lost time XXX ms"
- This is the time between finalising the last GPU computation (the GPU could finish computing much earlier - this is the time when the CPU could get back to take care of the GPU) and starting new computations, so any time reported here is time when the GPU is idle.
"No fast enough data preparation for GPU: Estimated lost time YYY ms"
- This one is trickier. In theory, the CPU should prepare the next data for the GPU while the GPU is working, then wait for the GPU to finish the current computation. A typical computation cycle should last 3 seconds.
This message is displayed when the CPU waits for the GPU for less than 0.2 s. In most cases it means the GPU had already finished its computation and the CPU wasn't fast enough to prepare data for the next cycle. The debugging code tries to estimate the lost time (based on GPU speed and the current time), but it isn't always accurate (hence the occasional negative values).
I don't think modifying any values makes sense now. It's obvious that the pyrit
engine, written in Python, simply can't prepare data fast enough for an ATI GPU.
Original comment by hazema...@gmail.com
on 20 Sep 2010 at 11:44
My hardware reached 125k; now it only does 118k. Something, somewhere, is wrong.
Unfortunately, I no longer have the hard disk with the configuration (OS, ATI
driver, ATI Stream, pyrit version, etc.) that gave me 125k, so I can't go back
to those "golden days". (NOTE: the 125k was real, not an inaccurate value from
the benchmark.)
Because I know my hardware can do 125k, I asked you whether there are other
parameters to tune.
Anyway, I totally agree with you that the weakness is in the programming
language: Python was able to manage "slow" GPUs like the NVIDIA 8800 or ATI
4850, but with bigger GPUs Python has reached its limit. Unfortunately, my lack
of programming skill does not allow me to help port pyrit from Python to C, so
I can only wait for someone more skilled than me to do it :(.
Original comment by pyrit.lo...@gmail.com
on 22 Sep 2010 at 5:33
Some more info.
I reran with div_size=1; avg_size=8192; max_size=16384; and this time I got
119402 PMK/s, better than before. Maybe because the PC had just been turned on
and was fresh :). Anyway, the CPU has 4 cores at 3200 MHz (200x16).
Then I overclocked the CPU to 3500 MHz, 200x17.5 (+9.375%), and reran the same
test. The result was the same: 119400 PMK/s.
This is strange to me: I assumed that a faster CPU would compensate for
Python's inefficiency, but it does not.
Then I also overclocked (both) GPUs from 850 MHz to 875 MHz
(aticonfig --od-enable
aticonfig --adapter=0 --od-setclocks=875,1200
aticonfig --adapter=1 --od-setclocks=875,1200)
and ran the test again: 112044 PMK/s (worse!).
Then I also overclocked (both) cards' RAM from 1200 MHz to 1250 MHz
(aticonfig --adapter=0 --od-setclocks=875,1250
aticonfig --adapter=1 --od-setclocks=875,1250)
and ran the test again: 123457 PMK/s.
Then I overclocked (both) cards' RAM again, from 1250 MHz to 1300 MHz
(aticonfig --adapter=0 --od-setclocks=875,1300
aticonfig --adapter=1 --od-setclocks=875,1300)
and ran the test again: 122699 PMK/s.
So it seems the bottleneck is the video card's memory clock, and the best speed
is 1250 MHz.
I hope these tests help.
Original comment by pyrit.lo...@gmail.com
on 24 Sep 2010 at 5:11
I guess CPU speed does matter!
After overclocking my CPU from 2.5 to 3.3 GHz, I noticed a huge improvement,
especially when using the sqlite:// provider. Now I can get almost 110,000
PMK/s instead of 90,000.
Okay, it is still far from the theoretical maximum (140,000 PMK/s), but
finally I can see the bottleneck.
Original comment by kopierschnitte@googlemail.com
on 6 Oct 2010 at 7:08
My conclusion is that a CPU overclock helps in YOUR case. In my case, moving
from 3.2 GHz to 3.5 GHz gives no speedup (I suspect that above 3.2 GHz the CPU
speed doesn't matter because of Python's limits, and only C could give a
further speedup).
Anyway, keep in mind that I don't use SQL; I use direct files. So maybe the
problem is elsewhere. I suggest you repeat my test (pyrit -e test -r
wpa.cap -i list.txt attack_passthrough) and see whether, with direct files, you
get a further speedup.
A test I still haven't done is to vary the number of CPUs involved: I have to
try 1, 2, 3, and 4, because I suspect that involving too many CPUs causes a
bottleneck in thread management. After that, I will post the results.
Original comment by pyrit.lo...@gmail.com
on 7 Oct 2010 at 12:13
Okay, interesting point. I will try the attack_passthrough function as soon as
possible.
When you say you don't use SQL and the overclocking didn't give you any
performance boost: don't you get those extremely long delays (1 h and more)
when starting the attack or using the eval command?
I guess the CPU limits GPU performance as long as you get 100% load on each
core while "feeding" the GPU ... but I might be wrong ;-)
Original comment by kopierschnitte@googlemail.com
on 7 Oct 2010 at 1:20
In issue 191, Lucas stated that the direct file storage isn't very efficient
for large sets of passwords. So I guess it would be best to focus on the SQL
engines for massive amounts of workunits.
Did you already try your above tests using a different storage provider?
Original comment by kopierschnitte@googlemail.com
on 7 Oct 2010 at 2:12
kopierschnitte,
due to my needs, I don't use any kind of SQL database to store passwords and
PMKs, so I cannot answer your question, sorry. Moreover, I have never used the
'eval' command at all.
My "modus operandi" is to take long lists of passwords (about 250 million each)
and create cowpatty files with password+PMK pairs.
For "tests on the fly" I use the attack_passthrough method described above.
Original comment by pyrit.lo...@gmail.com
on 7 Oct 2010 at 3:04
Okay, I understand. For 250M passwords, the file provider shouldn't be a
problem. And because you don't write the computed PMKs back to a db, you
don't stress the process too much.
According to your tests above, it seems to be a timing issue between the GPU
memory clock and the speed at which pyrit feeds the GPU. Are you sure you got
better results with your previous Linux installation?
Original comment by kopierschnitte@googlemail.com
on 7 Oct 2010 at 7:32
Overall, one thing that I HATE so much is when people ask me "but are you
sure?", as if they suspect I am a complete idiot unable to do my job or to read
a number on a monitor.
To convince you, point your web browser here and read the whole story:
http://code.google.com/p/pyrit/issues/detail?id=148
Original comment by pyrit.lo...@gmail.com
on 7 Oct 2010 at 8:00
I did the tests.
limit_ncpus = 0 AND workunit = 75000: time = 335+K sec (this is the default
setting)
limit_ncpus = 1 AND workunit = 75000: time = 335+K sec
limit_ncpus = 2 AND workunit = 150000: time = 335+K sec
limit_ncpus = 2 AND workunit = 75000: time = 335+K sec
limit_ncpus = 3 AND workunit = 150000: time = 335+K sec
limit_ncpus = 3 AND workunit = 75000: time = 334+K sec
limit_ncpus = 3 AND workunit = 300000: time = 333+K sec
where 0 < K < 1.
I can say that 'limit_ncpus' and 'workunit' are two parameters that do not
affect the result in my case.
Original comment by pyrit.lo...@gmail.com
on 7 Oct 2010 at 8:09
I don't think you are an idiot. Sorry if my post offended you. I just wanted to
know whether you have any idea what the cause could be, as it seems we are both
fighting the same problem.
Original comment by kopierschnitte@googlemail.com
on 7 Oct 2010 at 9:27
Well, for me it is impossible to identify where the problem is, because there
are too many variables to take into consideration.
A. Pyrit changes weekly.
B. There is a new Catalyst driver every month (and regressions seem to be the
rule, not the exception).
C. The SDK changes version quite often.
D. calpp moved from 0.86.3 to 0.87.
E. I upgraded from Debian stable 5.0 to Debian testing 6.0 because Python >= 2.6
is required and python-scapy 2.0 is only in Debian >= 6.
F. There are several parameters you can play with inside pyrit (limit_ncpus,
avg_size, max_size, etc.).
It is quite hard to identify where the regression is: bad interaction between
driver and hardware? A bug in pyrit? A problem in the SDK? A regression in
calpp? Who knows? For me, the rule is now: "when you find the best
configuration, IMMEDIATELY dump the whole hard disk and save the image in a
safe place". If I had followed this golden rule months ago, I would have been
able to keep my 125K PMK/s.
Anyway, I am confident that rewriting pyrit in C will reduce/eliminate these
problems; have a look at issue 185 for the discussion of this proposal.
Unfortunately, it seems to be a "low priority" issue :( when (in my opinion) it
should be "THE" issue.
Original comment by pyrit.lo...@gmail.com
on 7 Oct 2010 at 10:23
Sorry, I did not understand your workflow correctly. Of course, the workunit
size shouldn't matter, because you don't do anything with the database/storage
provider.
So I agree that this is an "error by design". In your case, the reading and
preparation of the passwords, the queue management, and the GPU feeding are all
done in a single thread.
What is the CPU utilization (per core) when running attack_passthrough with
limit_ncpus=1?
Original comment by kopierschnitte@googlemail.com
on 8 Oct 2010 at 12:54
I ran the tests in "brainless" mode.
There are variables? OK, I run pyrit with different values for each variable
and time how long the same task takes with the different values.
In other words: I don't know whether changing the workunit value will give me a
speedup, but I tried it anyway just to have a wider range of results.
I don't know how much load is on (each) CPU when I run attack_passthrough with
limit_ncpus=1: I did not check. In the past, I discovered that if I disturb the
PC with other tasks (top, ps, htop, etc.) while pyrit is running, pyrit slows
down. See issue 148, comment 110. Because of that, I am used to doing nothing
else while pyrit is running.
Original comment by pyrit.lo...@gmail.com
on 8 Oct 2010 at 2:08
Okay, forget the workunit value; it doesn't matter in your case. I was asking
about CPU utilization because I'm quite sure that it is the limiting factor,
but for now I don't understand why pyrit only reaches the CPU limit when using
the attack or batch commands, not when using the benchmark command.
Did you adjust your DISPLAY environment setting?
What is the current output of "echo $DISPLAY" on your system?
Did you attach a monitor to each GPU?
Original comment by kopierschnitte@googlemail.com
on 14 Oct 2010 at 10:40
> I don't understand why pyrit only reaches the CPU limit when using the attack
> or batch commands, not when using the benchmark command.
I suspect it is because the benchmark is not REAL work but only a test, so it
does not really push the CPU to its limit: Python can manage a test, but it
cannot handle the full workload.
About DISPLAY: I set "export DISPLAY=:0" in /root/.bashrc; all my pyrit-related
activities are done with the root account, so the DISPLAY variable is correct.
The output of "echo $DISPLAY" is ":0".
Monitor: no, only one monitor (the only one I have) is connected to the primary
output of the primary video card (HD5870).
Original comment by pyrit.lo...@gmail.com
on 14 Oct 2010 at 8:23
Hmm, but I always thought pyrit just "replays" sample computations when doing
benchmarks. Maybe pyrit is calculating random passwords or something like that.
I still suspect the file I/O subsystem is our bottleneck here :-(
Regarding the DISPLAY and/or monitor thing: someone in the ATI forum wrote that
it could matter...
Original comment by kopierschnitte@googlemail.com
on 15 Oct 2010 at 7:13
About the suspected I/O bottleneck: I use XFS; which filesystem do you use? Try
a different filesystem; maybe you will get better I/O bandwidth. Or use two
different disks: one to read passwords from and one to store PMKs. If you have
enough RAM, you can use /dev/shm to eliminate the latency of a mechanical hard
disk.
About DISPLAY: who says so? What exactly did they say? Where? In what respect
could it matter? Please give more - and, if possible, solid - information.
Original comment by pyrit.lo...@gmail.com
on 15 Oct 2010 at 7:53
Please take discussions to the mailing list, where everyone can find them.
Original comment by lukas.l...@gmail.com
on 15 Oct 2010 at 8:00
Issue 198 has been merged into this issue.
Original comment by lukas.l...@gmail.com
on 17 Oct 2010 at 12:31
I'll try to stick to the issue's topic now, sorry.
The information about the DISPLAY variable can be found in issue 123
(comment 44).
The other sources for my last comment are
http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=140633 (for the
second-monitor topic) and
http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=139606 (for the
DISPLAY thing).
Regarding I/O performance, I also thought the HDDs might be a bottleneck, but
iostat -m 1 reports about 200 IOPS and >2 MB/s read/write while pyrit runs. I
suspect that recent SATA drives can easily handle more IOPS, and I've
cross-checked this with some simple copy tasks (achieving more than 400 IOPS).
But I'll try to switch to XFS or btrfs as soon as possible.
Maybe there's some way to control how many passwords pyrit holds in memory (=
the size of the ring buffer/queue)...
Original comment by kopierschnitte@googlemail.com
on 17 Oct 2010 at 7:26
Please stop talking about HDD performance and other stuff. You can check
whether it is python's/pyrit's fault by running 2-3 instances of pyrit at the
same time. On our test machine we get 170K PMK/s with a single pyrit instance
and 240-280K PMK/s with 3 instances. It's a very simple test...
Original comment by mmajchro...@gmail.com
on 17 Oct 2010 at 7:30
mmajchrowicz, OK, I see the point.
Pyrit/python/catalyst should scale with the hardware, but it does not, and your
test confirms it.
On my side, if I test the two cards one by one, the HD5870 has double the power
of the HD5770, but when I use them together the combined power is not 150% but
less.
By the way, an HD5970 should do at least 120K, and 4 HD5970s should do at least
480K, but they do not. To me, it is pyrit/python/catalyst that does not scale
up to all that powerful hardware (or maybe the fault is in the driver). Anyway,
it is not a problem we can solve, only report.
Original comment by pyrit.lo...@gmail.com
on 17 Oct 2010 at 8:27
We need some python/pyrit magic or a full C/C++ rewrite... Many modules are
already written in C/C++, so maybe we can move some management code into an
external C/C++ module, since this seems to be the issue?
Original comment by mmajchro...@gmail.com
on 17 Oct 2010 at 8:43
Okay, let's forget IO performance, but it's still a fact that each GPU needs a
dedicated CPU core. In your case, this would mean a total of 8 cores. I guess
that's one reason you don't get 480k PMK/s.
Pyrit's "queue management" is surely another point and explains why several
instances give you slightly higher performance. And that's exactly the point we
can work on. Hopefully this doesn't require a complete rewrite; maybe an
optimized threading architecture or larger buffers can also help.
How did you measure your results? As far as I've understood this issue, we are
talking about a huge difference between "benchmark_long" and the main attack
commands ("batch", "attack_db", etc.).
Original comment by kopierschnitte@googlemail.com
on 17 Oct 2010 at 8:59
ATM I am not interested in getting 240k PMK/s on a system with 4 cores and 2
HD5970s by running 2-3 pyrit instances instead of one. Besides, it should be at
least 280k PMK/s, since I am able to get up to 75k PMK/s on a single core of an
HD5970 and 135-140k on two cores.
Original comment by mmajchro...@gmail.com
on 17 Oct 2010 at 9:05
I am talking about the results I get from the benchmark/benchmark_long
commands. I know I will get less when I run other commands, and I know that
other "parts" of pyrit also impact its performance, but I don't see the point
in trying to optimize those other parts if I am not able to use at least 80-90%
of the computing power in a simple benchmark.
Original comment by mmajchro...@gmail.com
on 17 Oct 2010 at 9:09
I opened this ticket because the problem appears when I run a REAL test on
data, not when I run the benchmark. I don't see the advantage of investigating
the benchmark/benchmark_long commands: they are just a BogoMips(*)-style
benchmark, not a real test on real data that also involves the other parts of
pyrit.
(*) http://en.wikipedia.org/wiki/BogoMips
Original comment by pyrit.lo...@gmail.com
on 18 Oct 2010 at 9:05
You are completely wrong. If the CPU has problems running the benchmark alone,
it's obvious it will have even bigger problems calculating PMKs (the main part
of the benchmark) while also doing other work (reading passwords, writing
results, and so on). On our hardware we also get performance issues when we run
a REAL test on data, but since we cannot reach even 50% of the computational
power when running a simple benchmark, don't you think that should be fixed
first? Come on, guys, what is the point of reimplementing other parts when it's
impossible to reach full potential on the "benchmark" command? Besides, CPU
performance probably also influences your results; it is just more visible when
you "force" pyrit to do other work besides calculating PMKs.
Original comment by mmajchro...@gmail.com
on 18 Oct 2010 at 9:20
mmajchrowicz,
ask yourself WHY there are bigger problems when a REAL test is run: maybe it is
because the problem is not in the benchmark test.
Besides, as far as I know, there is an issue in the Catalyst driver that does
not allow an HD5970 to use BOTH GPUs, so only one is used on each HD5970.
From http://pyrit.wordpress.com/2010/08/16/ati-still-sucks/ I quote the
following:
"The ATI Radeon(TM) HD 5970 GPU is currently supported in single-GPU mode only.
It is recommended users only access the first device on an ATI Radeon(TM) HD
5970 GPU for GPU compute."
So, until ATI fixes the issue, a single-GPU HD5870 will have more power than a
dual-GPU HD5970.
An HD5870 at 850 MHz does 82000 PMK/s; the 5970 runs at 725 MHz, so
82000/850*725 ≈ 70K (one core).
So, at the moment, each HD5970 should do no more than 70K, and 4*70 = 280K
PMK/s, which is exactly what you get from your hardware. For the 5970, the main
issue is inside Catalyst, not inside pyrit.
I got the clocks from here:
http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units
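The clock-scaling estimate above is easy to check, assuming PMK/s scales linearly with core clock (a quick sanity-check sketch, not a measurement):

```python
# HD5870 at 850 MHz measures 82000 PMK/s; an HD5970 core runs at 725 MHz.
hd5870_rate = 82000
hd5870_clock_mhz = 850
hd5970_clock_mhz = 725

# Linear scaling with core clock gives the expected per-core rate.
estimated = hd5870_rate / hd5870_clock_mhz * hd5970_clock_mhz
print(round(estimated))  # -> 69941, i.e. roughly 70K PMK/s per 5970 core
```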
Original comment by pyrit.lo...@gmail.com
on 18 Oct 2010 at 10:25
You are again "almost" right and as a result completely wrong. First of all,
CPU cycles aren't magical. If pyrit/python doesn't "have" enough of them just
for benchmarking, it is obvious it will "have" even fewer if you ask it to do
additional work.
Secondly, ATI lies :). I don't know why, but they do. It probably didn't work
with a previous version of the driver, but it now works flawlessly; they just
didn't update the note on their web page. As I mentioned before (when I had
ONLY ONE HD 5970 in my machine), I get 75k PMK/s on a single core, but when I
use BOTH CORES I get around 135-140k PMK/s, so the second core really works.
Also, Lukas mentions our configuration
(http://pyrit.wordpress.com/2010/10/07/pyrit-on-4x-radeon-hd-5970/) as an
example of using multiple GPUs, so that's really not the issue.
Original comment by mmajchro...@gmail.com
on 18 Oct 2010 at 10:35
mmajchrowicz,
question: if you run 8 pyrit sessions (one for each available GPU), how many
aggregate PMK/s do you get? Or maybe in this case the limit is the CPU, because
pyrit cannot manage 2 GPUs with only 1 CPU core? (I suppose you have a real
quad-core CPU, not Hyper-Threading.)
Original comment by pyrit.lo...@gmail.com
on 18 Oct 2010 at 10:48
Yes, we have a real quad core, and forget that I mentioned we have 8 GPUs.
Let's assume we only have 4 GPUs (two HD 5970s). According to your opinion,
everything should then work fine and we should get maximum power, but we get
something like this:
1 pyrit instance - 170k PMK/s
2 pyrit instances - 190-200k PMK/s
3 pyrit instances - 220-230k PMK/s
Pyrit is not optimized for running multiple instances; I only mention it
because when you run multiple instances, every python/pyrit process "gets"
another core, and as a result more CPU power is available for managing the PMK
computation.
If you say you were able to get 120k PMK/s with pyrit, I wouldn't be surprised
if you could get 130-140k PMK/s once pyrit's management code is
fixed/rewritten. The "regression" you have noticed is probably because more
features or more complex code are now used "on the CPU side". The point is to
"force" pyrit (if it is possible at all with Python) to better utilize the
power of multi-core CPUs. I believe this is the root of our problems, and my
tests prove it.
Original comment by mmajchro...@gmail.com
on 18 Oct 2010 at 10:57
I will do tests with 2 pyrit instances later today, then I will report the
results here.
Original comment by pyrit.lo...@gmail.com
on 18 Oct 2010 at 11:16
mmajchrowicz, I totally agree with you: if the CPU (core) isn't capable of
running a benchmark at the maximum possible speed, it's impossible to get near
that speed when running the main attack functions.
The dual-GPU issue really seems to have been fixed a few driver releases ago.
You can easily check this by looking at aticonfig's output. On my system, I get
around 90% GPU utilization on both 5970 cores - but only during benchmarks...
every other function results in highly unstable values between 20% and 70%. So
it's obvious the GPU is waiting for input.
I suppose that when you run multiple instances, the instances are simply taking
up the remaining (tiny amount of) processor idle time. But Python itself should
be able to handle multiple threads/cores...
Again, it would be great if someone with Python knowledge could take a look at
the code to see whether there are any hardcoded queue lengths or anything else
that prevents the 5870/5970 from being constantly(!) busy.
Original comment by kopierschnitte@googlemail.com
on 18 Oct 2010 at 2:34
Maybe I'll explain a few things.
First of all, the difference between the benchmark and normal work.
Code execution for the benchmark looks like this:
1. Generate random data and put it into the queue (this is done in a loop).
2. Take data from the queue and call the core function to process it (Core class).
3. Process the data (this is done in the plugin core - OpenCL, CAL++, CPU).
For a normal attack, it looks like this:
1. Take data from storage (files, a database) and put it into the queue.
2. Take data from the queue and call the core function to process it (Core class).
3. Process the data (this is done in the C plugin - OpenCL, CAL++, CPU).
So steps 2 and 3 are exactly the same in both cases. Only step 1 differs - and
taking data from storage will usually be more CPU-intensive.
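The two pipelines described above amount to a classic producer/consumer queue. A simplified illustration of that structure (not pyrit's actual code; the names and the stand-in "computation" are hypothetical):

```python
import queue
import threading

work_queue = queue.Queue(maxsize=4)  # bounded queue between producer and core
SENTINEL = None                      # marks the end of the work stream

def producer(items):
    # Step 1: generate (benchmark) or read (attack) workunits and enqueue them.
    for item in items:
        work_queue.put(item)
    work_queue.put(SENTINEL)

def consumer(results):
    # Steps 2-3: dequeue workunits and hand them to the compute core.
    while True:
        item = work_queue.get()
        if item is SENTINEL:
            break
        results.append(item.upper())  # stand-in for the real PMK computation

results = []
t_prod = threading.Thread(target=producer, args=(["pass1", "pass2", "pass3"],))
t_cons = threading.Thread(target=consumer, args=(results,))
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()
print(results)  # -> ['PASS1', 'PASS2', 'PASS3']
```

If step 1 (the producer) is slower than step 3 (the consumer), the queue runs dry and the GPU idles, which is exactly the "lost time" the debug output reports.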
Now, the question of multiple pyrit instances and why they improve performance.
This is caused by a big problem in Python: the Global Interpreter Lock (GIL).
Python code must lock the GIL very often (every few hundred ops), which for
multithreaded Python code means that only ONE THREAD is executing at any
moment. So Python code effectively uses only ONE CORE. In pyrit, this problem
was temporarily "solved" by moving the computations into the C cores, but that
"solution" no longer works for high-performance GPUs.
Running multiple pyrit instances therefore allows more than one CPU core to be
used for data preparation.
The CPU bottleneck in pyrit could be partially solved by switching from the
threading library to a process-based library - this isn't hard to do and
doesn't take much work. But I think only a C/C++ rewrite would properly solve
pyrit's current problems.
I won't make such big changes, as I'm only responsible for the CAL++ part of
pyrit.
Lucas must decide what to do with pyrit.
Original comment by hazema...@gmail.com
on 18 Oct 2010 at 4:19
I did the test as promised.
If I run 2 instances of pyrit, I get LESS PMK/s than with a single instance.
Single instance: about 120K.
Two instances: 43K + 35K.
Original comment by pyrit.lo...@gmail.com
on 18 Oct 2010 at 7:09
Unfortunately, it doesn't prove anything... To get better results with multiple
pyrit instances, you must have a big difference between what you are getting
and what you are "supposed" to get. You must take into consideration that in
such scenarios the pyrit instances are fighting over the GPU :)
Original comment by mmajchro...@gmail.com
on 18 Oct 2010 at 7:23
Thanks for this detailed explanation. So it's really up to Lucas to decide
which way pyrit will go.
In my case, multiple instances also decrease the overall performance. But I
guess that if the speed of one CPU core is sufficient to feed exactly one GPU
core, no improvement is to be expected; in that case, there is no kind of race
condition. When file (or password) I/O is taking place, this might be different.
Just an idea: could the multiprocessing interface be helpful?
-> http://docs.python.org/library/multiprocessing.html
Original comment by kopierschnitte@googlemail.com
on 18 Oct 2010 at 8:30
kopierschnitte: Your test is flawed, as both instances of Pyrit battle for the
same GPUs. Overhead kills performance as they get in each other's way.
Python will always be an integral part of Pyrit (see the name).
The latency(!) of supplying work to the GPU is now a bigger problem than once
thought, as GPUs are much faster than expected. Remember: 10,000 PMKs on a
single, high-class GPU was considered a big number just a few months ago. Now
we are talking about 280,000 PMKs on 8 GPUs...
The GIL is not a problem and (forgive me for saying so) is not well understood
in its consequences. Other threading libraries are not being considered; Pyrit
will always rely on CPython's threading. The multiprocessing module is not
being considered, as it is not available in Python 2.5; it is also not very
stable and has caused numerous problems in other projects.
What needs to be done is to implement a kind of triple buffering between the
CPU and GPU. In this approach, the thread that steers the GPU runs for almost
all of its lifetime without ever needing to acquire the GIL. So there is
already a solution in my mind for the top 10% of Pyrit users with high-end
hardware :-)
Again, my main problem is time (full-time job and diploma thesis) and access to
hardware that can actually provide the workload (I have one six-year-old
computer with a 4850 and a MacBook Pro with a 9800M).
Original comment by lukas.l...@gmail.com
on 18 Oct 2010 at 9:16
Original comment by lukas.l...@gmail.com
on 18 Oct 2010 at 9:19
Original issue reported on code.google.com by
pyrit.lo...@gmail.com
on 1 Aug 2010 at 2:17