Closed maztheman closed 7 years ago
Looks like GTX 1080s are getting super low sol/s, will have to investigate.
GTX 1070: -cb 256 -ct 32 seems to work. Though the block count should be a multiple of the SM count, so 255 might work better.
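The multiple-of-SM-count idea can be sketched as a small helper. This is only an illustration: `round_to_sm_multiple` is a hypothetical name, not part of the miner, and the SM count (15 on a GTX 1070) is passed in by hand rather than queried from the device.

```python
def round_to_sm_multiple(blocks: int, sm_count: int) -> int:
    """Return the multiple of sm_count nearest to blocks (never less than sm_count)."""
    if sm_count <= 0:
        raise ValueError("sm_count must be positive")
    # Round to the nearest multiple; exact ties round up.
    nearest = ((blocks + sm_count // 2) // sm_count) * sm_count
    return max(sm_count, nearest)


# Assumption: a GTX 1070 exposes 15 SMs.
print(round_to_sm_multiple(256, 15))  # prints 255
```

Whether 255 actually beats 256 still has to be measured on the card.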
Aren't cb and ct only for the CUDA part? I think the OpenCL silentarmy doesn't use those values. Edit: oh, I didn't look at what you did in the commits. So you're trying to port it to CUDA? I will test it.
Yes, I ported silentarmy to CUDA!
Nice! It compiles without any problem.
I got 38 sol/s on a GTX 980 (cb 128, ct 64) with 100% GPU load, 28% memory load, 63% power, and 0% CPU! (Btw, something is wrong, because I could not increase ct above 64, and the GTX 980 supports 1024 threads per block.)
For comparison: nicehash/nheqminer CUDA: 25 sol/s (cb 64, ct 64); 2x nicehash/nheqminer OpenCL silentarmy: 33 sol/s together (but each instance uses a full core); zcminer: 40 sol/s (but with a 2.5% fee and using one full core).
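To compare a fee-charging miner with a free one, the fee can be subtracted from the reported rate. A minimal sketch, assuming the fee simply removes that percentage of the sol/s; `net_rate` is a hypothetical helper, not something any of these miners provide.

```python
def net_rate(sol_per_s: float, fee_percent: float) -> float:
    """Sol/s left for the miner after a developer fee given in percent."""
    return sol_per_s * (1.0 - fee_percent / 100.0)


# zcminer figure from the comparison above: 40 sol/s with a 2.5% fee.
print(round(net_rate(40, 2.5), 2))  # prints 39.0
```

Under that assumption the 40 sol/s from zcminer is worth about 39 sol/s, before counting the CPU core it occupies.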
I'll try compiling for my 650M, one sec.
64/64 seems to be great on a GTX 970, as reported by another user.
For me 128/64 was better (tried many values).
results on gt650m: 6.2 sol/s (cb 128, ct 64)
For comparison: nicehash/nheqminer CUDA: 3.1 sol/s; 1x nicehash/nheqminer OpenCL silentarmy: 6.1 sol/s (but it uses one full core, and GPU memory is too low to run 2 instances); zcminer: not working.
Looks like it is a success. I hope you will keep updating/optimizing, because I like the CUDA version better than the OpenCL one.
Yes, I will keep this up to date with the silentarmy builds. Technically I have no real idea what this code is doing. I just ported it over. All the hard work is done by someone else. :-)
Also, I can't post any more on the Zcash forum. I'd like to get more people testing.
I don't even know basic programming... I can only use tools and compile. Btw, you should set more optimization options in the VS project, use specific code generation, and remove debug from the release build (that's why I always compile it myself).
I'm on Win10 64-bit with five 1070 cards. How should I set the parameters?
-cs -cb 64 -ct 64 -cd 0 1 2 3 4
Try that for now
On Linux there is no CMake file, but I created one. AVX was also needed to launch SA; I fixed that too.
Thanks!
Best results I got were with -ct 32 -cb 90, and only 38 s/s on a 1070 :-(
Can you take a look at my optimized SA OpenCL version for NVIDIA? sa-nv.tar.gz
it gets 50 s/s on 1070 with 1 thread
NR_ROWS_LOG must be 19 on NVIDIA and OVERHEAD 8. My source also contains other improvements in input.cl by eXtremal and others: https://bitcointalk.org/index.php?topic=1666489.360
GTX 1070: -cb 256 -ct 32 seems to work. Though the block count should be a multiple of the SM count, so 255 might work better.
I tried these parameters and got 148 h/s total on Win10 64-bit with five 1070 cards. Now I'm trying other parameters.
-ct 64 -cb 256: 40 s/s on a 1070
GTX 1080 best speed: nheqminer -cs -cb 8192 -ct 8 = 33 sol/s. Not fast enough, but keep up the good work, you're doing well :)
I might try to convert it to use 2d allocations as it might be more efficient
-cb 256 -ct 32 seems to apply only to the first card.
GTX 1070 #0: blocks=256, threads=32; GTX 1070 #1: blocks=480, threads=64 (the default). The others are the same as GTX 1070 #1.
Oh okay, I'll make some changes so it forces the values for all cards.
Btw, there should be a tool for benchmarking the cb and ct options (like running a 2-second test in a loop, recording sol/s for a range of cb and ct values, and sorting for the best results). Then everyone could run it and check what is best for them.
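A benchmark loop like that could be sketched roughly as below. This is not an existing tool: `sweep` and `fake_measure` are hypothetical names, and `fake_measure` returns made-up numbers standing in for a real short miner run, since how to launch the miner and read back its sol/s is left open here.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple


def sweep(measure: Callable[[int, int], float],
          cb_values: Iterable[int],
          ct_values: Iterable[int]) -> List[Tuple[float, int, int]]:
    """Measure every (cb, ct) pair and return (sol/s, cb, ct) sorted best first."""
    results = [(measure(cb, ct), cb, ct)
               for cb, ct in product(cb_values, ct_values)]
    return sorted(results, reverse=True)


def fake_measure(cb: int, ct: int) -> float:
    """Stand-in for a real 2-second miner run; invented values peaking at cb=128, ct=64."""
    return 40.0 - abs(cb - 128) / 16 - abs(ct - 64) / 8


if __name__ == "__main__":
    # Show the three best combinations from the (fake) measurements.
    for rate, cb, ct in sweep(fake_measure, [64, 128, 256], [32, 64])[:3]:
        print(f"cb={cb} ct={ct}: {rate:.1f} sol/s")
```

To use it for real, `fake_measure` would be replaced by a function that runs the miner with the given -cb and -ct for a couple of seconds and returns the reported rate.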
Looks like people found 2 small bugs: one where with -cs but without -cv 0 it ends up in cuda_tromp (nheqminer defaulted to the old cuda_tromp), and a second where without AVX, even with -cv 0, it ends up in cuda_tromp (maybe some code checks for AVX).
Does not work on a GTX 580 (CUDA compute capability 2.0). Oddly enough, though, silentarmy seems to work fine on an R7 APU GPU at about 3 sol/s and on Intel integrated graphics using the -od switch. I only got about 3 sol/s on the Intel graphics, but every little bit helps and it didn't seem to decrease my CPU sol rate.
Hey guys, I think I may have figured out what was going on. I had some launch bounds which I think caused CUDA to force only 64 blocks or something. I'm gonna make that small change, check it in, and build. Then I will be looking at some other major changes which will take longer. I will be posting a new build.
Please check the latest updates: HUGE improvements. 1070 = 80 s/s https://bitcointalk.org/index.php?topic=1666489.400
The silentarmy option is not working; it is ignored and the miner still reports CUDA.
@kruisdraad I think in the new version 0.4g this is addressed.
@maztheman how come people are reporting 80 sol/s per card when in every test I don't get past 45 at all? Is that on Linux, @krnlx? What's the power / GPU usage?
@krnlx changing input.cl does not work on Windows; even deleting it along with kernel.cl changes nothing (I would assume the miner wouldn't start at all). Perhaps it's hardcoded in the exe?
I don't know where the 80 sol/s are coming from. Probably not the CUDA version I have made here. I'm trying to update the kernel based on what @krnlx posted. I have yet to have it pass any tests. There seems to be an indexing problem that is causing the kernel to fail. It's just a technical issue that I need to debug. In 0.4g I removed some threading restrictions that were not required in the old kernel I made. But with the new kernel it looks like the limit has to be there. I'll keep you guys posted.
consider some cleanup.. see https://github.com/tpruvot/nheqminer/commits/cuda-silentarmy
tromp avx and the double cuda sdk trick are not useful, but tx for your work.. seems a good base to begin the cuda work ;)
How do you compile cuda silentarmy on ubuntu 16.04? There is no cmake file in the dirs?
All tweaks are in git now; I only fixed the CPU load. https://github.com/mbevand/silentarmy
Tested tpruvot's build with the SA5 CUDA port from krnlx; it gave my GTX 980 58 sol/s. I know that it is not a final product, but I wanted to share...
for comparison:
Is it possible you could compile it for windows 8.1? thx
Right now I have only tested on a GTX 650, and it "works" but could possibly use a couple of tweaks.