Closed: maztheman closed this issue 7 years ago
Windows build please
It is failing, I have to fix it up...
If you have a 1070 or 1080 please test it:
try: https://github.com/maztheman/nheqminer/releases/download/v0.4h/v0.4h_1070_plus.7z
I removed all the thread waiting.
The v5 silentarmy code doesn't seem to be too great when converted to CUDA :(. However, it might just be the way it's looping... I might be able to fix it.
Hmm, w/ 0.4h_1070_plus this happens: ;) Just for the sake of comparison, this is what I get w/ the same 1080 on one of the original SA5 Python ports for Windows:
Yeah, there is some major issue with CUDA and the code that was posted for OpenCL. CUDA keeps "timing out" when I try to make the same kind of calls... I guess it'll have to be a work in progress for now.
I'll post again when I see improvement with my GTX 650, which should then indicate some progress with 1080s, etc.
I see this from the author of silentarmy:
mbevand commented 2 days ago (edited): @tupieurods You are right. Didn't know shared atomics were not hardware implemented pre-Maxwell. That's definitively the cause of the slowdown then, because this commit makes heavy use of shared atomics. I see no solution other than maintaining a 2nd separate version of input.cl specifically for pre-Maxwell Nvidia GPUs then.
Well, based on that guy's message I reverted back to a more direct conversion; maybe it'll work better:
https://github.com/maztheman/nheqminer/releases/download/v0.4h/v0.4h_MAXWEL_PLUS.7z
It was super slow on my GTX 650, but it looks like that is because of the shared atomics.
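The pre-Maxwell point above can be sketched as a simple dispatch on compute capability. This is only an illustration in Python, not code from nheqminer or silentarmy: the function name `pick_kernel_variant` and the two variant labels are made up for the example. The only facts it relies on are that Maxwell GPUs report compute capability 5.x and that older cards such as the GTX 650 (Kepler, 3.0) report a lower major version.

```python
# Hypothetical sketch: choose a kernel variant from the GPU's compute capability.
# The variant names are illustrative labels, not real files or build targets.

MAXWELL_MAJOR = 5  # Maxwell GPUs report compute capability 5.x


def pick_kernel_variant(cc_major: int, cc_minor: int = 0) -> str:
    """Return which (hypothetical) kernel build to use for a device."""
    if cc_major < 1 or cc_minor < 0:
        raise ValueError("invalid compute capability")
    if cc_major >= MAXWELL_MAJOR:
        # Maxwell and newer: shared-memory atomics are implemented in hardware.
        return "shared-atomics"
    # Pre-Maxwell: fall back to a version that avoids shared atomics.
    return "no-shared-atomics"


if __name__ == "__main__":
    cards = [("GTX 650", (3, 0)), ("GTX 980", (5, 2)), ("GTX 1080", (6, 1))]
    for name, cc in cards:
        print(name, pick_kernel_variant(*cc))
```

In a real miner the major/minor values would come from the device query at startup; here they are passed in by hand.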
It gives me much higher I/s numbers + full GPU load, but no Sols, unfortunately =)
Hmm, weird. I get 78 sols/s with my R9 290 with the v5 code in OpenCL. The 0 sols is probably because some timeout is causing a crash.
I ran both versions.
Setup: 6x GTX1070 with Windows 10 latest drivers, etc (anno vers)
0.4h plain: 241 sol/s, power usage 51%
0.4h maxplus: 2.4 sol/s and nearly 400 I/s; power usage is above 70%
The previous version got about 250, so it's a little less; power usage is much more stable though.
Hmm, 241 is really not that competitive with the SA Linux version, is it?
Honestly, I haven't tried the Linux version yet. The zcminer-dev Windows version gets about 300, so it's not that bad.
Do you have a Linux example of actual speeds?
Compiled. For me it crashes after 20 sec (I see GPU memory use constantly increasing until all 4 GB is filled up), and then the NVIDIA card is locked in the P5 state.
[23:00:05][0x000011dc] Using SSE2: YES
[23:00:05][0x000011dc] Using AVX: YES
[23:00:05][0x000011dc] Using AVX2: YES
[23:00:05][0x000011a8] stratum | Starting miner
[23:00:05][0x000011a8] stratum | Connecting to stratum server eu1-zcash.flypool.org:3333
[23:00:05][0x00001194] miner#0 | Starting thread #0 (CUDA-SILENTARMY) GeForce GTX 980 (#0) BLOCKS=64, THREADS=512
[23:00:05][0x000011a8] stratum | Connected!
[23:00:05][0x000011a8] stratum | Subscribed to stratum server
[23:00:05][0x000011a8] miner | Extranonce is fa613f0f55
[23:00:05][0x000011a8] stratum | Target set to 004189374bc6a7ef9db22d0e5604189374bc6a7ef9db22d0e5604189374bc6a7
[23:00:06][0x000011a8] stratum | Received new job #11ee30962342110c9785
[23:00:20][0x000011dc] Speed [300 sec]: 8.72074 I/s, 15.646 Sols/s
[23:00:36][0x000011dc] Speed [300 sec]: 8.94374 I/s, 16.6693 Sols/s
Yes, sorry, there is a memory "leak" that I have already fixed but not pushed. It's only supposed to create the 2 buffers once and reuse them; in my old code I was creating them every time and not deleting them... :P
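To illustrate the "create the 2 buffers once and reuse them" fix described above, here is a minimal Python sketch. It is not the actual nheqminer change: `SolverContext`, `solve`, and the allocation counter are hypothetical names, and a plain `bytearray` stands in for a GPU allocation. The point is only that the buffers are created on first use and then overwritten in place on every later iteration.

```python
# Hypothetical sketch of "allocate the two buffers once, then reuse them".
# bytearray stands in for a GPU buffer; none of these names come from nheqminer.


class SolverContext:
    def __init__(self, buffer_size: int):
        self.buffer_size = buffer_size
        self.allocations = 0   # how many buffers were ever created
        self._buffers = None   # filled in lazily, exactly once

    def _alloc(self) -> bytearray:
        self.allocations += 1
        return bytearray(self.buffer_size)

    def _get_buffers(self):
        # The leaky version would call _alloc() here on every iteration.
        if self._buffers is None:
            self._buffers = (self._alloc(), self._alloc())
        return self._buffers

    def solve(self, fill: int) -> int:
        """Pretend to run one iteration using the two reusable buffers."""
        buf_a, buf_b = self._get_buffers()
        buf_a[:] = bytes([fill & 0xFF]) * self.buffer_size  # overwrite in place
        buf_b[:] = buf_a                                    # copy without reallocating
        return sum(buf_b[:4])

    def close(self):
        # Release the buffers when the context is no longer needed.
        self._buffers = None


if __name__ == "__main__":
    ctx = SolverContext(1024)
    for i in range(100):
        ctx.solve(i)
    print("allocations:", ctx.allocations)
    ctx.close()
```

With real GPU memory the same shape applies: allocate in the context setup, free in the teardown, and keep the per-iteration path free of allocations.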
I tried it on a 750 Ti (compute 5.0) on Win 8.1 and I get this message: missing file MSVCP140.dll
oh, you need to install the redist file: https://www.microsoft.com/en-ca/download/details.aspx?id=48145
@chronosek I've checked in the fixes.
That worked...thx :)
@maztheman thx, it's not crashing now, but I always get 0 sol/s no matter what I set -cb and -ct to.
Yeah, some internal issue. I'll have to fully debug it again...
Getting about 17-18 sol/s with a 1060 3GB card with 0.4h. I found the best rate for this card is obtained with -cb 128 -ct 32 (autodetect puts it at -cb 63 -ct 64 and only yields 15 to 16 sol/s).
Removing the packed atomic counters = bad. I tested this in OpenCL, and the packed counters are fastest. I tested packed 64-bit atomics too: 1-2% slower.
tpruvot did a good job porting to CUDA, especially the atomics part (it runs stable at 58 sol/s, though eqm still does 66 sol/s). I think maztheman's problems came from some hardcoded values or other code that was not in silentarmy.
I added a test program that will help me debug this cuda issue. Please post your log files.
I created a new build that should "work" on 1080s and 1070s again. It probably won't break any records though...
Confirmed, it works and even gives a bit better results on my GTX 1080 than all the previous builds. Thank you!
working for gtx980 - got 41 sol/s
Still not working with GTX 580 1.5GB
@dtaworm, can you run the debug tool and post the log?
@maztheman yeah i'll do that tomorrow afternoon when I get into work.
On CUDA, it doesn't display which cards are doing what hash rate. :)
arg 1 = -r
Settings:
NR_ROWS_LOG = 20
OVERHEAD = 6
COLL_DATA_SIZE_PER_TH = 60
Run Count = 100
Device #0 | GeForce GTX 1080 | Running Tests....
Silentarmy V5 test
35.878 sols/s
Silentarmy V4/V5/maztheman mix test
29.643 sols/s
maztheman test
0 sols/s
arg 1 = -r
Settings:
NR_ROWS_LOG = 20
OVERHEAD = 6
COLL_DATA_SIZE_PER_TH = 60
Run Count = 100
Device #0 | GeForce GTX 580 | Running Tests....
Silentarmy V5 test
18.6791 sols/s
Silentarmy V4/V5/maztheman mix test
30.8821 sols/s
maztheman test
12.5392 sols/s
arg 1 = -r
Settings:
NR_ROWS_LOG = 20
OVERHEAD = 6
COLL_DATA_SIZE_PER_TH = 60
Run Count = 100
Device #0 | GeForce GTX 550 Ti | Running Tests....
Silentarmy V5 test
<error> cuda error 2, out of memory in function 'context::init', line 1313
context intialization failed
...
0 sols/s
Silentarmy V4/V5/maztheman mix test
<error> cuda error 2, out of memory in function 'context::init', line 1313
context intialization failed
...
0 sols/s
maztheman test
<error> cuda error 2, out of memory in function 'context::init', line 1313
context intialization failed
...
0 sols/s
hmm interesting
On my GeForce GTX 650, running nheqminer -t 0 -cd 0 -cs -b:
[15:25:51][0x00000a7c] Using SSE2: YES
[15:25:51][0x00000a7c] Using AVX: NO
[15:25:51][0x00000a7c] Using AVX2: NO
[15:25:51][0x00000a7c] Benchmarking CUDA worker (CUDA-SILENTARMY) GeForce GTX 650 (#0) BLOCKS=14, THREADS=64
[15:25:51][0x00000a7c] Benchmark starting... this may take several minutes, please wait...
[15:26:34][0x00000a7c] Benchmark done!
[15:26:34][0x00000a7c] Total time : 42382 ms
[15:26:34][0x00000a7c] Total iterations: 200
[15:26:34][0x00000a7c] Total solutions found: 374
[15:26:34][0x00000a7c] Speed: 4.71898 I/s
[15:26:34][0x00000a7c] Speed: 8.8245 Sols/s
Is this good?
UPDATE: tinkering around
For a GTX 650, that is what I get. I haven't seen any higher.
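For anyone comparing benchmark runs, the two speed figures in the GTX 650 log above follow directly from the totals it prints (42382 ms, 200 iterations, 374 solutions). A small Python sketch of that arithmetic, with `rates` as a made-up helper name rather than anything from nheqminer:

```python
# Recompute the benchmark speeds from the totals printed in the log.
# `rates` is an illustrative helper, not part of nheqminer.


def rates(total_ms: float, iterations: int, solutions: int):
    """Return (iterations per second, solutions per second)."""
    seconds = total_ms / 1000.0
    if seconds <= 0:
        raise ValueError("total time must be positive")
    return iterations / seconds, solutions / seconds


if __name__ == "__main__":
    ips, sps = rates(42382, 200, 374)
    print(f"{ips:.5f} I/s")
    print(f"{sps:.4f} Sols/s")
    print(f"{sps / ips:.2f} solutions per iteration")
```

Rounded to the precision shown in the log, this gives the same I/s and Sols/s values, and the ratio works out to 374 / 200 = 1.87 solutions per iteration for this run.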
Were you able to overclock the card?
Edit: oh, I see you also have 2 CPU cores.
I noticed that after 8 hrs or so of running, the speed drops to 1 sol/s. Then it has to be restarted... any info as to why?
On Nov 16, 2016 9:06 PM, "maztheman" wrote:
Were you able to overclock the card?
Not really, is it possible that your GPU overheated or something?
@maztheman I ran the new debug tool, but where is the log file saved? Is there a switch I need to use to write the log to a text file? I did notice that it gave me about 18 sol/sec on the v4 and 24 sol/sec on the v5 with the 580. But nheqminer with the -cd 0 switch detects the 580, yet I'm getting 0 sol/s using the newest 0.4i.
Yeah, the debug tool has slightly different code that I was playing around with. To write the log, add 1>run.txt to the end of your command line. That should write all the text to a file.
@maztheman Here ya go, I also included a screenshot of it in regular mode. run.txt
I see you are using cuda_tromp on an old GTX 580, which has compute capability 2.0, so I suspect it was not compiled for your card (in the sources I see the cuda_tromp project is set for 2.0+, but the restriction in equi_miner.cu is set for 5.0+). Try the silentarmy option, or wait for maztheman's new files.
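The mismatch described above (project set for 2.0+, but equi_miner.cu restricted to 5.0+) comes down to comparing the card's compute capability against the minimum a build was made for. A simplified Python sketch, with the big caveat that real CUDA compatibility also depends on which architectures and PTX were embedded at build time; `meets_minimum` and the dictionary labels are invented for the example:

```python
# Simplified, hypothetical check: does a device meet a build's minimum
# compute capability? Real CUDA compatibility also depends on which
# architectures/PTX were embedded when the binary was compiled.


def meets_minimum(device_cc, build_min_cc) -> bool:
    """Compare (major, minor) tuples, e.g. (2, 0) against (5, 0)."""
    return tuple(device_cc) >= tuple(build_min_cc)


GTX_580 = (2, 0)  # compute capability 2.0, as stated in the comment above

# Minimums as described in the thread; the keys are just labels.
minimums = {
    "cuda_tromp project setting": (2, 0),
    "equi_miner.cu restriction": (5, 0),
}

if __name__ == "__main__":
    for label, min_cc in minimums.items():
        verdict = "ok" if meets_minimum(GTX_580, min_cc) else "too old"
        print(f"{label}: needs {min_cc[0]}.{min_cc[1]}+ -> {verdict}")
```

Under that reading, the 580 passes the project-level setting but fails the 5.0+ restriction, which would fit a build that detects the card yet never produces solutions on it.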
Thanks @chronosek, maybe that will help @maztheman, but I have no idea how to compile or change the restriction on equi_miner.cu from 5.0+ to 2.0+.
If you want to try cuda_tromp, you could give the DLL compiled by me a try: http://chronx.pl/cuda/cuda_tromp_75.dll (it was built for CUDA 8.0 but should still work with new drivers). Still, even without atomics support, the silentarmy port should be faster: http://chronx.pl/cuda/cuda_silentarmy.dll (just replace the files and use good options).
@chronosek that worked like a champ, getting about 13 sol/sec, which is better than the nothing I was getting before. Thanks man!
Only getting about 45 sol/sec with an R9 Fury. According to the silentarmy readme, I expected to get at least the same 115 sol/s as an R9 Nano. https://github.com/mbevand/silentarmy/commit/503697cd03d68eef9f6824aef920b50999bef4f1?short_path=04c6e90#diff-04c6e90faac2675aa89e2176d2eec7d8
Discuss your results here