sgminer-dev / sgminer

Scrypt GPU miner
GNU General Public License v3.0

X11 on nicehash multiport now throws HW errors about 6/minute #393

Closed platinum4 closed 9 years ago

platinum4 commented 10 years ago

Not sure what was changed but drag-dropping the same sgminer.conf & profiles.conf into sgminer5.1-dev now results in HW errors on X11 with the same settings as previously used on badman's sgminer5 (4.2.2-301) modded build.

http://gyazo.com/b8676c585b16b97ece5a185ed0f97206

ystarnaud commented 10 years ago

I'm not sure... All I did was copy the darkcoin-mod.cl file from badman74 so nothing should have changed outside that code. I assume you deleted your bins prior to running it? Do you have a pastebin of your config so I can run tests? I ran this on a 7970 card on Windows 7 64-bit with 14.6 beta drivers when testing and I didn't see any HWs.

platinum4 commented 10 years ago

Ah, see I'm on those 14.9.2 betas. Remember I'm on R9 290X at 1040/1500; no GPU settings were changed. I can get 165Kh/s per card in neoscrypt.

sgminer.conf: http://pastebin.com/wYgprmLC
profiles.conf: http://pastebin.com/4kiZHYLV
http://gyazo.com/091e3b09eabb9fbf7825948358543105

platinum4 commented 10 years ago

I don't get it, I have 2 rigs on 14.9.2 now not showing any hardware errors, and one on 14.6RC2 and of course no hardware errors there.

ystarnaud commented 10 years ago

I'll run further tests. I don't get it either. It's the same .cl as badman74's and the core OpenCL code to enqueue the kernel didn't really change, so there should be no differences. The only other thing I can think of is that the neoscrypt changes caused something not to calculate properly somewhere, resulting in random HW.

ystarnaud commented 10 years ago

I just realized that when I tested, I was hitting x13 and x15 in the multipool. I forced x11 and I'm also getting HW now. I guess I'll take a closer look at the .cl changes.

platinum4 commented 10 years ago

Good deal! Yeah a lot of people in the bitcointalk thread are seeing this happen too; x11, x13, x15 as well. 14.6RC2 went back to hard-freezing my rigs so I guess I can deal with HW errors as long as it stays on.

mrbrdo commented 10 years ago

As far as I can remember, I had much better stability on the old 14.4 drivers than I did with the new ones. Even just using kernel bins built with 14.6 made stability worse for me (especially individual cards dying). Too bad that the hashrate difference is so big between the two, otherwise I would just stick with 14.4.

platinum4 commented 10 years ago

What gets me is 14.9.2 actually restored the lost hashrate in 14.9 WHQL, but then immediately in switching over to this neoscrypt miner the HW errors occurred. I'm wondering if this is with nicehash only; because I did notice HW errors on a keccak rental I had a day or 2 ago with the 5.1.0-dev, and immediately if I loaded up badman's 4.2.2-301, it worked just fine as it had for months.

platinum4 commented 10 years ago

Well 14.6RC2 gets me 190Kh/s on neoscrypt and the best rates for x[n] algos, gonna stick with those. HW errors still show up sporadically. http://gyazo.com/d50c71b8ba757031adb3e26cf5ade2c3

platinum4 commented 10 years ago

What the fuck is this random shit post about. There's plenty of info over at https://bitcointalk.org/index.php?topic=632503

ystarnaud commented 10 years ago

Yeah this gets more confusing... So I was away all day and didn't get to test anything. I get back and start my debug session to see if I can fix this. Now I get 0 HW on X11... I wonder if it's something that happens after an algo switch.

platinum4 commented 10 years ago

I'm on the same testbed as you now, 14.6RC2 Win7 64-bit on all rigs. 0 HW. 14.9.2 has worse hashrate overall and yields HW. I wonder if it was on the nicehash side?


ystarnaud commented 10 years ago

Ok I'll get 14.9.2 dlls and get testing on that. Thanks for pointing that out. I contacted nicehash and asked if maybe they patched some code in the past 24 hrs that might have fixed the problem with maybe stratum or something like that.

platinum4 commented 10 years ago

That's honestly what I think happened; elbandi was working on that side of the code, and I've noticed fewer HW errors in general over the past 24 hours. Could be the drivers, not sure. Right now 14.6RC2 is king for hashrate, but not stability.


platinum4 commented 10 years ago

Actually this is fucking bullshit, now 14.6RC2 is throwing HW errors about every 10 seconds too on x13 on nicehash

ystarnaud commented 10 years ago

See fa9eb196b575ef2f203d7adb64cdb0acd084f1dd Can you guys with HW errors test this?

My test build had this part corrected and I didn't realize I had pushed the bad file up. Might be why I couldn't find HWs. Turns out the other instances of me getting HW were due to bad bins being in the wrong folders.
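
Since stale bins in the wrong folders came up above, here is a minimal sketch of how one might hunt them down before retesting. It is not part of sgminer. It assumes the cached kernel binaries are simply files ending in `.bin` somewhere under the miner folder, and the function names `find_kernel_bins` and `remove_kernel_bins` are made up for illustration.

```python
from pathlib import Path


def find_kernel_bins(root):
    """Return every *.bin file under root (subfolders included), sorted.

    Assumption: cached kernel binaries are plain files with a .bin suffix.
    """
    return sorted(p for p in Path(root).rglob("*.bin") if p.is_file())


def remove_kernel_bins(root, dry_run=True):
    """List the cached bins and, unless dry_run is set, delete them.

    Returns the list of paths that were found, so the caller can review
    what would be (or was) removed.
    """
    bins = find_kernel_bins(root)
    for path in bins:
        if not dry_run:
            path.unlink()
    return bins
```

Running `remove_kernel_bins("path/to/miner")` with the default `dry_run=True` only lists what it finds, which is enough to spot bins sitting in an unexpected folder. Passing `dry_run=False` deletes them so the miner has to rebuild them on the next start.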

platinum4 commented 9 years ago

ystarnaud is a fucking king. Closed. ty. miss y'all on the IRC ;D