sgminer-dev / sgminer

Scrypt GPU miner
GNU General Public License v3.0

Significant drop in x11 hashrate performance as compared to X15GPU miner #347

Closed: platinum4 closed this issue 10 years ago

platinum4 commented 10 years ago

Comparing the most recent builds of sgminer5 (07-12 in my case) against previous builds (06-25), I have noticed a significant drop in hashrate on the x11 algorithm. This may be related to https://github.com/sgminer-dev/sgminer/issues/330

In the two screenshots posted, the sgminer5 build compiles and hashes, but hashrate drops across the cards, and the cards are by no means in sync. The second screenshot is a run of the 06-25 build, where hashrates do not drop.

@mrbrdo, can you please chime in and check whether this is a pthread-related issue? I've already exhausted @ystarnaud, who has compared almost everything else and cannot spot what is going on.

@kenshirothefist, the source for 06-25 was pulled from your sgminer multi-algo software page. Could you supply a link to that source once more, so we can compare commits and see what has changed in the last 3 weeks?

platinum4 commented 10 years ago

Deleting bins, rebuilding bins, cycling through every available darkcoin-mod.cl file: nothing restores hashrates on the newer builds. This is NOT driver-related, as I can reproduce the maximum hashrates by deleting the bins and rebuilding with 14.7 RC on the 06-25 prerelease build.

troky commented 10 years ago

@platinum4 Have you tried to use bins compiled on other (good) rig?

I have one rig that just can't compile good bins (ones with normal hashrate), so I always use bins from other rigs there.

platinum4 commented 10 years ago

@troky yep, @ystarnaud and I ran through pretty much every swap and substitution imaginable over in IRC, no dice.

Even pasting the bins directly into the directory of the newest build doesn't bring the hashrates back up. I'm on 290X cards and should be getting a minimum of 5.4 MH/s, but even with pasted (good) bins and .cl files, 2 of the cards don't get above 5 MH/s.

platinum4 commented 10 years ago

This issue seems to affect the other x[n] algorithms too, since it carries on up the chain.

Please, can we take a wholehearted look at this issue? We would not want performance loss to get buried in the dust of all the future developments. Sometimes it is essential to look at things from square one.

Not driver-related, not .cl-file-related, not .bin-file-related: this has to do with how sgminer now handles threads. Builds from 06-25 did not show this behavior, and I must have overlooked this issue when we decided to start a develop tree and a feature-lock tree.

Restore the hashrate to all devices during a mining instance; this is what must happen.

platinum4 commented 10 years ago

@troky @mrbrdo @ystarnaud

Can we investigate these two sources, dated 06-25-2014?

https://github.com/sgminer-dev/sgminer/archive/78014ab0d53c661c8d3acd4184e3ca81e802896c.zip

https://github.com/sgminer-dev/sgminer/archive/044bf709018d8509cad7bfb758670f087f924980.zip

badman74 commented 10 years ago

Have you checked whether the clocks on all of the cards are actually the same while running? I have run into times when the memory clocks did not change, or even worse, crashed to 150 on a single card.

platinum4 commented 10 years ago

Yeah, the top two cards are at a LOWER clockrate than the bottom card, which does NOT account for its drop in hashrate...

platinum4 commented 10 years ago

I think this may be readout lag in ncurses, now that I'm observing closely; will report back.

platinum4 commented 10 years ago

@badman74, now you see what I was talking about after talking with bullus over on bitcointalk, right? Way different hashrates with X15GPU/X15_AMD than with this one, huh? ;D

platinum4 commented 10 years ago

@badman74, I found a few culprits; help me out and see if you can replicate similar hashrates:

https://bitcointalk.org/index.php?topic=632503.msg7889004#msg7889004

badman74 commented 10 years ago

What I found was that replacing

#define PERM_BIG_P(a)   do { \
    int r; \
    for (r = 0; r < 14; r += 2) { \
      ROUND_BIG_P(a, r + 0); \
      ROUND_BIG_P(a, r + 1); \
    } \
  } while (0)

#define PERM_BIG_Q(a)   do { \
    int r; \
    for (r = 0; r < 14; r += 2) { \
      ROUND_BIG_Q(a, r + 0); \
      ROUND_BIG_Q(a, r + 1); \
    } \
  } while (0)

in groestl.cl with

#define PERM_BIG_P(a)   do { \
    int r; \
    for (r = 0; r < 14; r++) { \
      ROUND_BIG_P(a, r); \
    } \
  } while (0)

#define PERM_BIG_Q(a)   do { \
    int r; \
    for (r = 0; r < 14; r++) { \
      ROUND_BIG_Q(a, r); \
    } \
  } while (0)

then changing #define SPH_LUFFA_PARALLEL 0 and //#include "aes_helper.cl" to #define SPH_LUFFA_PARALLEL 1 and #include "aes_helper.cl" gives me 6.04 MH/s on a Sapphire 290X with a 1040/1500 overclock and 15% PowerTune, where it was 5.5 MH/s normally. Unfortunately, I really have no idea what that does; it was just pulled out of the darkcoin-mod.cl from https://github.com/aznboy84/sgminer/tree/v5_0-x15

Edit: I just put my 7750s back in service, and it seems that #define SPH_LUFFA_PARALLEL 1 doesn't do anything for them.

ystarnaud commented 10 years ago

Interesting. Maybe this is 290 specific as I haven't really noticed a performance drop on R9 270/x or 78XX/79XX cards.

aes_helper.cl was commented out because nothing touches it. It just adds bloat to the kernel.

SPH_LUFFA_PARALLEL 1 uses a different way of calculating the Luffa hash, and whether it helps depends on how the GPU processes the instructions. It may not work on all GPUs. Setting it to 0 may have worked better on the GPUs tested other than the R9 290, and that is why it was changed.

From what I know of OpenCL, your changes to groestl would undo a GPU optimization. Again, maybe this only holds for non-Hawaii cards.
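To make the trade-off concrete, here is a minimal host-side C sketch of the two groestl.cl loop shapes. The round_fn below is a hypothetical stand-in for ROUND_BIG_P, not the actual Grøstl round; it is deliberately order-sensitive. Stepping r by 2 with two round calls per iteration (a manual 2x unroll) and stepping r by 1 with one call both apply rounds 0 through 13 in the same order, so the hash result should not change. Any hashrate difference would then come from how the OpenCL compiler schedules each form on a given GPU.

```c
#include <stdint.h>

#define ROUNDS 14 /* even, so the 2x-unrolled loop covers every round */

/* Hypothetical stand-in for ROUND_BIG_P: mixes the round index into the
   state. It is order-sensitive, so reordering rounds would change the result. */
static void round_fn(uint64_t *state, int r) {
    *state = (*state * 6364136223846793005ULL) ^ (uint64_t)(r + 1);
}

/* Original shape: two rounds per iteration (manual 2x unroll). */
static uint64_t perm_unrolled(uint64_t state) {
    for (int r = 0; r < ROUNDS; r += 2) {
        round_fn(&state, r + 0);
        round_fn(&state, r + 1);
    }
    return state;
}

/* Replacement shape: one round per iteration. */
static uint64_t perm_rolled(uint64_t state) {
    for (int r = 0; r < ROUNDS; r++) {
        round_fn(&state, r);
    }
    return state;
}
```

For any starting state s, perm_unrolled(s) equals perm_rolled(s); only the generated code differs.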

According to the AMD specification and OpenCL, all Southern Islands cards (7XXX and R9 series) should behave the same, but again and again the Hawaii chipset (R9 290/290X) seems to behave very differently from the rest of the series...

I'll run some tests on my 7XXX/R9 270 to see if the above changes really make a difference. We might end up needing to specify extra kernel compiler options to get this to work out for everybody.
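One way such compiler options could work, sketched under assumptions: OpenCL's clBuildProgram accepts an options string in which -D name=value defines a preprocessor macro for the kernel source, so per-GPU values can be chosen at build time instead of being hard-coded in the .cl file. The struct, the two functions, and the "enable parallel Luffa on Hawaii" policy below are hypothetical and are not sgminer's code. The policy only mirrors the results reported in this thread, and "Hawaii" is the device name that appears in the .bin filenames mentioned here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-device tuning values; names are illustrative only. */
struct kernel_tuning {
    int luffa_parallel; /* emitted as SPH_LUFFA_PARALLEL */
    int keccak_unroll;  /* emitted as SPH_KECCAK_UNROLL (placeholder default) */
};

/* Assumed policy for illustration: use the parallel Luffa path on Hawaii
   (R9 290/290X), where it was reported to help, and keep 0 elsewhere. */
static struct kernel_tuning tuning_for_device(const char *device_name) {
    struct kernel_tuning t = { 0, 0 };
    if (strstr(device_name, "Hawaii") != NULL)
        t.luffa_parallel = 1;
    return t;
}

/* Formats the values as OpenCL build options ("-D NAME=value").
   Returns 0 on success, -1 if buf is too small. */
static int format_build_options(const struct kernel_tuning *t, char *buf, size_t len) {
    int n = snprintf(buf, len, "-D SPH_LUFFA_PARALLEL=%d -D SPH_KECCAK_UNROLL=%d",
                     t->luffa_parallel, t->keccak_unroll);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would be passed as the options argument when building the program for that device, so each card type can get its own variant from the same .cl source.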

Thanks for your work in testing these various parts of X11 on the 290s. I would have myself but I never bought those cards based on their lower cost effectiveness and issues.

platinum4 commented 10 years ago

@ystarnaud can you make luffa_parallel a definable option for us like you did with hamsi_expand_big ?

Also, I noticed that the groestl.cl pulled from @aznboy84's X15_AMD miner is smaller (65 KB) than ours (67 KB); I don't know if that does anything, but it seems to.

Can you check this for a sample of changes and configs? https://bitcointalk.org/index.php?topic=632503.msg7913153#msg7913153

Also check the two previous posts in that thread for screenshots of some all-star hashing performance at insane overclocks.

What we are ALL wondering is... WHY does @aznboy84's darkcoin-modHawaiigw64l4.bin reliably yield +200 kH/s, when we can't ever seem to build a comparable one? I can get a steady 5.75 MH/s with ours (2.0 MB), but if I want to shoot for the moon, I have to go back and use his (1.96 MB). You and I have already compared the darkcoin-mod.cl files, all five or six versions of them...

badman74 commented 10 years ago

For me, SPH_LUFFA_PARALLEL 1 gave about +100 kH/s, and after checking again I see that you are correct and aes_helper.cl isn't doing anything. That leaves the change I made at the end of groestl.cl as what causes the change in speed.

troky commented 10 years ago

I can confirm +100kh/s with #define SPH_LUFFA_PARALLEL 1 on 290

ystarnaud commented 10 years ago

@platinum4 yeah that's what I was thinking. I'll work something out later today when I have some free time.

ystarnaud commented 10 years ago

Oh and I'm guessing this will also affect X13/14/15 since they use these algorithms as part of their hash.

badman74 commented 10 years ago

After looking at this, couldn't we use SPH_KECCAK_UNROLL, SPH_LUFFA_PARALLEL, and SPH_HAMSI_EXPAND_BIG instead of the -mod.cl and -modold.cl kernels, or am I missing some other optimizations in them?

mrbrdo commented 10 years ago

@badman74 Not sure; from my limited knowledge of the kernels, the *-modold ones have the last 3 OpenCL kernels combined into a single one. I don't know exactly why, but it's not just a matter of changing those constants. Unless you are saying that the reason the "-mod" kernels don't work on some cards is the values of those 3 constants.

badman74 commented 10 years ago

The main thing I used to recover the lost hashrate was the change in groestl.cl; I don't know if it's the same across all cards.

ystarnaud commented 10 years ago

See c603cec762454ab9f828f36dc6d67f6a1208768f and #358.

ystarnaud commented 10 years ago

@mrbrdo I don't know the specifics, but I believe lower-end GPUs in the 6xxx line or older didn't have enough compute units to process more than 10 kernel objects (if not fewer). The extra rounds of the algorithm are packed into that last kernel so that those GPUs (at least some of them) are able to compute the hash.

Another thing: the values suggested above aren't just constants. Depending on their values, the kernel .cl files are processed differently or unroll loops differently to offer better optimization, using #ifndef / #else / #endif style programming. Setting these values directly in the .cl file might cause problems on some GPUs while optimizing others. This is why I added fine-tuning options.
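A minimal sketch of the pattern described above: #ifndef supplies a fallback only when the build options did not define the macro, so a -D SPH_LUFFA_PARALLEL=1 passed at build time takes precedence over the default. The function luffa_variant is hypothetical and only reports which branch the preprocessor kept; in the real kernels the branches would hold different implementations of the same hash step.

```c
/* Default used only when the build options do not define the macro.
   Building with -D SPH_LUFFA_PARALLEL=1 overrides it. */
#ifndef SPH_LUFFA_PARALLEL
#define SPH_LUFFA_PARALLEL 0
#endif

/* Hypothetical helper: returns which code path was compiled in. */
static int luffa_variant(void) {
#if SPH_LUFFA_PARALLEL
    return 1; /* the parallel formulation would be compiled here */
#else
    return 0; /* the sequential formulation would be compiled here */
#endif
}
```

Hard-coding the #define in the .cl file instead fixes one path for every GPU, which is the concern raised above.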

mrbrdo commented 10 years ago

@ystarnaud good work. Could you also call append_x11_compiler_options from append_x13_compiler_options instead of the current code duplication? It's always nice not to have to change things in multiple places later.
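The suggested refactor is plain delegation: the X13 helper first appends everything the X11 helper would, then adds only what is X13-specific. The thread does not show the real signatures of append_x11_compiler_options or append_x13_compiler_options, so the functions below are hypothetical stand-ins. The split of options is also an assumption, based on Hamsi being part of X13 but not X11.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tuning values; field names are illustrative. */
struct tuning {
    int luffa_parallel;
    int keccak_unroll;
    int hamsi_expand_big;
};

/* Appends " -D name=value" to the NUL-terminated string in buf (capacity len).
   Returns 0 on success; on overflow restores buf and returns -1. */
static int append_define(char *buf, size_t len, const char *name, int value) {
    size_t used = strlen(buf);
    int n = snprintf(buf + used, len - used, " -D %s=%d", name, value);
    if (n < 0 || (size_t)n >= len - used) {
        buf[used] = '\0'; /* undo the truncated append */
        return -1;
    }
    return 0;
}

/* Options shared by algorithms built on the X11 chain. */
static int append_x11_opts(char *buf, size_t len, const struct tuning *t) {
    if (append_define(buf, len, "SPH_LUFFA_PARALLEL", t->luffa_parallel))
        return -1;
    return append_define(buf, len, "SPH_KECCAK_UNROLL", t->keccak_unroll);
}

/* X13 reuses the X11 options instead of duplicating them, then adds the
   Hamsi option, since Hamsi is in X13 but not in X11. */
static int append_x13_opts(char *buf, size_t len, const struct tuning *t) {
    if (append_x11_opts(buf, len, t))
        return -1;
    return append_define(buf, len, "SPH_HAMSI_EXPAND_BIG", t->hamsi_expand_big);
}
```

With this shape, a new X11-level option only has to be added in one place, and every derived algorithm picks it up.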

ystarnaud commented 10 years ago

Sure... I didn't think about that...

ystarnaud commented 10 years ago

@mrbrdo done.

platinum4 commented 10 years ago

I'd say this was solved by replacing the last two loops in groestl.cl, found here: https://github.com/sgminer-dev/sgminer/blob/c603cec762454ab9f828f36dc6d67f6a1208768f/kernel/groestl.cl. The other enhancements/additions (luffa-parallel, blake-compact, and keccak-unroll) came about as side benefits and definitely help in gaining that extra bit of hashrate.

The main objective of this issue has been effectively solved now by https://github.com/sgminer-dev/sgminer/commit/c603cec762454ab9f828f36dc6d67f6a1208768f