girino closed this issue 10 years ago
I will add it when I merge new changes from lazybear, is that ok?
Btw @girino I've seen different values for SPH_HAMSI_EXPAND_BIG in the X13-mod kernel (sometimes 1 and sometimes 4). What is the effect of this, what should it be?
I added your copyright attribution so I'm closing this. Please let me know about the above question though, thanks!
Thanks for the attribution!
Answering your question about SPH_HAMSI_EXPAND_BIG: the code in sph-sgminer, and consequently in the mods, is based on sphlib (http://www.saphir2.com/sphlib/), ported to OpenCL. sphlib is a crypto library which has portability as its main goal, and for that it offers lots of "fine tuning" that can be applied to the code. In the case of the Hamsi algorithm, there are two main tunings available: small vs large footprint (with small meaning more processor usage and large meaning more memory used), and the number of table lookups. SPH_HAMSI_EXPAND_BIG=1 means that we use large lookup tables (less processor usage) and make only one lookup per cycle of the algorithm. SPH_HAMSI_EXPAND_BIG=4 also means large tables, but with 4 lookups. More lookups are better for architectures with better random access to memory; fewer lookups are better for architectures with slow random access but large caches.
Finding the best adjustment is tricky, and it will vary from system to system. AFAIK changing from 1 to 4 gained a little more than 50 KH/s (about 2%) on the R9 boards, but it might be slower on older ones. (There are several other valid values, but it is really time consuming to test all of them.)
Hope this answers your question. girino.
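To make the footprint vs lookup-count tradeoff above a bit more concrete, here is a toy sketch in C. It is not Hamsi and does not use sphlib's tables: `TOY_ROWS`, `expand_bitwise` and `expand_nibbles` are made-up names and values, and how SPH_HAMSI_EXPAND_BIG actually maps onto table shapes is not something this sketch claims to reproduce. It only shows that the same linear expansion can be computed with a small table and many steps, or with bigger tables and fewer lookups.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8x32 "expansion matrix": one 32-bit row per input bit.
 * The constants are arbitrary; they are NOT Hamsi's real tables. */
static const uint32_t TOY_ROWS[8] = {
    0x9E3779B9u, 0x7F4A7C15u, 0xF39CC060u, 0x5CEDC834u,
    0x1082276Bu, 0xF3A95CC1u, 0x2545F491u, 0x4F6CDD1Du
};

/* Small footprint: 8 table entries, one conditional XOR per input bit. */
static uint32_t expand_bitwise(uint8_t in)
{
    uint32_t out = 0;
    for (int i = 0; i < 8; i++)
        if ((in >> i) & 1u)
            out ^= TOY_ROWS[i];
    return out;
}

/* Larger footprint: two 16-entry tables, one lookup per 4-bit nibble. */
static uint32_t nibble_table[2][16];
static int tables_ready = 0;

static void build_nibble_tables(void)
{
    for (int t = 0; t < 2; t++)
        for (int v = 0; v < 16; v++)
            nibble_table[t][v] = expand_bitwise((uint8_t)(v << (4 * t)));
    tables_ready = 1;
}

static uint32_t expand_nibbles(uint8_t in)
{
    assert(tables_ready); /* tables must be built before the first lookup */
    return nibble_table[0][in & 0x0F] ^ nibble_table[1][in >> 4];
}

/* Returns 1 if both variants agree on every possible input byte. */
static int variants_agree(void)
{
    build_nibble_tables();
    for (int v = 0; v < 256; v++)
        if (expand_bitwise((uint8_t)v) != expand_nibbles((uint8_t)v))
            return 0;
    return 1;
}
```

Both variants return the same value for every input; which one is faster depends on how the hardware handles memory access versus arithmetic, which is why the best setting has to be measured per card.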
Thanks, sounds like something we might want to provide an option in the config for at some point.
I'd leave that as it is, since it's only in the OpenCL code, which doesn't need to be pre-compiled, and only really advanced users would want to change it. Those willing to change the default settings can open the OpenCL file and change them there.
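For reference, if a config option were ever wanted, one possible approach is to pass the value as a `-D` define in the options string given to clBuildProgram. The sketch below only formats that string; `format_expand_big_option` is a hypothetical helper, not sgminer code, and it assumes the kernel source wraps its own default in an `#ifndef SPH_HAMSI_EXPAND_BIG` guard so the define can take effect.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of sgminer): writes a compiler option that
 * overrides SPH_HAMSI_EXPAND_BIG into buf.
 * Returns 0 on success, -1 if the value is not positive or buf is too small.
 * Which values the kernel really accepts is not checked here. */
static int format_expand_big_option(char *buf, size_t len, int expand_big)
{
    assert(buf != NULL);
    if (expand_big < 1)
        return -1;

    int n = snprintf(buf, len, "-D SPH_HAMSI_EXPAND_BIG=%d", expand_big);
    if (n < 0 || (size_t)n >= len)
        return -1; /* encoding error or truncated output */
    return 0;
}
```

The resulting string would go into the `options` argument of clBuildProgram, alongside whatever options the miner already passes.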
I guess that's true. Thanks for your help!
Can confirm that for right now, for best hashrate when OC'd EXPAND_BIG 1 is preferred for Hawaii chipsets.
EXPAND_BIG 4 most likely for Tahiti & Pitcairn
@girino Hm, on my R9 280X, I get 100 KH/s less with EXPAND_BIG=4, compared to EXPAND_BIG=1. Weird?
Not really, it depends on the speeds of the several memory spaces (global, local and private). That can change from one manufacturer to another.
Ah so it is not only dependent on architecture (Tahiti, Hawaii etc)? Same chip can behave differently between Sapphire, ASUS etc.?
Yes, because they use different memory chips, and the difference between EXPAND_BIG=4 and EXPAND_BIG=1 is in how they access memory. It might also depend on how much memory OC you are capable of.
Interesting. So maybe the manufacturer of memory also matters (Hynix or Elpida)?
Now you got me! If the memory has the same specs and the same clock, it should behave the same. But I'm not an expert in electronics, so I won't risk saying anything on this ground :)
Cool, thanks for the info and for the awesome kernel hack. Hopefully we can solve the freezing issue some people are having. :) I don't have any stability problems, though.
What exactly is this freezing problem? It might happen when resources from the GPU are not properly deallocated. Since the mod uses several kernels, you should check that all of them are released with clReleaseKernel when the program shuts down (same for memory that is allocated).
Also, I had some issues when I allocated the amount of memory based on the intensity I was using, so I reused the code that allocates buffers for scrypt, and only increase the size if it is smaller than what I need for the selected intensity:
https://github.com/girino/x13-sgminer/blob/master/ocl.c#L789-L795
Hope one of those hacks can help you with the freezing.
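The "reuse the scrypt size, grow only if needed" idea can be sketched roughly like this. It is not the code at the link above: `needed_bytes` and `pick_buffer_size` are hypothetical names, and the assumption that the work size is `1 << intensity` threads with a fixed number of bytes per thread is mine, for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: 2^intensity threads, each needing a fixed
 * number of bytes of scratch space. */
static size_t needed_bytes(unsigned intensity, size_t bytes_per_thread)
{
    assert(intensity < 32); /* keep the shift well defined in this sketch */
    return ((size_t)1 << intensity) * bytes_per_thread;
}

/* Keep the scrypt-derived size unless the selected intensity needs more. */
static size_t pick_buffer_size(size_t scrypt_size, size_t needed)
{
    return scrypt_size < needed ? needed : scrypt_size;
}
```

With this shape, low intensities never shrink the buffer below the scrypt-derived size, which matches the behaviour described above of only growing it when required.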
Yeah the freeze seems to occur while mining, so it's not a problem with releasing the kernels/buffers (but yeah, we do that).
For the buffer we use a hardcoded value at the moment (8 * 16 * 4194304, I forgot which fork I got that from). But lazybear's fork does some calculation based on intensity, and people have the same hangs there, so this is probably not the problem. Also, if the buffer was of the wrong size it would probably not work from the beginning, instead of freezing at some point hours after starting.
Why do you use a larger buffer (from scrypt calculation) and not always just calculate size based on intensity? You had problems that way? Maybe it was related to memory alignment. Come to think of it, (1024 % cgpu->lookup_gap > 0) might be there exactly because of alignment (basically I guess it allocates an oversized buffer if 1024 is not divisible by lookup gap). Of course lookup gap only matters for Scrypt, but it seems there are some alignment conditions.
At the time I didn't investigate why it solved the problem I was having (the OpenCL code would segfault with small intensity values on some GPUs), but your interpretation makes sense. Because of memory alignment, the OpenCL program might address an unallocated portion of memory, and this would cause the segfault. Maybe the original author of cgminer would know the reason?
Hehe I think he is not interested in GPUs anymore :)
I found the commit where this change was made, but the comment is useless for figuring out why he did it...
Yeah, doesn't say much. I think that could actually be done just because the division will not always give an integer. For example 1024 / 3 = 341.333, so he says 341 + 1. OpenCL docs don't seem to mention any alignment requirement except for CL_MEM_USE_HOST_PTR, but that is not used here. So I think I was on the wrong track about that :)
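The rounding-up reading of that expression can be checked with a small C sketch. `entries_per_thread` is a hypothetical name used here for illustration; only the `1024 / gap + (1024 % gap > 0)` pattern comes from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Ceiling of 1024 / lookup_gap using integer arithmetic only:
 * the comparison yields 1 when there is a remainder, 0 otherwise. */
static size_t entries_per_thread(size_t lookup_gap)
{
    assert(lookup_gap > 0); /* avoid division by zero */
    return 1024 / lookup_gap + (1024 % lookup_gap > 0);
}
```

For a lookup gap of 3 this gives 341 + 1 = 342, and for a gap of 2 it gives exactly 512 because there is no remainder to round up.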
You have used code based on my miner without the copyright attribution:
https://github.com/sgminer-dev/sgminer/blob/v5_0/kernel/darkcoin-mod.cl
This file is based on my code and needs the line:
added to it before or after the line that says:
You probably based your code on lazybear's before he fixed it. I thank you in advance for correcting this attribution mistake. girino.