Closed mfiano closed 10 years ago
I forgot to mention: I am using the latest commit from the v5_0 branch at the time of writing.
Thanks for the detailed instructions! However, I cannot reproduce this. I added 2 pools just like you did, with the same pool-* settings that you have (different values, of course).
Which cards do you have? I have an R9 280X. Also, from your settings I don't understand why there is such a big difference between the intensity for Scrypt and Scrypt-N. I normally use the same value...
I think I see one part of the problem. Try moving pool-nfactor so that it is always below pool-algorithm. However, we should still find out why it's segfaulting... Definitely try a lower intensity on scrypt-n first.
"This only occurs on my R9 270 rig (Pitcairn)...other rigs with other types of cards work fine."
They are Sapphire 270s. The intensities were typos; they are both 19 in my config. The problem is that it segfaults at "Building zuikkisPitcairn...". It even throws an error that the GPU was idle for 60 seconds and declares it SICK, but it hasn't even been running for 60 seconds. This happens as soon as it attempts to build the kernel (the file never even shows up).
Moving pool-nfactor below pool-algorithm has no effect. As soon as it tries building the nf11 binary, it segfaults. This happens every time with my Pitcairn GPU rigs. The only fix is to use one or the other: just scrypt or just scrypt-n. If you delete the nf10 binary, nf11 can be generated without a segfault. If you delete the nf11 binary, nf10 can be generated without a segfault. Both cannot exist on disk simultaneously.
The bigger nfactor asks for more memory to be allocated: 2032271360 bytes. Try lowering the TC and see if the problem goes away.
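For reference, the 2032271360-byte figure can be reproduced with a little arithmetic. This is only a sketch under stated assumptions, not sgminer's actual code: it assumes that nfactor n means N = 2^n, that each scratchpad entry is 128 bytes (scrypt with r = 1), and that the lookup gap is 2, as the "lg2" in the kernel file name suggests. The function name is made up for illustration.

```python
def scrypt_buffer_bytes(nfactor: int, thread_concurrency: int, lookup_gap: int = 2) -> int:
    """Estimate the scrypt scratchpad size in bytes (assumed formula, see lead-in)."""
    n = 1 << nfactor                   # assumption: nfactor 10 -> N = 1024, 11 -> N = 2048
    entries = -(-n // lookup_gap)      # ceil(N / lookup_gap) entries kept per thread
    return 128 * entries * thread_concurrency


# TC 15505 is the pool-thread-concurrency value from the reported config.
for nf in (10, 11):
    print(f"nfactor {nf}: {scrypt_buffer_bytes(nf, 15505)} bytes")
```

Under these assumptions nfactor 11 gives exactly 2032271360 bytes and nfactor 10 gives half of that, so lowering the TC shrinks the request proportionally.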
@mfiano I saw this before "It even throws an error that the GPU was idle for 60 seconds and declaring SICK". I think this is the actual bug and what is causing the segfault. It tries to restart GPU while it's not yet even initialized (compiling kernel) and that's why it crashes.
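To make the suspected mechanism concrete, here is a minimal sketch of an idle watchdog that skips threads which are still initializing. It is not taken from sgminer: the names GpuThread, naive_is_sick and is_sick are hypothetical, and the idea that a never-updated timestamp makes the GPU look idle right away is only one guess at how "idle for 60 seconds" could fire before 60 seconds have passed.

```python
from dataclasses import dataclass

SICK_AFTER_SECONDS = 60.0  # idle threshold mentioned in the error message


@dataclass
class GpuThread:
    initialized: bool = False    # hypothetical flag: True once the kernel is built
    last_activity: float = 0.0   # time of last result; 0.0 means "never reported"


def naive_is_sick(thread: GpuThread, now: float) -> bool:
    """Looks only at the timestamp, so a thread that never reported is 'idle' at once."""
    return now - thread.last_activity > SICK_AFTER_SECONDS


def is_sick(thread: GpuThread, now: float) -> bool:
    """Never flags a thread that is still compiling its kernel."""
    if not thread.initialized:
        return False
    return now - thread.last_activity > SICK_AFTER_SECONDS


compiling = GpuThread()                   # kernel build still in progress
print(naive_is_sick(compiling, 1000.0))   # flagged although it never started
print(is_sick(compiling, 1000.0))         # left alone until initialization finishes
```

With a guard like this, a restart would only be attempted on a thread that has actually finished building its kernel.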
TC has no effect. As mentioned, it builds the kernel fine if the other kernel doesn't exist on disk. Whether it is the bigger-memory or the smaller-memory kernel doesn't matter. If one of them exists already, boom.
I am 80% sure that this "kernel with other n-factor already exists" has nothing to do with it.
@mrbrdo It is not random whatsoever. It only occurs when the other binary exists. I can kill sgminer and start it again, and if one of them exists, it will segfault when it tries building the other one. This is reproducible 100% of the time (tried about 50 times). In the control, if no file exists, either kernel will build fine.
I can't reproduce it. But I am working on fixing the "declaring SICK" bug and you can try that after I'm done.
Try the updated v5_0 branch. If it still crashes, please run with -T -D and show me the last 10-20 lines of output before it crashes.
Ok will do
@mrbrdo This didn't fix the problem. Here are the logs: With debug: http://pastebin.com/7PqnLmeR Without debug: http://pastebin.com/g5wjCiYn
Can you show me your full config (not only pools)? You can remove user/password but leave everything else please.
Removing pool-gpu-threads doesn't change anything, right? I have an idea of what else I can check and will do that now. If you can, hop on IRC (Freenode, #sgminer-dev); it will make this faster.
Right, I recently added that as a test
Sorry for asking, but is it possible to get a log when there is no pool-gpu-threads? In particular, I'm interested in whether it still becomes SICK after the last changes I made. There is a difference in how threads are restarted when pool-gpu-threads is set (a more "hardcore" restart), and I can see how the SICK could happen in that case, but not in the other one.
This issue has been fixed thanks to @mrbrdo. Changes committed.
I just spent the last 4 hours debugging this problem, and I think I've narrowed it down enough to give a bug report:
The problem: When building a kernel, sgminer will segfault if a kernel binary with a different nf value already exists. For example, if "zuikkisPitcairnglg2tc15505nf10w256l4.bin" already exists, then building the kernel for pool 2 with nfactor 11 will crash sgminer. The same happens in reverse: if the nf11 binary exists, attempting to build nf10 crashes. It will also crash when mixing and matching kernels. For example, if zuikkis with nf10 exists, building bufius with nf11 will crash it.
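The file name above can be taken apart to spot which cached binaries would trigger the crash described here. The following is a sketch only: the field meanings (kernel name, device name, an optional "g" flag, lookup gap, thread concurrency, nfactor, worksize, and a trailing "l" size field) are my reading of the one name quoted in this report, and both helper functions are hypothetical, not part of sgminer.

```python
import re

KERNEL_BIN = re.compile(
    r"^(?P<kernel>[a-z]+)"            # e.g. zuikkis
    r"(?P<device>[A-Z][A-Za-z]*?)"    # e.g. Pitcairn
    r"(?P<goffset>g?)"                # assumed optional flag
    r"lg(?P<lookup_gap>\d+)"
    r"tc(?P<thread_concurrency>\d+)"
    r"nf(?P<nfactor>\d+)"
    r"w(?P<worksize>\d+)"
    r"l(?P<long_size>\d+)"
    r"\.bin$"
)


def parse_kernel_binary(name):
    """Split a cached kernel file name into fields, or return None if it doesn't fit."""
    match = KERNEL_BIN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("lookup_gap", "thread_concurrency", "nfactor", "worksize", "long_size"):
        fields[key] = int(fields[key])
    fields["goffset"] = fields["goffset"] == "g"
    return fields


def other_nfactor_binaries(wanted_nfactor, device, existing):
    """List cached binaries for the same device that use a different nfactor."""
    hits = []
    for name in existing:
        info = parse_kernel_binary(name)
        if info and info["device"] == device and info["nfactor"] != wanted_nfactor:
            hits.append(name)
    return hits


on_disk = ["zuikkisPitcairnglg2tc15505nf10w256l4.bin", "sgminer.conf"]
print(other_nfactor_binaries(11, "Pitcairn", on_disk))
```

Any name returned by other_nfactor_binaries is a candidate to move aside before building, which matches the workaround of keeping only one of the two binaries on disk.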
What I have discovered:
The setup: Using pool-specific GPU settings, I have 2 pools - 1 for Scrypt and 1 for Scrypt-N as follows:

    {
      "name" : "Pool 1",
      "url" : "xxx",
      "user" : "xxx",
      "pass" : "xxx",
      "pool-nfactor" : "10",
      "pool-algorithm" : "zuikkis",
      "pool-intensity" : "13",
      "pool-gpu-engine" : "1070",
      "pool-thread-concurrency" : "15505"
    },
    {
      "name" : "Pool 2",
      "url" : "xxx",
      "user" : "xxx",
      "pass" : "xxx",
      "pool-nfactor" : "11",
      "pool-algorithm" : "zuikkis",
      "pool-intensity" : "18",
      "pool-gpu-engine" : "1100",
      "pool-thread-concurrency" : "15505"
    },
    {