mrbrdo closed this issue 10 years ago
I often see "Failed to parse a \n terminated string in recv_line" in the logs, but I don't think it is related, because I was seeing it long before the crashes started.
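For context on that log message: pool traffic is line-oriented, so the receiver has to find a complete newline-terminated line in its buffer before parsing it. Below is a minimal sketch of that step, not sgminer's actual recv_line; the function name pop_line and its signature are made up for illustration. It only shows why a buffer holding a partial message yields "no line" rather than a parse.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (NOT sgminer's recv_line): pops one '\n'-terminated
 * line from the front of a NUL-terminated receive buffer.
 * On success, copies the line (without the '\n') into out and returns its
 * length. Returns -1 if no complete line is buffered yet, or if out is too
 * small to hold the line plus its terminating NUL. */
long pop_line(char *buf, char *out, size_t out_size)
{
    char *nl = strchr(buf, '\n');
    if (nl == NULL)
        return -1;              /* only a partial message so far */

    size_t len = (size_t)(nl - buf);
    if (len + 1 > out_size)
        return -1;              /* caller's buffer cannot hold the line */

    memcpy(out, buf, len);
    out[len] = '\0';

    /* Shift the remaining bytes, including the final NUL, to the front. */
    memmove(buf, nl + 1, strlen(nl + 1) + 1);
    return (long)len;
}
```

With a buffer holding one full message followed by half of the next, the first call returns the full message and the second returns -1 until more data arrives. If that is all the warning means, it would be harmless on its own, which fits with it showing up long before any crash.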
https://github.com/sgminer-dev/sgminer/commit/e33590f37d060cebd7ef5d7c1f929926fbe21e29 could possibly be to blame since it is present in all 4 of the versions I was running, and I didn't have these segfaults before it. But it could be just random.
I updated the implementation a bit so that it avoids the alloc entirely. I will run one rig with this to see whether that was causing the issue.
Oh right, it couldn't have been that because miner4's version did a soft threads reset so kill_mining wasn't even called. Damn, it's something else. Running all my miners with gdb now.
So far so good with the latest build. No crashes. All rigs @ 14.6 RC.
I am still not convinced this is an sgminer problem. Keep in mind that we are using beta/RC drivers in weird combinations on OC'ed cards :) For example, I don't have any issues with 13.12 @ stock clocks and native kernels... but I just don't like the performance.
Running one rig in VS debugger, too...
Well, since the segfaults are not coming from the AMD drivers, I think it is an sgminer issue. Since all my rigs crashed basically within 60 seconds of each other, I think it was something the pools sent. Also, I am using the 14.6 bins on only half of my rigs, but all of them crashed.
Oh, I didn't realize you had a synced crash. I suppose we have different types of crashes then, because my crashes are/were random.
Also, running in gdb was a disaster. After a few switches some GPUs simply stopped hashing (on some rigs all GPUs, on some rigs only some, even on non-14.6 bins). Trying to quit then resulted in a defunct sgminer process, so I had to reboot. :/ I wonder what makes the GPUs hang like that. I can run in valgrind, though.
Found/fixed one https://github.com/sgminer-dev/sgminer/issues/334
So after a week or two I only seem to be having issues on one of four rigs, not sure if maybe one of the cards is broken or something. It keeps freezing every few days.
I moved my problematic rig to Linux to see if the problem persists... It looks like some of the cards don't like particular engine/memory clock combinations. After a few days of fine-tuning I've managed to achieve (for me) good uptime... The watchdog restarts the rig every few days because cards stop hashing. However, the weird thing is that I get 10-15% lower hashrates compared to Windows with the same config (and kernels).
Just today one of my rigs, which had been running fine for a week or more, got a frozen sgminer. I attempted to restart as usual but it seems it did not reboot. :/ I am not getting many segfaults anymore, but it seems these freezes are more common now. Not sure if it is caused by algo switching or by the implementation of the algos themselves, since I know there were some stability issues with X11-mod at the beginning.
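For anyone wondering how a watchdog can tell that cards have stopped hashing: a common approach is to record when each GPU last reported progress and flag it once that timestamp gets too old. This is only a rough sketch under that assumption, not how sgminer's watchdog is actually written; struct gpu_state, gpu_stalled and count_stalled are made-up names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-GPU record; the field name is illustrative only. */
struct gpu_state {
    time_t last_progress;   /* last time this GPU reported a result */
};

/* True if the GPU has reported nothing for more than timeout_sec seconds. */
bool gpu_stalled(const struct gpu_state *gpu, time_t now, double timeout_sec)
{
    return difftime(now, gpu->last_progress) > timeout_sec;
}

/* Counts how many of the first `count` GPUs look stalled at time `now`. */
size_t count_stalled(const struct gpu_state *gpus, size_t count,
                     time_t now, double timeout_sec)
{
    size_t stalled = 0;
    for (size_t i = 0; i < count; i++) {
        if (gpu_stalled(&gpus[i], now, timeout_sec))
            stalled++;
    }
    return stalled;
}
```

A monitoring loop could call count_stalled with time(NULL) every so often and restart the miner or the rig when the result is non-zero. The timeout has to be longer than the longest normal gap between results, otherwise a healthy card gets flagged.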
EDIT: So today I restarted both of the frozen rigs manually and they are working fine again. I am waiting for 14.6 final drivers, hopefully they will help with the freezing.
Here is a debug log of a rig where sgminer froze: https://gist.github.com/mrbrdo/3eeba880038bea51d604
I believe the segfaulting issue was fixed; I am not getting segfaults anymore, only freezes. So I am closing this.
I got segfaults on all 4 of my rigs within 60 seconds of each other. The first rig seems to have hit a different segfault than the rest. Debug logs: https://gist.github.com/mrbrdo/12be345923aad9b189d7
The rigs are running different sgminer versions:
Not sure if relevant but miner2-4 all have
Failed to parse a \n terminated string in recv_line
shortly before the crash.