tekbasse closed this issue 8 years ago
Can you run it under gdb? e.g.
$ gdb minerd
and then once inside gdb, set the usual command line parameters via
(gdb) set args ....
(gdb) run
Will try that. Also, I noticed 4way isn't available to try, so I'll try rebuilding to include it.
(gdb) run
Starting program: /root/cpuminer-master/minerd --algo cryptopp_asm32 --url http://site:8334 --userpass foo:bar --retry-pause 70
[Thread debugging using libthread_db enabled]
[New Thread 0xb796cb70 (LWP 930)]
[New Thread 0xb716bb70 (LWP 931)]
[New Thread 0xb696ab70 (LWP 932)]
[2013-04-03 12:34:54] Binding thread 0 to cpu 0
[New Thread 0xb6157b70 (LWP 933)]
[2013-04-03 12:34:55] Binding thread 1 to cpu 1
[2013-04-03 12:34:55] Long-polling activated for http://site:8332/listenChannel
[2013-04-03 12:34:56] 2 miner threads started, using SHA256 'cryptopp_asm32' algorithm.
[2013-04-03 12:35:11] thread 0: 16777215 hashes, 1018.71 khash/sec
[2013-04-03 12:35:12] thread 1: 16777215 hashes, 1026.35 khash/sec
[2013-04-03 12:36:13] thread 1: 62914556 hashes, 1025.59 khash/sec
[2013-04-03 12:36:16] thread 0: 62914556 hashes, 981.78 khash/sec
[2013-04-03 12:37:13] thread 1: 61883169 hashes, 1028.97 khash/sec
[2013-04-03 12:37:14] thread 0: 58982396 hashes, 1015.95 khash/sec
[2013-04-03 12:38:14] thread 1: 61883169 hashes, 1027.88 khash/sec
[2013-04-03 12:38:14] thread 0: 61016271 hashes, 1017.75 khash/sec
[2013-04-03 12:39:14] thread 0: 61016271 hashes, 1018.13 khash/sec
[2013-04-03 12:39:14] thread 1: 61883169 hashes, 1027.84 khash/sec
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb716bb70 (LWP 931)]
0x0805107c in string_get (data=0xb716b028) at load.c:791
791     c = stream->data[stream->pos];
(gdb)
Great! That's good output. After the SIGSEGV, once it returns to gdb control, could you provide one more piece of information: a backtrace?
(gdb) bt
bitcoind is running on the same server without seg faults.
minerd is rebuilt and now shows 4way. 4way runs at 1/16 the performance of cryptopp_asm32, and the default c algo runs at 1/8 the performance of cryptopp_asm32. However, cryptopp_asm32 is 20% slower after the rebuild, maybe due to CPU temp? Anyway, here is the segfault in a gdb run of the new build:
(gdb) run
Starting program: /root/cpuminer-master/minerd --algo cryptopp_asm32 --url http://site:8332 --userpass foo:bar --retry-pause=70
[Thread debugging using libthread_db enabled]
[New Thread 0xb796cb70 (LWP 5305)]
[New Thread 0xb716bb70 (LWP 5306)]
[New Thread 0xb676ab70 (LWP 5307)]
[2013-04-03 13:09:07] Binding thread 0 to cpu 0
[2013-04-03 13:09:08] Long-polling activated for http://site:8332/listenChannel
[New Thread 0xb5f69b70 (LWP 5308)]
[2013-04-03 13:09:08] Binding thread 1 to cpu 1
[2013-04-03 13:09:09] 2 miner threads started, using SHA256 'cryptopp_asm32' algorithm.
[2013-04-03 13:09:29] thread 0: 16777215 hashes, 806.02 khash/sec
[2013-04-03 13:09:30] thread 1: 16777215 hashes, 802.47 khash/sec
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb716bb70 (LWP 5306)]
0x080a3bb6 in string_get ()
(gdb)
[made miner edits to remove domain in parameter --bb 20130402]
Here's the gdb with bt:
(gdb) run
Starting program: /root/cpuminer-master/minerd --algo cryptopp_asm32 --url http://site:8332 --userpass foo:bar --retry-pause=70
[Thread debugging using libthread_db enabled]
[New Thread 0xb796cb70 (LWP 5329)]
[New Thread 0xb716bb70 (LWP 5330)]
[New Thread 0xb696ab70 (LWP 5331)]
[2013-04-03 13:19:33] Binding thread 0 to cpu 0
[2013-04-03 13:19:33] Long-polling activated for http://site:8332/listenChannel
[New Thread 0xb6157b70 (LWP 5332)]
[2013-04-03 13:19:34] Binding thread 1 to cpu 1
[2013-04-03 13:19:35] 2 miner threads started, using SHA256 'cryptopp_asm32' algorithm.
[2013-04-03 13:19:54] thread 0: 16777215 hashes, 807.29 khash/sec
[2013-04-03 13:19:55] thread 1: 16777215 hashes, 796.32 khash/sec
[2013-04-03 13:20:53] thread 0: 47934900 hashes, 810.07 khash/sec
[2013-04-03 13:20:55] thread 1: 47934900 hashes, 799.22 khash/sec
[2013-04-03 13:21:53] thread 0: 48747355 hashes, 809.85 khash/sec
[2013-04-03 13:21:55] thread 1: 47934900 hashes, 797.48 khash/sec
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb716bb70 (LWP 5330)]
0x080a3bb6 in string_get ()
(gdb) bt
(gdb)
FWIW, I did three runs in gdb and got the same result with 'bt' each time, in case that rules out some kind of math process. The third run lasted nearly 20 minutes before the segfault.
hmmm. At first glance, it sounds like possibly a software bug causing memory corruption. That is an odd place for a segfault.
Are you thinking of JSON mishandling of a server response or something? If it's something that you won't be able to diagnose, would you be willing to help me get your miner assembly code working via tcl, i.e. where tcl handles maintenance? tcl has all sorts of advantages for the non-math part, and maybe for generating alternate algorithm processes.
jgarzik, I'm guessing the segfaults are related to unexpected input from a pool server. minerd is connected to a pool whose server is obviously experiencing overload, which has probably been increasing over the last week. Other users are fully affected now because the server is down. It seems that connections were probably dropped prematurely, or responses came back with unexpected content. If so, cpuminer should handle this by trying again according to an "askrate" or perhaps an exponentially increasing delay with a warning message.
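To make the "exponentially increasing delay" idea concrete, here is a minimal sketch in C. The function name `retry_delay_secs` and its parameters are hypothetical and not part of cpuminer; it only computes how long to wait before a given retry, doubling from a base delay and capping at a maximum.

```c
/* Hypothetical helper, not part of cpuminer.
 * Returns the number of seconds to wait before retry number `attempt`
 * (0-based). The delay starts at base_secs, doubles on each further
 * attempt, and never exceeds max_secs. */
unsigned int retry_delay_secs(unsigned int base_secs,
                              unsigned int attempt,
                              unsigned int max_secs)
{
    unsigned int delay = base_secs;

    while (attempt > 0 && delay < max_secs) {
        /* Doubling would pass the cap (and could overflow), so stop here. */
        if (delay > max_secs / 2) {
            delay = max_secs;
            break;
        }
        delay *= 2;
        attempt--;
    }

    /* Also covers a base delay that is already above the cap. */
    if (delay > max_secs)
        delay = max_secs;

    return delay;
}
```

A caller could log a warning with the computed delay, sleep for that many seconds, and reset the attempt counter after the first successful response. With a base of 5 seconds and a cap of 300, the waits would be 5, 10, 20, 40, 80, 160, and then 300 for every later attempt.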
Does cpuminer handle getBlockTemplate? https://en.bitcoin.it/wiki/Getblocktemplate
No. But it would be a nice, modernizing addition to cpuminer if it did.
Running the same instance of cpuminer reconfigured to a different, stable pool has resulted in no SIGSEGV errors over three hours. Given that it was segfaulting every twenty minutes or less on the unstable server, this pretty much solidifies the theory that the error is the result of bad server responses.
closed - obsolete software - not actively supported or maintained.
Error occurs every few minutes, sporadically, regardless of using default algo or cryptopp_asm32.
System is PentiumD 3GHz stepping 04, 2cpu, 2048k L2 cache
Example error message in context:
[2013-04-03 11:13:40] thread 0: 62111506 hashes, 1018.48 khash/sec
Segmentation fault
config.log shows /bin/arch etc = unknown
more info:
/var/log/messages shows lines with "segfault at 0 ip 0805107c sp bxxxxec4 error 4 in minerd[8048000+e000]", where xxxx is a hexadecimal number.
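The "segfault at 0" in that log line, together with the faulting statement `c = stream->data[stream->pos];` from the gdb run, is consistent with a read through a NULL `data` pointer, for example when a server response arrives empty. That is a guess, not a confirmed diagnosis. The sketch below shows what a guarded reader could look like; the struct layout is inferred from the faulting line only, and `string_stream_t` and `guarded_string_get` are hypothetical names, not the actual code in load.c.

```c
#include <stdio.h>   /* EOF */
#include <stddef.h>  /* NULL, size_t */

/* Assumed layout, inferred from `c = stream->data[stream->pos];`.
 * The real structure in load.c may differ. */
typedef struct {
    const char *data;  /* NUL-terminated input buffer, may be NULL */
    size_t pos;        /* index of the next byte to read */
} string_stream_t;

/* Hypothetical guarded variant of a string_get-style reader.
 * Returns the next byte as an unsigned char value, or EOF when the
 * input is missing or exhausted, instead of dereferencing NULL. */
int guarded_string_get(void *arg)
{
    string_stream_t *stream = (string_stream_t *)arg;
    char c;

    if (stream == NULL || stream->data == NULL)
        return EOF;

    c = stream->data[stream->pos];
    if (c == '\0')
        return EOF;

    stream->pos++;
    return (unsigned char)c;
}
```

If the cause really is an empty response, checking the response buffer before handing it to the JSON parser would probably be the more natural place for the guard, since the caller can then log the bad response and retry.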