jgarzik / cpuminer

CPU miner for bitcoin

Ubuntu Lucid LTS 10.04, cpuminer 1.02, sporadic segmentation faults #46

Closed tekbasse closed 8 years ago

tekbasse commented 11 years ago

The error occurs sporadically, every few minutes, regardless of whether I use the default algorithm or cryptopp_asm32.

System: Pentium D 3 GHz (stepping 04), 2 CPUs, 2048 KB L2 cache.

Example error message in context:

[2013-04-03 11:13:40] thread 0: 62111506 hashes, 1018.48 khash/sec Segmentation fault

config.log shows /bin/arch etc. as unknown.

More info:

/var/log/messages shows lines like "segfault at 0 ip 0805107c sp bxxxxec4 error 4 in minerd[8048000+e000]", where xxxx is a hexadecimal number.

jgarzik commented 11 years ago

Can you run it under gdb? e.g.

$ gdb minerd

and then once inside gdb, set the usual command line parameters via

(gdb) set args ....
(gdb) run
tekbasse commented 11 years ago

Will try that. Also, I noticed 4way isn't available to try, so I'll try rebuilding to include it.

tekbasse commented 11 years ago

(gdb) run
Starting program: /root/cpuminer-master/minerd --algo cryptopp_asm32 --url http://site:8334 --userpass foo:bar --retry-pause 70
[Thread debugging using libthread_db enabled]
[New Thread 0xb796cb70 (LWP 930)]
[New Thread 0xb716bb70 (LWP 931)]
[New Thread 0xb696ab70 (LWP 932)]
[2013-04-03 12:34:54] Binding thread 0 to cpu 0
[New Thread 0xb6157b70 (LWP 933)]
[2013-04-03 12:34:55] Binding thread 1 to cpu 1
[2013-04-03 12:34:55] Long-polling activated for http://site:8332/listenChannel
[2013-04-03 12:34:56] 2 miner threads started, using SHA256 'cryptopp_asm32' algorithm.
[2013-04-03 12:35:11] thread 0: 16777215 hashes, 1018.71 khash/sec
[2013-04-03 12:35:12] thread 1: 16777215 hashes, 1026.35 khash/sec
[2013-04-03 12:36:13] thread 1: 62914556 hashes, 1025.59 khash/sec
[2013-04-03 12:36:16] thread 0: 62914556 hashes, 981.78 khash/sec
[2013-04-03 12:37:13] thread 1: 61883169 hashes, 1028.97 khash/sec
[2013-04-03 12:37:14] thread 0: 58982396 hashes, 1015.95 khash/sec
[2013-04-03 12:38:14] thread 1: 61883169 hashes, 1027.88 khash/sec
[2013-04-03 12:38:14] thread 0: 61016271 hashes, 1017.75 khash/sec
[2013-04-03 12:39:14] thread 0: 61016271 hashes, 1018.13 khash/sec
[2013-04-03 12:39:14] thread 1: 61883169 hashes, 1027.84 khash/sec

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb716bb70 (LWP 931)]
0x0805107c in string_get (data=0xb716b028) at load.c:791
791     c = stream->data[stream->pos];
(gdb)

jgarzik commented 11 years ago

Great! That's good output. After the SIGSEGV, once it returns to gdb control, could you provide one more piece of information: a backtrace?

(gdb) bt
tekbasse commented 11 years ago

bitcoind is running on the same server without segfaults.

I rebuilt minerd and it now lists 4way. 4way runs at 1/16 the hash rate of cryptopp_asm32, and the default c algorithm at 1/8. However, cryptopp_asm32 itself is about 20% slower after the rebuild, maybe due to CPU temperature? Anyway, here is the segfault in a gdb run of the new build:

(gdb) run
Starting program: /root/cpuminer-master/minerd --algo cryptopp_asm32 --url http://site:8332 --userpass foo:bar --retry-pause=70
[Thread debugging using libthread_db enabled]
[New Thread 0xb796cb70 (LWP 5305)]
[New Thread 0xb716bb70 (LWP 5306)]
[New Thread 0xb676ab70 (LWP 5307)]
[2013-04-03 13:09:07] Binding thread 0 to cpu 0
[2013-04-03 13:09:08] Long-polling activated for http://site:8332/listenChannel
[New Thread 0xb5f69b70 (LWP 5308)]
[2013-04-03 13:09:08] Binding thread 1 to cpu 1
[2013-04-03 13:09:09] 2 miner threads started, using SHA256 'cryptopp_asm32' algorithm.
[2013-04-03 13:09:29] thread 0: 16777215 hashes, 806.02 khash/sec
[2013-04-03 13:09:30] thread 1: 16777215 hashes, 802.47 khash/sec

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb716bb70 (LWP 5306)]
0x080a3bb6 in string_get ()
(gdb)

[made minor edits to remove domain in parameter --bb 20130402]
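As a rough check of the ~20% slowdown, compare thread 0's rate in the first report (1018.48 khash/sec, old build) with its first rate from the rebuilt binary (806.02 khash/sec). These are single samples, so the result is only indicative:

```latex
\frac{1018.48 - 806.02}{1018.48} = \frac{212.46}{1018.48} \approx 0.209
```

That is a drop of about 21%, in line with the figure quoted.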

tekbasse commented 11 years ago

Here's the gdb session with bt:

(gdb) run
Starting program: /root/cpuminer-master/minerd --algo cryptopp_asm32 --url http://site:8332 --userpass foo:bar --retry-pause=70
[Thread debugging using libthread_db enabled]
[New Thread 0xb796cb70 (LWP 5329)]
[New Thread 0xb716bb70 (LWP 5330)]
[New Thread 0xb696ab70 (LWP 5331)]
[2013-04-03 13:19:33] Binding thread 0 to cpu 0
[2013-04-03 13:19:33] Long-polling activated for http://site:8332/listenChannel
[New Thread 0xb6157b70 (LWP 5332)]
[2013-04-03 13:19:34] Binding thread 1 to cpu 1
[2013-04-03 13:19:35] 2 miner threads started, using SHA256 'cryptopp_asm32' algorithm.
[2013-04-03 13:19:54] thread 0: 16777215 hashes, 807.29 khash/sec
[2013-04-03 13:19:55] thread 1: 16777215 hashes, 796.32 khash/sec
[2013-04-03 13:20:53] thread 0: 47934900 hashes, 810.07 khash/sec
[2013-04-03 13:20:55] thread 1: 47934900 hashes, 799.22 khash/sec
[2013-04-03 13:21:53] thread 0: 48747355 hashes, 809.85 khash/sec
[2013-04-03 13:21:55] thread 1: 47934900 hashes, 797.48 khash/sec

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb716bb70 (LWP 5330)]
0x080a3bb6 in string_get ()
(gdb) bt
#0  0x080a3bb6 in string_get ()
#1  0x080a25b2 in stream_get ()
#2  0x080a27b5 in lex_get ()
#3  0x080a33ba in lex_scan ()
#4  0x080a3b51 in parse_json ()
#5  0x080a3c57 in json_loads ()
#6  0x0804be47 in json_rpc_call ()
#7  0x0804a6b1 in longpoll_thread ()
#8  0xb7f8196e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#9  0xb7eef3fe in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb)

tekbasse commented 11 years ago

FWIW, I did three runs in gdb and got the same result from 'bt' each time, in case that helps rule out a problem in the math (hashing) code. The third run lasted nearly 20 minutes before the segfault.

jgarzik commented 11 years ago

Hmmm. At first glance, it sounds like it could be a software bug causing memory corruption. That is an odd place for a segfault.

tekbasse commented 11 years ago

Are you thinking of JSON mishandling of a server response, or something like that? If it's something you won't be able to diagnose, would you be willing to help me get your miner's assembly code working via Tcl, i.e. with Tcl handling the maintenance? Tcl has all sorts of advantages for the non-math parts, and maybe for generating alternate algorithm processes.

tekbasse commented 11 years ago

jgarzik, I'm guessing the segfaults are related to unexpected input from a pool server. minerd is connected to a pool whose server is clearly overloaded, and the load has probably been increasing over the last week. Other users are now fully affected because the server is down. It seems likely that connections were dropped prematurely, or that responses contained unexpected content. If so, cpuminer should handle this by retrying according to an "askrate", or perhaps with an exponentially increasing delay and a warning message.

tekbasse commented 11 years ago

Does cpuminer handle getBlockTemplate? https://en.bitcoin.it/wiki/Getblocktemplate

jgarzik commented 11 years ago

No. But it would be a nice, modernizing addition to cpuminer if it did.

tekbasse commented 11 years ago

Running the same instance of cpuminer, reconfigured to use a different, stable pool, has produced no SIGSEGV errors in three hours. Given that it was segfaulting every twenty minutes or less on the unstable pool, this pretty much confirms the theory that the error is caused by bad server responses.
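A rough model shows why three crash-free hours is strong evidence. Assume, purely for illustration, that crashes on the unstable pool were independent events arriving at a constant average rate of one per 20 minutes (a Poisson process). The expected number in 180 minutes is then 180/20 = 9, and the chance of seeing none by luck alone is:

```latex
P(N = 0) = e^{-\lambda} = e^{-9} \approx 1.2 \times 10^{-4}, \qquad \lambda = \tfrac{180}{20} = 9
```

Since the observed interval was twenty minutes or less, the true probability would be at most this; the independence assumption itself is, of course, only an approximation.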

jgarzik commented 8 years ago

Closed: obsolete software, not actively supported or maintained.