luke-jr / bfgminer

Modular ASIC/FPGA miner written in C, featuring overclocking, monitoring, fan speed control and remote interface capabilities.
http://luke.dashjr.org/programs/bitcoin/files/bfgminer/

Bfgminer stops mining after a while #360

Open davidmurray opened 10 years ago

davidmurray commented 10 years ago

Hi.

In recent weeks I have experienced weird behaviour. After some time of mining (anywhere from around 8 hours to a few days), bfgminer suddenly stops submitting shares to the pool (mining.bitcoin.cz).

Mining hardware: Butterfly Labs BitForce 30 GH/s, driven by a Raspberry Pi Model B running Raspbian

Here is the log:

bfgminer version 3.9.0 - Started: [2013-12-27 10:12:47] - [ 1 day 01:30:03]
[M]anage devices [P]ool management [S]ettings [D]isplay options [H]elp [Q]uit
Connected to stratum.bitcoin.cz diff 1 with stratum as user [hidden]
Block: ...0a4e3664 #277425 Diff:1.18G ( 8.45Ph/s) Started: [11:25:51]
ST:4 F:0 NB:175 AS:0 BW:[ 43/ 23 B/s] E:145.70 I: 363uBTC/hr BS:232k
0 66.0C | 0.00/20.69/20.63Gh/s | A:16884 R:57+0(.34%) HW:51/.01%
BFL 0: 66.0C | SICK /20.69/20.63Gh/s | A:16884 R:57+0(.34%) HW:51/.01%
[2013-12-28 11:26:22] Stratum from pool 0 requested work update
[2013-12-28 11:26:52] Stratum from pool 0 requested work update
[2013-12-28 11:27:21] Stratum from pool 0 requested work update
[2013-12-28 11:27:52] Stratum from pool 0 requested work update
[2013-12-28 11:28:22] Stratum from pool 0 requested work update
[2013-12-28 11:28:51] Stratum from pool 0 requested work update
[2013-12-28 11:29:21] Stratum from pool 0 requested work update
[2013-12-28 11:29:52] Stratum from pool 0 requested work update
[2013-12-28 11:30:22] Stratum from pool 0 requested work update
[2013-12-28 11:30:51] Stratum from pool 0 requested work update
[2013-12-28 11:31:21] Stratum from pool 0 requested work update
[2013-12-28 11:31:52] Stratum from pool 0 requested work update
[2013-12-28 11:32:22] Stratum from pool 0 requested work update
[2013-12-28 11:32:52] Stratum from pool 0 requested work update
[2013-12-28 11:33:22] Stratum from pool 0 requested work update
[2013-12-28 11:33:52] Stratum from pool 0 requested work update
[2013-12-28 11:34:22] Stratum from pool 0 requested work update
[2013-12-28 11:34:52] Stratum from pool 0 requested work update
[2013-12-28 11:35:22] Stratum from pool 0 requested work update
[2013-12-28 11:35:52] Stratum from pool 0 requested work update
[2013-12-28 11:36:22] Stratum from pool 0 requested work update
[2013-12-28 11:36:51] Stratum from pool 0 requested work update
[2013-12-28 11:37:22] Stratum from pool 0 requested work update
[2013-12-28 11:37:51] Stratum from pool 0 requested work update
[2013-12-28 11:38:22] Stratum from pool 0 requested work update
[2013-12-28 11:38:52] Stratum from pool 0 requested work update
[2013-12-28 11:39:23] Stratum from pool 0 requested work update
[2013-12-28 11:39:52] Stratum from pool 0 requested work update
[2013-12-28 11:40:22] Stratum from pool 0 requested work update
[2013-12-28 11:40:52] Stratum from pool 0 requested work update
[2013-12-28 11:41:22] Stratum from pool 0 requested work update
[2013-12-28 11:41:53] Stratum from pool 0 requested work update

(I restarted bfgminer with the "-D --debuglog" options to get a more verbose output. I'll post the results when it stops mining again.)

Thanks

fbgithub commented 10 years ago

I can confirm this behavior, too. See #352. I am running a Raspberry Pi Rev B with the latest Raspbian and a Twinfury. With this setup, bfgminer stops after 5-10 minutes. When I run it with a Block Erupter, it runs for hours or days without problems.

luke-jr commented 10 years ago

Raspberry Pi has broken USB. To work around it, force USB 1.1: edit cmdline.txt and add to the end of the line:

dwc_otg.speed=1

davidmurray commented 10 years ago

That did not work, and it might even have made it worse.

I did add "dwc_otg.speed=1" to the end of /boot/cmdline.txt.

fbgithub commented 10 years ago

Add it near the front of the line, right after the last "dwc_otg"-related attribute.

I also just switched to a USB hub with a stronger power supply, and I must say my problems have vanished with that change together with the additional attribute stated above.
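For anyone scripting this edit, here is a minimal sketch of the placement rule described above: put the parameter right after the last "dwc_otg." token on the kernel command line. The function name add_kernel_param is hypothetical, and the sample line in the usage part is an assumed typical Raspbian cmdline.txt, not anyone's actual file. It only transforms a string; writing the result back to /boot/cmdline.txt (and keeping a backup) is left to you.

```python
def add_kernel_param(cmdline: str, param: str = "dwc_otg.speed=1") -> str:
    """Return cmdline with param placed right after the last dwc_otg.* token.

    Hypothetical helper for illustration; it does not touch any file.
    """
    tokens = cmdline.split()
    key = param.split("=", 1)[0]
    # Drop an existing setting of the same key so it is never duplicated.
    tokens = [t for t in tokens if t.split("=", 1)[0] != key]
    # Position just past the last dwc_otg.* token; front of the line if none.
    insert_at = 0
    for i, tok in enumerate(tokens):
        if tok.startswith("dwc_otg."):
            insert_at = i + 1
    tokens.insert(insert_at, param)
    return " ".join(tokens)


if __name__ == "__main__":
    # Assumed example line; your cmdline.txt may differ.
    sample = "dwc_otg.lpm_enable=0 console=tty1 root=/dev/mmcblk0p2 rootwait"
    print(add_kernel_param(sample))
```

cmdline.txt must stay a single line, which is why the sketch rejoins the tokens with single spaces.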

davidmurray commented 10 years ago

Okay, here's what it looks like now:

pi@raspberrypi ~ $ cat /boot/cmdline.txt
dwc_otg.lpm_enable=0 dwc_otg.speed=1 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait
pi@raspberrypi ~ $

I am not using a USB hub; should I be using one?

Also, I found this statement on this FAQ page: http://www.raspberrypi.org/phpBB3/viewtopic.php?f=28&t=53832 I'm not sure if I want slow ethernet. :/

luke-jr commented 10 years ago

Raspberry Pi requires a self-powered USB hub with at least 5W per port.

prologic commented 10 years ago

I've had a similar issue with my BFL 5Gh/s Miner. The logs just showed "Rejected ...". Restarting bfgminer got it working again, but I'd like to see this issue (whatever it is) fixed :)

prologic commented 10 years ago

PS: I'm running this off a standard PC (not a Raspberry Pi).

prologic commented 10 years ago

Also, here are the dates/times from mining.bitcoin.cz where mining stopped in bfgminer:

21384   2014-01-06 03:05:31 2:42:21 1435249005  none    none    278862  25.18020923  confirmed
21383   2014-01-06 00:23:10 0:40:27 335598605   none    none    278836  25.01455000  confirmed

rstaph commented 10 years ago

Same issue here with bfgminer, 3 Antminer U1's, and btcguild running on a Mac Pro with 10.9.1:

[2014-01-22 16:50:20] Stratum from pool 0 requested work update
[2014-01-22 16:50:25] Stratum from pool 0 detected new block
[2014-01-22 16:50:30] Stratum from pool 0 requested work update
[2014-01-22 16:50:50] Stratum from pool 0 requested work update
[2014-01-22 16:51:20] Stratum from pool 0 requested work update
[2014-01-22 16:51:50] Stratum from pool 0 requested work update
[2014-01-22 16:52:20] Stratum from pool 0 requested work update
[2014-01-22 16:52:50] Stratum from pool 0 requested work update
[2014-01-22 16:53:20] Stratum from pool 0 requested work update
[2014-01-22 16:53:50] Stratum from pool 0 requested work update
[2014-01-22 16:54:20] Stratum from pool 0 requested work update
[2014-01-22 16:54:32] Stratum from pool 0 detected new block
[2014-01-22 16:54:40] Stratum from pool 0 requested work update
[2014-01-22 16:54:50] Stratum from pool 0 requested work update
[2014-01-22 16:55:20] Stratum from pool 0 requested work update

It starts doing this at random intervals, anywhere from a few hours to a few days. I have already tried a different USB hub, a different internet connection, different combinations of ASICs/USB ports and numbers of ASICs, adding/removing/switching pools, and different pools. After all the debugging I've done on my own, it's hard to say it is anything but an oddity in bfgminer...

Oh, and I don't have this issue at all under cgminer. But I'd much rather be running bfgminer.

luke-jr commented 10 years ago

A debug log might help.

prologic commented 10 years ago

@Luke: I haven't had this happen in a few weeks. It might take a while to produce a debug log.

Tell you what: when and if it happens again, I'll restart bfgminer with debug logging turned on so I can produce some useful output for investigation :)

cheers James

davidmurray commented 10 years ago

[2014-01-29 19:30:26] BFL 0a: Unexpected error attempting to append 2 jobs (queued<=14):
[2014-01-29 19:30:26] BFL 0a: Received unexpected queue result response:
[2014-01-29 19:30:26] BFL 0a: Unexpected error attempting to append 2 jobs (queued<=14):
[2014-01-29 19:30:26] BFL 0a: Received unexpected queue result response:
[2014-01-29 19:30:26] BFL 0a: Unexpected error attempting to append 2 jobs (queued<=14):
[2014-01-29 19:30:26] BFL 0a: Received unexpected queue result response:
[2014-01-29 19:30:26] BFL 0a: Unexpected error attempting to append 2 jobs (queued<=14):
[2014-01-29 19:30:26] BFL 0a: Received unexpected queue result response:
[2014-01-29 19:30:26] BFL 0a: Unexpected error attempting to append 2 jobs (queued<=14):
[2014-01-29 19:30:26] BFL 0a: Received unexpected queue result response:
[2014-01-29 19:30:26] BFL 0a: Unexpected error attempting to append 2 jobs (queued<=14):
[2014-01-29 19:30:26] BFL 0a: Received unexpected queue result response:
[2014-01-29 19:30:26] BFL 0a: Unexpected error attempting to append 2 jobs (queued<=14):
[2014-01-29 19:30:26] BFL 0a: Received unexpected queue result response:
[2014-01-29 19:30:27] BFL 0a: Unexpected error attempting to append 2 jobs (queued<=14):
[2014-01-29 19:30:27] BFL 0a: Error: Get temp returned empty string/timed out
[2014-01-29 19:30:27] BFL 0a: Received unexpected queue result response:
[2014-01-29 19:30:27] BFL 0a: Unexpected error attempting to append 2 jobs (queued<=14):
[2014-01-29 19:30:27] BFL 0a: Received unexpected queue result response:
[2014-01-29 19:30:27] BFL 0a: Unexpected error attempting to append 2 jobs (queued<=14):
[2014-01-29 19:30:27] BFL 0a: Received unexpected queue result response:
[2014-01-29 19:30:28] BFL 0a: Unexpected error attempting to append 2 jobs (queued<=14):

luke-jr commented 10 years ago

Sounds like the device died. Not handled well at present.

prologic commented 10 years ago

I recall getting similar errors as well (Mine is still running just fine).

Restarting bfgminer seems to fix it right up again.

Perhaps bfgminer could detect when the device stops responding properly and restart it in some fashion?
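Until something like that exists inside bfgminer, the same idea could be approximated from outside by watching the log. The following is only a rough sketch of such an external check, not how bfgminer handles device failure: it flags a device as dead once a run of consecutive log lines all carry one of the error messages quoted earlier in this thread. The function name, the marker list as a detection rule, and the threshold of 10 are all assumptions for illustration.

```python
# Error texts copied from the BFL log posted in this thread.
ERROR_MARKERS = (
    "Unexpected error attempting to append",
    "Received unexpected queue result response",
    "Get temp returned empty string/timed out",
)


def device_looks_dead(log_lines, threshold=10):
    """Return True once `threshold` consecutive non-empty lines are errors.

    Hypothetical heuristic: the threshold is an arbitrary choice and may
    need tuning for a real log.
    """
    streak = 0
    for line in log_lines:
        if not line.strip():
            continue  # blank lines neither extend nor break a streak
        if any(marker in line for marker in ERROR_MARKERS):
            streak += 1
            if streak >= threshold:
                return True
        else:
            streak = 0  # any normal log line resets the count
    return False
```

A wrapper script could feed this the tail of the log file and restart the miner when it returns True; how to restart is left out on purpose, since that depends on how bfgminer is launched on your system.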

cheers James

luke-jr commented 10 years ago

It tries, but not hard enough. Improving unplugging support is on my todo list.

prologic commented 10 years ago

Cool, hopefully we'll get this sorted soon then!

cyberchriss commented 10 years ago

Any progress on improving unplugging support? I am using cgminer right now, because it reanimates my two Red Furies every time they die. I'd prefer bfgminer, because it performs much better (+500 MH/s).

Quix0r commented 10 years ago

Still an issue with the latest master/HEAD. I tried replugging all miners (into different ports on the hub), plugging the hub into another port on the computer, and plugging it into 2 ports on a laptop; nothing helped. I have different issues with bfgminer and cgminer (not yours), so I will add a report (with --debug -T logfile).

Quix0r commented 10 years ago

This bug still affects a "regular" computer (not a Raspberry Pi here). I am restarting my miners to capture the debug log.

wilson0x4d commented 9 years ago

Luke, the "dwc_otg.speed=1" setting does not solve the problem. Have you contacted RPF for assistance in fixing it? Any ideas where I should start to "hack up" the code and force BFGMINER to play nice(tm) with my hardware?

It appears Gen2 RPIs are using a newer Broadcom USB controller, so when we say "Raspberry Pi has broken USB" what, precisely, are we referring to?

Thanks.

wilson0x4d commented 9 years ago

For others experiencing the same problem, ckolivas/cgminer (https://github.com/ckolivas/cgminer) doesn't appear to have this problem, and doesn't require USB to be 'down-leveled' to v1.1 via cmdline.txt (read: "it just works", at least on latest kernel 3.18+ trunk rpi2.)

I was also looking at comparative performance using bcmstat (https://github.com/MilhouseVH/bcmstat). When using BFGMINER my CPU is maxed out, whereas CGMINER has maintained a steady, low CPU use of 15% or lower while driving 4 Monarchs (18 hours uptime, no errors).

Another interesting discovery is that running BCMSTAT after BFGMINER is running full-swing causes BFGMINER to experience these Communication Errors, consistently, every time. CGMINER does not have this problem, ever.

I believe the problem lies somewhere within BFGMINER, and I believe it is only observed when CPU use is maxed out at 100% combined with a burst of hardware interrupts (like reading/writing to the SD card, while also reading/writing eth0, while also driving 2-4Thps of BFL miners).

Put another way, I do not believe this is truly a RPI-specific issue, but I absolutely believe it can be consistently reproduced on a RPI due to its low CPU and tight IRQ couplings between misc hardware. I suspect users of other platforms with similar hardware implementation will exhibit the same erroneous behavior.
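If anyone wants to check the CPU-saturation part of this theory on their own machine without bcmstat, here is a small sketch that computes the busy fraction between two samples of the aggregate "cpu" line in /proc/stat on Linux. The function names are made up for this example, and counting iowait as idle is a choice I made here, not something taken from bcmstat.

```python
def cpu_busy_fraction(before: str, after: str) -> float:
    """Fraction of CPU time spent non-idle between two /proc/stat 'cpu' lines."""

    def parse(line):
        fields = [int(x) for x in line.split()[1:]]
        # Field order: user nice system idle iowait irq softirq steal ...
        idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
        # Sum only the first 8 fields: guest time is already part of user.
        total = sum(fields[:8])
        return idle, total

    idle0, total0 = parse(before)
    idle1, total1 = parse(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0  # no ticks elapsed between samples
    return 1.0 - (idle1 - idle0) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (Linux only)."""
    with open(path) as f:
        return f.readline()
```

Sampling read_cpu_line() a few seconds apart while the miner is running, and logging the result next to the miner's own log, should show whether the communication errors line up with the CPU being pegged.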

Granted, configuring a RPI from scratch for CGMINER is more involved (security setup on TTY devices mainly), but it's stable, it uses less CPU/power, and it is averaging 20-30Ghps more hashpower per Thps/miner than I was seeing with BFGMINER.

If that's not incentive enough to review the issue with a fresh perspective on the problem I don't know what to say. I really like BFGMINER and all of luke's contributions to the bitcoin community, but have to admit that I like stable software more.