Open mutatrum opened 4 months ago
I'm adding support for client.reconnect and will see if resetting send_uid on a socket restart fixes this.
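A minimal sketch of that idea, for illustration only: the request id counter is tied to the connection and goes back to 1 whenever the socket is restarted, and a client.reconnect from the pool simply flags that a restart is wanted. The struct and function names here (stratum_session, session_reset, session_next_uid, session_on_method) are hypothetical and are not taken from the firmware; only send_uid and client.reconnect come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-connection state; names are illustrative only. */
typedef struct {
    int  send_uid;          /* id used for the next outgoing request */
    bool reconnect_wanted;  /* set when the pool sends client.reconnect */
} stratum_session;

/* Call after every socket (re)connect: the next request gets id 1 again. */
void session_reset(stratum_session *s)
{
    assert(s != NULL);
    s->send_uid = 1;
    s->reconnect_wanted = false;
}

/* Return the id for the next request and advance the counter. */
int session_next_uid(stratum_session *s)
{
    assert(s != NULL);
    return s->send_uid++;
}

/* Inspect the "method" of a pool message (already extracted from the JSON).
 * Returns true if the pool asked for a reconnect. */
bool session_on_method(stratum_session *s, const char *method)
{
    assert(s != NULL && method != NULL);
    if (strcmp(method, "client.reconnect") == 0) {
        s->reconnect_wanted = true;
    }
    return s->reconnect_wanted;
}
```

The receive loop would check reconnect_wanted after each message, shut the socket down, reconnect, and call session_reset before sending mining.subscribe.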
In component/stratum/stratum_api.c, it does not look like the STRATUM_V1_receive_jsonrpc_line and STRATUM_V1_submit_share functions will properly detect an orderly socket close by the server. In that case recv returns 0, so the check against -1 misses the event and the read loops forever. The writes don't check for ECONNRESET either, so they will also fail to recognize the error.
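The cases described above can be sketched as a small classifier for the value returned by recv. This assumes POSIX-style semantics (a positive byte count, 0 on an orderly close by the peer, -1 with errno set on failure). The names recv_status, classify_recv and recv_checked are made up for this example and are not part of stratum_api.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

typedef enum {
    RECV_DATA,         /* n > 0: n bytes were read */
    RECV_PEER_CLOSED,  /* n == 0: orderly shutdown by the peer */
    RECV_RETRY,        /* n < 0 with a transient errno: try again */
    RECV_FAILED        /* n < 0 with any other errno: drop the socket */
} recv_status;

/* Map a recv() return value and the errno seen right after it to an action. */
recv_status classify_recv(ssize_t n, int err)
{
    if (n > 0) {
        return RECV_DATA;
    }
    if (n == 0) {
        return RECV_PEER_CLOSED;  /* a check against -1 alone misses this */
    }
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK) {
        return RECV_RETRY;        /* interrupted, or receive timeout expired */
    }
    return RECV_FAILED;           /* e.g. ECONNRESET, ENOTCONN */
}

/* Thin wrapper: read once and report what the caller should do next. */
recv_status recv_checked(int sock, void *buf, size_t len, ssize_t *out_n)
{
    assert(buf != NULL && out_n != NULL && len > 0);
    *out_n = recv(sock, buf, len, 0);
    return classify_recv(*out_n, errno);
}
```

A read loop would keep going only on RECV_DATA or RECV_RETRY and treat both RECV_PEER_CLOSED and RECV_FAILED as a reason to close the socket and reconnect.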
Depending on compiler optimization and ESP32 behavior, it may also be necessary to declare GLOBAL_STATE->sock as volatile, so that it is re-read from memory on each access rather than taken from a value cached in a register.
According to lwIP recv() docs, the API is not like POSIX, and rather than return -1, it returns the actual errcode.
After replacing a half-defective Chinese fan (the fan sound had changed and the airflow was massively reduced) with a Noctua fan (NF-A4x20), the error:
stratum_api: Error: recv
stratum_api: Restarting System because of Error: recv
is gone and the Bitaxe is running like a charm again (for many hours, not just minutes).
I assume some interference was causing the Bitaxe to restart. What do you think: is it possible that the Bitaxe restarts were caused by electrical and/or magnetic interference?
"version": "v2.1.8", "boardVersion": "204",
"temp": 48 (Half-Defective-Chinese-Fan)
"temp": 58 (Noctua-Fan)
@mutatrum is this still happening? We've had a lot of changes around handling network failures
The timeout still happens, but the device recovers by shutting down the socket and reconnecting to the pool once the wifi is back up and running. On my Supra 400, both 2.1.10 and 2.2.2 were good: both versions had 30+ days of uptime with probably a few bcn_timeout events per day.
Logs:
I (67497825) bm1368Module: Job ID: 48, Core: 1/9, Ver: 06E92000
I (67497825) asic_result: Ver: 26E92000 Nonce 0EAD0002 diff 331.7 of 639.
I (67498785) wifi:bcn_timeout,ap_probe_send_start
I (67499185) bm1368Module: Job ID: 10, Core: 29/2, Ver: 047C4000
I (67499185) asic_result: Ver: 247C4000 Nonce 6FB0023A diff 494.8 of 639.
I (67499825) bm1368Module: Job ID: 28, Core: 32/14, Ver: 06FE4000
I (67499825) asic_result: Ver: 26FE4000 Nonce 0DE90040 diff 421.1 of 639.
I (67501285) wifi:ap_probe_send over, resett wifi status to disassoc
I (67501285) wifi:state: run -> init (0xc800)
I (67501295) wifi:pm stop, total sleep time: 1897374459 us / 2147338912 us
I (67501295) wifi:<ba-del>idx:1, tid:0
I (67501295) wifi:<ba-del>idx:0, tid:6
I (67501305) wifi:new:<5,0>, old:<5,0>, ap:<255,255>, sta:<5,0>, prof:1, snd_ch_cfg:0x0
I (67503015) bm1368Module: Job ID: 50, Core: 37/4, Ver: 01908000
I (67503015) asic_result: Ver: 21908000 Nonce B5AD014A diff 5515.0 of 639.
I (67503015) stratum_api: tx: {"id": 430, "method": "mining.submit", "params": ["bc1qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq.bitaxe", "648516380015fdc3", "910a020000000000", "66f8fc0b", "b5ad014a", "01908000"]}
I (67503815) wifi station: Retrying WiFi connection...
I (67503815) stratum_api: Error: recv
E (67503815) stratum_task: Failed to receive JSON-RPC line, reconnecting...
I (67504085) wifi:new:<2,0>, old:<5,0>, ap:<255,255>, sta:<2,0>, prof:1, snd_ch_cfg:0x0
I (67504085) wifi:state: init -> auth (0xb0)
I (67504105) wifi:state: auth -> assoc (0x0)
I (67504105) wifi:state: assoc -> run (0x10)
I (67504135) wifi:connected with <######>, aid = 35, channel 2, BW20, bssid = ##:##:##:##:##:##
I (67504135) wifi:security: WPA2-PSK, phy: bgn, rssi: -38
I (67504155) wifi:pm start, type: 1
I (67504155) wifi:set rx beacon pti, rx_bcn_pti: 0, bcn_timeout: 25000, mt_pti: 0, mt_time: 10000
I (67504155) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (67504175) wifi:<ba-add>idx:0 (ifx:0, ##:##:##:##:##:##), tid:6, ssn:1, winSize:64
E (67504815) stratum_task: Shutting down socket and restarting...
I (67504815) stratum_task: Socket created, connecting to 51.81.56.15:3333
E (67504815) stratum_task: Socket unable to connect to solo.ckpool.org:3333 (errno 118)
I (67505155) wifi station: Bitaxe ip:192.168.1.153
I (67505155) esp_netif_handlers: sta ip: 192.168.1.153, mask: 255.255.255.0, gw: 192.168.1.1
I (67505695) bm1368Module: Job ID: 48, Core: 50/1, Ver: 04B42000
I (67505695) asic_result: Ver: 24B42000 Nonce E90B0164 diff 876.2 of 639.
I (67505705) stratum_api: tx: {"id": 431, "method": "mining.submit", "params": ["bc1qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq.bitaxe", "648516380015fdc3", "960a020000000000", "66f8fc0b", "e90b0164", "04b42000"]}
I (67505795) wifi:<ba-add>idx:1 (ifx:0, ##:##:##:##:##:##), tid:0, ssn:1, winSize:64
I (67509825) stratum_task: Socket created, connecting to 51.81.56.15:3333
I (67509995) stratum_api: Resetting stratum uid
I (67509995) stratum_task: Clean Jobs: clearing queue
I (67509995) stratum_api: tx: {"id": 1, "method": "mining.subscribe", "params": ["bitaxe/BM1368/v2.2.2"]}
I (67510005) stratum_api: tx: {"id": 2, "method": "mining.configure", "params": [["version-rolling"], {"version-rolling.mask": "ffffffff"}]}
I (67510025) stratum_api: tx: {"id": 3, "method": "mining.suggest_difficulty", "params": [1000]}
I (67510035) stratum_api: tx: {"id": 4, "method": "mining.authorize", "params": ["bc1qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq.bitaxe", "x"]}
I (67510195) stratum_task: rx: {"result":[[["mining.notify","7be71efd"]],"fb8c1b77",8],"id":1,"error":null}
I (67510205) stratum_api: extranonce_str: fb8c1b77
I (67510205) stratum_api: extranonce_2_len: 8
I (67510215) stratum_task: rx: {"params":[10000],"id":null,"method":"mining.set_difficulty"}
I (67510215) stratum_task: Set stratum difficulty: 10000
I (67510405) stratum_task: rx: {"result":{"version-rolling":true,"version-rolling.mask":"1fffe000"},"id":2,"error":null}
I (67510405) stratum_api: Set version mask: 1fffe000
I (67510405) stratum_task: Set version mask: 1fffe000
I (67510415) stratum_task: rx: {"params":["648516380015fdc4","12e729fb4b524609d61ac813d2de87f02ecdf84800000eee0000000000000000","01000000010000000000000000000000000000000000000000000000000000000000000000ffffffff35035e2c0d000429fcf86604198e230d0c","0a636b706f6f6c112f736f6c6f2e636b706f6f6c2e6f72672fffffffff033e5c63120000000016001480bded37e3f86a1a546e099b15e2b02855d3843c8e1160000000000016001451ed61d2f6aa260cc72cdf743e4e436a82c010270000000000000000266a24aa21a9eddec62caf232c233d817ab6384423006a3d80617a806bbc5a071530014cc9d57e00000000",["6a39c5891f9472dcb3e672904ddb116f449f200ed218830b14110dae0a439a45","2768792df31799aa80353cff6a2fc3cea0bf4f0be8df72c1c7be6ab6df96cd43","226962dfb377092a0e4cfbbdd3cb104646572e37a072a5a80dabd539c31ad7ad","2e23a0dca06999f7eeeaa596ec5e585af61ce7be2e5d38eb5709c361f9c60832","89e888f493f891d963312399446457f2c38e261836442d8e74b8387c6e92bd22","e9dfa5ea69c64b3193a7ed556bb17d4251c6b5a00f61e1cc9b631d3de571695e","6956ef269bdc64e7dc8c014c4beb6f5f5c1cbd9588d69910c7213f40c55a6b6d","1218611bd9e5780bef4019396b592637a04887e43e7e0538286d8e361034977b","dec9d88b21bbdc4eb24f71c47a804a051f964afc20407019f885b119586f529d","420228b32dc69ff58b9762cf36389e0cab7633d7b5e31d64c8d653d3902c866b","fa0452327f68a1f4cd6991a5ffff2ad01b1a9300d099baee1cc23befc94cc675","9b5dac11f351f0c44d7bce53b4fc02b54e8f576943805f2d7752fbd412c05db0"],"20000000","17032f14","66f8fc29",true],"id":null,"method":"mining.notify"}
I (67510545) create_jobs_task: New Work Dequeued 648516380015fdc4
I (67510545) stratum_task: rx: {"params":[10000],"id":null,"method":"mining.set_difficulty"}
I (67510545) create_jobs_task: Job processed and queued: 648516380015fdc4
I (67510545) ASIC_task: New pool difficulty 10000
I (67510565) stratum_task: rx: {"result":true,"error":null,"id":4}
The only nit-pick I think could be improved is that stratum_api shouldn't try to send any more mining.submits if the socket is dead.
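One way to express that guard, as a rough sketch rather than the firmware's actual code: keep an "alive" flag next to the socket, refuse to send while it is false, and clear it on the first failed send. Everything named here (stratum_link, submit_if_alive, the stub senders) is hypothetical. The real send path is passed in as a function pointer so the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Signature of whatever actually writes one JSON-RPC line to the socket. */
typedef int (*send_line_fn)(int sock, const char *line);

/* Hypothetical connection handle: the socket plus a liveness flag. */
typedef struct {
    int  sock;
    bool alive;  /* set true after connect, false on any socket error */
} stratum_link;

/* Send a line only if the link is believed to be up.
 * Returns -1 without touching the socket when the link is down. */
int submit_if_alive(stratum_link *link, send_line_fn send_line, const char *line)
{
    assert(link != NULL && send_line != NULL && line != NULL);
    if (!link->alive) {
        return -1;            /* socket is dead: drop the share */
    }
    int rc = send_line(link->sock, line);
    if (rc < 0) {
        link->alive = false;  /* stop further submits until reconnect */
    }
    return rc;
}

/* Stub senders, for illustration only: they count calls instead of writing. */
int stub_sends = 0;

int stub_send_ok(int sock, const char *line)
{
    (void)sock; (void)line;
    stub_sends++;
    return 0;
}

int stub_send_fail(int sock, const char *line)
{
    (void)sock; (void)line;
    stub_sends++;
    return -1;
}
```

With this shape, a share found between the recv error and the reconnect (like the mining.submit with id 431 in the log above) would be dropped instead of being written to a dead socket.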
When the wifi has a timeout (wifi:bcn_timeout,ap_probe_send_start), it tries to restart the socket and stratum connection, but some bits are left in a wrong state, resulting in the device hashing but almost no shares being accepted. There are several hidden issues that prevent a proper reconnection. It starts with:
Some things that caught my eye:
- mining.notify and version-rolling responses are unhandled.
- mining.submit messages are still sent, even though the socket is not alive.
- The id of the rx in this log seems like a continuation of the ids from before. It looks like this is relevant, although I'm not sure who is driving this sequence: is that the pool or the miner, or both?
It looks like it reconnected correctly, and starts mining with the ckpool default difficulty of 10000. A while later, when a share has been found, it's rejected with Above target:
This looks like #212 and it might be a red herring. However, on the next mining.submit tx:
The pool requests a client.reconnect, as I assume they see something out of sync as well. Not sure what, maybe because of the still increasing ids?
And finally, a little while later it seems the parser is in a proper error state and requests a socket restart:
This dance will continue forever, restarting the socket every few minutes.
BitAxe 400, latest master 04c8b80.