HypeLaser closed this 1 month ago
which hardware version is this? what power supply are you using?
1x Max, Board 204, Model BM1366
2x Supra, Board 401, Model BM1368
Using the power supplies that shipped with the devices.
It took 38 minutes for the error to show again on one of the Supras, and now I can also see the hash rate on the Max is wildly over...
there are many different power supplies depending on when and where you got your Bitaxe... can you give me some more details? I think this may be a power issue.
maybe you could also post a screenshot of the dashboard from all your devices when this problem is happening.
Power supply failures on three devices on the same day seem unlikely, but worth investigating for sure. All devices were bought from Bitcoin Merch, the Ultra (not Max!) in Feb 2024 and the Supras in March 2024.
(fwiw 204 is an ultra). How did you update the firmware on all these Bitaxe?
Thank you for letting me know it is an Ultra. It is confusing keeping track of what the devices are.
I manually clicked the GitHub links on the device's internal Settings page. I updated the firmware and the website from the files that downloaded from GitHub.
ok, you said only the public-pool.io pointed Bitaxe were giving you trouble. maybe this is related to a bug in handling pool outages (of which there were a couple today). I'll set up a 204 and 401 pointed to PP and see if I can reproduce this.
Just an overnight update.
The Dutch.nl Bitaxe is still running with no issues since upgrading to 2.1.9, and the three Public Pool have also run overnight since the reset with no issues.
I'm now suspecting the PP outage is what caused this, as you mentioned.
Out of curiosity, when PP came back online, should the devices have reconnected by themselves?
seeing the aa 55 in the middle of the frame, it looks like there was a desync of the framing.
I'll check the last commits; maybe one of them changed something unintentionally.
Two of the Public Pool devices have stopped hashing, same as before.
Attached are the dashboards, as requested. Same error "Serial RX invalid 11". These are both Supras, board 401.
Also, clicking the RESTART button doesn't restart the device. It goes offline and I cannot access it again without turning the power off and on again.
Same issue here with the device on Public Pool. After some hours it shows 5234GH/s and is not working. I restarted it and it worked again. This has happened twice. I bought it just 1 day ago and updated to 2.1.9.
Same thing is happening to me on my 6 devices running 2.1.9.
this just happened to me! BitaxeUltra 202 running v2.1.9 firmware. I have a suspicion that it happened with a public-pool outage overnight, but it's hard to know for sure. 2 other Bitaxes still running fine.
this is the only indication from the dashboard, all other stats look good. Log shows tons of Serial RX invalid 11
₿ (149651229) bm1366Module: 9e aa 55 a8 00 51 94 02 55 05 dd
₿ (149651719) bm1366Module: Serial RX invalid 11
₿ (149651729) bm1366Module: 91 aa 55 08 00 68 4c 00 53 29 23
₿ (149653189) bm1366Module: Serial RX invalid 11
₿ (149653189) bm1366Module: 90 aa 55 70 02 0d 82 01 59 03 21
₿ (149653249) stratum_task: rx: {"id":null,"method":"mining.notify","params":["da1e63","4d9862332c27aaf3293856481f2ac7f71d6762c50001848e0000000000000000","02000000010000000000000000000000000000000000000000000000000000000000000000ffffffff1703ea0e0d5075626c69632d506f6f6c","ffffffff021d7df51200000000160014c64b1b9283ba1ea86bb9e7b696b0c8f68dad04000000000000000000266a24aa21a9ed4ca029d391eb165ea7dc0a0d4280ac260fd2ce2f86d678ff70640eeadeae0fed00000000",["f1cb32c85599d2c5a793b6ad6b11497f12c242e9055e3a312ebf1b62142d4e3a","e7d63762b78730203129046f64e1be4e4c077c67ad635fd7320306cdbfef9c23","1f7615138e4cc031c3d3139445a8f212b7e5c1c0c6d20d322b03fca02a404b63","cad7df613660d2b5c56650e1403e91d0fd96986fd7582b86bb7ed081b95ae7d9","3113d0a63c3ea791dcecbf6b46740341149f420aa1fc4fd4be49f8a21c8f026d","91167e0fb95a1638aa931b3de009a796417f835bc20899d792af0dca40348f99","109e75786424c730d25a3c58b1e431d0b7690dac7a55d2d5fd253a82f0d04be3","87a92f67ac5d54d85ffe7736e79812c7f761d7bbbde94907696bb2041fe7ef19","46fbd352426e8ff685452d9e7b1cb73e3ca9237d26b00727ff233305d714b27e","7b316f1cd2190e56cc885565ac56b76f3a25a283c8e45f6d378a9af8a2c3c857","b1caa3283a158982b4dd91e566ade8c6f36202a1e68ffdc28c25880d19898d14","afb0a547b132fdddc420539161ef884d941533032e7ca2f57bb5b4c20e97a99f"],"20000000","17031abe","66b39a61",false]}
₿ (149655149) create_jobs_task: New Work Dequeued da1e63
₿ (149656409) bm1366Module: Serial RX invalid 11
₿ (149656409) bm1366Module: 85 aa 55 c8 00 ef ce 00 66 5a 16
₿ (149660479) bm1366Module: Serial RX invalid 11
₿ (149660489) bm1366Module: 94 aa 55 74 00 9a ea 01 77 5f 77
₿ (149662859) bm1366Module: Serial RX invalid 11
₿ (149662859) bm1366Module: 96 aa 55 40 02 9f 4e 01 7d 7a 75
₿ (149663089) bm1366Module: Serial RX invalid 11
₿ (149663089) bm1366Module: 93 aa 55 32 01 d2 ae 02 7d 8a 75
₿ (149666159) bm1366Module: Serial RX invalid 11
₿ (149666159) bm1366Module: 81 aa 55 bc 03 2d 2d 02 0d 48 2d
₿ (149668049) bm1366Module: Serial RX invalid 11
₿ (149668049) bm1366Module: 97 aa 55 ca 02 6d 22 01 10 40 28
Got it! I blocked all network traffic to my bitaxe 401.. it kept hashing on generated work in the queue for a while, until the ASIC just stopped sending nonces. I re-enabled network traffic to the Bitaxe, and it started mining for a while, but then got borked. Here is right where it stopped working (raw rx bytes shown);
rx: [AA 55 52 00 30 25 00 91 00 B1 93]
I (62437861) bm1368Module: Job ID: 48, Core: 41/1, Ver: 00162000
I (62437861) asic_result: Ver: 20162000 Nonce 25300052 diff 0.0 of 1000.
rx: [AA 55 7A 01 76 B4 00 8B 0B DB 9A]
I (62437871) bm1368Module: Job ID: 40, Core: 61/11, Ver: 017B6000
I (62437881) asic_result: Ver: 217B6000 Nonce B476017A diff 0.0 of 1000.
rx: [53 04 5E 1B 2E 8F AA 55 6C 00 75]
I (62437891) bm1368Module: Serial RX invalid 11
I (62437901) bm1368Module: 53 04 5e 1b 2e 8f aa 55 6c 00 75
rx: [CA 00 46 0C D6 86 AA 55 92 01 D2]
I (62437911) bm1368Module: Serial RX invalid 11
I (62437911) bm1368Module: ca 00 46 0c d6 86 aa 55 92 01 d2
rx: [66 02 0C 28 20 9D AA 55 7A 02 EE]
I (62437921) bm1368Module: Serial RX invalid 11
I (62437931) bm1368Module: 66 02 0c 28 20 9d aa 55 7a 02 ee
now the hashrate has gone wild;
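For anyone trying to read the raw rx bytes, here is a rough sketch in C of how the two good frames above appear to map onto the logged fields. The struct, the function name and the field layout are assumptions inferred purely from these log lines (the formulas were picked because they reproduce the logged Nonce, Job ID, Core and Ver values); this is not taken from the ESP-Miner source or any chip documentation. Bytes 6 and 10 are left undecoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 11

/* Hypothetical struct: layout inferred from the log lines, not from a datasheet. */
typedef struct {
    uint32_t nonce;      /* bytes 2..5, read little-endian */
    uint8_t  job_id;     /* (byte 7 & 0xF0) >> 1, logged in hex */
    uint8_t  core;       /* byte 2 >> 1, logged in decimal */
    uint8_t  small_core; /* byte 7 & 0x0F, logged in decimal */
    uint32_t version;    /* bytes 8..9 read big-endian, shifted left by 13 */
} nonce_frame_t;

/* Returns 0 on success, -1 if the frame does not start with AA 55. */
int decode_frame(const uint8_t f[FRAME_LEN], nonce_frame_t *out)
{
    assert(f != NULL && out != NULL);

    /* The same check that produces "Serial RX invalid 11" in the logs. */
    if (f[0] != 0xAA || f[1] != 0x55) {
        return -1;
    }

    out->nonce = (uint32_t)f[2]
               | ((uint32_t)f[3] << 8)
               | ((uint32_t)f[4] << 16)
               | ((uint32_t)f[5] << 24);
    out->job_id     = (uint8_t)((f[7] & 0xF0) >> 1);
    out->core       = (uint8_t)(f[2] >> 1);
    out->small_core = (uint8_t)(f[7] & 0x0F);
    out->version    = (((uint32_t)f[8] << 8) | (uint32_t)f[9]) << 13;
    return 0;
}
```

Feeding it the first good frame (AA 55 52 00 30 25 00 91 00 B1 93) gives nonce 0x25300052, job ID 0x48, core 41/1 and version 0x00162000, matching the log. The three frames after that fail the AA 55 check, which is why nothing gets decoded from them.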
So this might be caused by the DNS lookup and missing handling 🤔
In the BM1366 log above (the one where the dashboard otherwise looks fine and the log shows tons of Serial RX invalid 11), the aa 55 that should be at the beginning of the frame is shifted by 1 byte.
And in the BM1368 capture above (taken after blocking and re-enabling network traffic to the Bitaxe 401), the aa 55 is shifted by 6 bytes.
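The shift can be measured directly from the dumped bytes. This is a minimal sketch in C; `find_preamble` is a hypothetical helper name, not something from ESP-Miner. It simply returns the index of the first AA 55 pair in an 11-byte dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the first AA 55 pair in buf, or -1 if none. */
int find_preamble(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0xAA && buf[i + 1] == 0x55) {
            return (int)i;
        }
    }
    return -1;
}
```

A correctly framed dump gives 0, the BM1366 dumps give 1 and the BM1368 dumps give 6. One caveat: AA 55 could in principle also occur inside the nonce or version bytes, so a real parser would likely want to confirm the frame some other way before trusting the offset.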
From a random Saleae capture of a BM1368 (thanks to yesterday's donor!) I can see this kind of nonce frame sent by the chip.
For whatever reason, the chip randomly sent a frame with 1 extra byte (all of the other 10k+ nonce frames have the correct length).
So if this happens, the current ESP-Miner, which frames the RX stream with a fixed frame size, will never resync to the aa 55.
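To illustrate why one extra byte is enough to break things, here is a small sketch in C that compares fixed-size framing with a parser that hunts for the next aa 55. Both function names are made up for this example, and neither is the actual ESP-Miner parser. The resync version is deliberately simplified: it accepts any 11 bytes that start with AA 55, where a real implementation would presumably also validate the frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 11

static int starts_with_preamble(const uint8_t *p)
{
    return p[0] == 0xAA && p[1] == 0x55;
}

/* Fixed-size framing: cut the stream every FRAME_LEN bytes and count
 * the chunks that start with AA 55. Once an extra byte slips in, every
 * later chunk is misaligned. */
size_t count_fixed(const uint8_t *s, size_t len)
{
    assert(s != NULL);

    size_t good = 0;
    for (size_t i = 0; i + FRAME_LEN <= len; i += FRAME_LEN) {
        if (starts_with_preamble(s + i)) {
            good++;
        }
    }
    return good;
}

/* Resyncing: when a chunk does not start with AA 55, slide forward one
 * byte at a time until the preamble lines up again. */
size_t count_resync(const uint8_t *s, size_t len)
{
    assert(s != NULL);

    size_t good = 0;
    size_t i = 0;
    while (i + FRAME_LEN <= len) {
        if (starts_with_preamble(s + i)) {
            good++;
            i += FRAME_LEN;
        } else {
            i++; /* drop one byte and try again */
        }
    }
    return good;
}
```

With a stream of six frames where the third one carries a single extra byte, the fixed version only ever sees the first three as valid, while the sliding version recovers on the very next frame.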
I made a change to the serial parser in BM1366.c and BM1368.c so that it flushes the buffer after any invalid serial RX (i.e. one that doesn't start with AA 55). From my testing so far it seems to be working. https://github.com/skot/ESP-Miner/tree/serialrx11_fix
I also have been keeping an eye on the size of the serial buffer. it seems like at some point esp-miner stops emptying the ESP32 serial RX buffer.. need to figure out why that happens.
I added a fix for this and some other memory leaks in https://github.com/skot/ESP-Miner/tree/219-leak_hunting
give it a try and see how it holds up!
Thank you for your efforts. I have updated the firmware to your version above, and will keep an eye and see what happens.
Update: So far it has run for 24 hrs with no reboots and no Serial RX errors.
Further update: The three devices connected to Public Pool are still running, however the one connected to Dutch.nl got stuck and I had to reset it. The logs only show "http_server: Handshake done, the new connection was opened".
Also, as a side note, the three Public Pool devices have been running solidly for over two days. But I've noticed they're not hitting difficulties any higher than 44 million. Nonce issue?
After updating to 2.1.9, devices seemed to settle after an hour, running at or above their target hash.
However, at 8 hrs in I noticed that the hash rates on three of my four devices had dropped dramatically (to 80 GH/s or so), with the following error messages:
₿ (37720852) bm1366Module: Serial RX invalid 11
₿ (37720852) bm1366Module: 35 3e 3d 8f aa 55 8a 01 f3 6f 02
₿ (37722862) bm1366Module: Serial RX invalid 11
₿ (37722862) bm1366Module: 43 69 13 82 aa 55 52 02 ba 55 01
All three of these devices were connected to the Public Pool address. The fourth, which is still running, is connected to Dutch.nl and seems fine.
Rebooting the devices has stopped the error, and the devices operate as normal. I await the 8hr mark to see if the errors appear again, or if they appear on my fourth and last device.