mimblewimble / grin-miner

Standalone miner for grin
Apache License 2.0

Segmentation fault on macOS Sierra and High Sierra #2

Closed photis closed 6 years ago

photis commented 6 years ago

After about 15 minutes, grin-miner, connected to a remote stratum server over an unstable internet connection, crashes with a segmentation fault. It happens every time. I also tested on another MBP running a clean version of Sierra, with the same result.

1) macOS High Sierra 10.13.4 (17E202), MacBook Pro (Retina, 15-inch, Mid 2015), 2.5 GHz Intel Core i7, 16 GB 1600 MHz DDR3

2) macOS Sierra 10.12.6 (16G1314), MacBook Pro (Retina, 15-inch, Early 2013), 2.4 GHz Intel Core i7, 8 GB 1600 MHz DDR3

grin-miner_2018-05-03-190344_hermes_crash.txt

photis commented 6 years ago

As a follow-up: I tried on 2 different MacBook Pro laptops, they both display the same behaviour for mean_cpu and mean_compat_cpu. Only with lean_cpu does the miner not crash. All tests done with NUM_THREADS = 1.

The lean_cpu miners have been running overnight for about 9-10 hours now without solving a single block, according to the stratum server they're connected to (remotely over the internet). They sometimes disconnect (broken pipe), but each time they successfully reconnect (with an incremented worker ID).

[EDIT] This appears to be a separate issue, see: https://github.com/mimblewimble/grin-miner/issues/5

photis commented 6 years ago

Second follow-up: I tested with a locally run grin stratum server on the 2nd MBP, and a local miner on the same MBP. Again, all tests done with NUM_THREADS = 1. In this setup, it is also only the lean_cpu miner that runs without problems. Both mean_cpu and mean_compat_cpu return a segmentation fault after 15-20 minutes running, same as with a remote stratum server, as described above. That means we can rule out that a flaky communication channel is to blame.

photis commented 6 years ago

Third follow-up: After running for 24 hours, grin-miner with lean_cpu connected to a local stratum server also terminated with a Segmentation Fault.

macOS Sierra 10.12.6 (16G1314), MacBook Pro (Retina, 15-inch, Early 2013), 2.4 GHz Intel Core i7, 8 GB 1600 MHz DDR3

andybellenie commented 6 years ago

I'm seeing the same issue on 2012 MBP with High Sierra.

garyyu commented 6 years ago

I also see a segmentation fault on MBP / Mac Pro (High Sierra 10.13.4).

garyyu commented 6 years ago

It seems the problem is in the cuckooplugin. I got the following backtrace when the segmentation fault occurred:

Process 36024 stopped
* thread #6, stop reason = EXC_BAD_ACCESS (code=1, address=0x70000c4f2ea0)
    frame #0: 0x00007fff5ed45f6d libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell + 77
libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell:
->  0x7fff5ed45f6d <+77>: vmovups -0x10(%rsi,%rdx), %xmm1
    0x7fff5ed45f73 <+83>: subq   $0x20, %rdx
    0x7fff5ed45f77 <+87>: jbe    0x7fff5ed45fa3            ; <+131>
    0x7fff5ed45f79 <+89>: vmovups (%rsi), %ymm0
Target 0: (grin-miner) stopped.

(lldb) thread backtrace
* thread #6, stop reason = EXC_BAD_ACCESS (code=1, address=0x70000c4f2ea0)
  * frame #0: 0x00007fff5ed45f6d libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell + 77
    frame #1: 0x00000001025a07c8 mean_cpu_30.cuckooplugin`blake2b_update + 328
    frame #2: 0x00000001025a3485 mean_cpu_30.cuckooplugin`blake2b + 453
    frame #3: 0x0000000102596f89 mean_cpu_30.cuckooplugin`cuckoo_call + 233
    frame #4: 0x0000000102596db1 mean_cpu_30.cuckooplugin`process_internal_worker(void*) + 49
    frame #5: 0x00007fff5ed4c661 libsystem_pthread.dylib`_pthread_body + 340
    frame #6: 0x00007fff5ed4c50d libsystem_pthread.dylib`_pthread_start + 377
    frame #7: 0x00007fff5ed4bbf9 libsystem_pthread.dylib`thread_start + 13
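
The reasoning behind "the problem is in cuckooplugin" is that the innermost frame outside the system libraries belongs to the plugin. A minimal Python sketch of that reading, assuming the lldb frame format shown above; the helper names `parse_frames` and `first_non_system_frame` are hypothetical and not part of grin-miner:

```python
import re

# Matches lldb frame lines such as:
#   frame #1: 0x00000001025a07c8 mean_cpu_30.cuckooplugin`blake2b_update + 328
FRAME_RE = re.compile(r"frame #(\d+): 0x[0-9a-fA-F]+ ([^`]+)`(.*?)(?: \+ \d+)?\s*$")


def parse_frames(backtrace: str):
    """Return (index, module, symbol) tuples for every frame line found."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


def first_non_system_frame(frames, system_prefix="libsystem_"):
    """Return the innermost frame whose module is not a libsystem_* library."""
    for frame in frames:
        if not frame[1].startswith(system_prefix):
            return frame
    return None


# The first four frames copied from the backtrace quoted in this comment.
BACKTRACE = r"""
  * frame #0: 0x00007fff5ed45f6d libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell + 77
    frame #1: 0x00000001025a07c8 mean_cpu_30.cuckooplugin`blake2b_update + 328
    frame #2: 0x00000001025a3485 mean_cpu_30.cuckooplugin`blake2b + 453
    frame #3: 0x0000000102596f89 mean_cpu_30.cuckooplugin`cuckoo_call + 233
"""

if __name__ == "__main__":
    print(first_non_system_frame(parse_frames(BACKTRACE)))
```

On the frames quoted above, this picks frame #1, `blake2b_update` in `mean_cpu_30.cuckooplugin`, which is the caller of the faulting `memmove`.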
photis commented 6 years ago

Look under “User Reports” (in left column) in standard macOS Console.app

@photis, how did you get your backtrace? I ran it with $ RUST_BACKTRACE=1 ./target/debug/grin-miner, but when Segmentation fault: 11 happens, I can't see the backtrace.

garyyu commented 6 years ago

@photis @andybellenie Could you please take the latest version and test it again, to confirm that this fix also works on your MBPs? Thanks~

photis commented 6 years ago

@garyyu On it now, took some time because I'm on the move again and had some trouble convincing the owner of the router I'm connecting to to let me forward the necessary ports. Will keep you posted.

1st update: mean_compat_cpu with NUM_THREADS=1 has been running for almost 12 hours, locally on the same MBP as the stratum server, without problems. Will stop that shortly and try with mean_cpu.

2nd update: that was quick: grin-miner with mean_cpu practically instantaneously aborts without so much as a meaningful message in the log file. FULL log file hereafter:

May 25 10:48:10.616 INFO This is Grin-Miner version 0.2.0 (git 707ed7d), built for x86_64-apple-darwin by rustc 1.26.0 (a77568041 2018-05-07).
May 25 10:48:10.617 DEBG Built with profile "release", features "" on Thu, 24 May 2018 05:37:58 GMT.
May 25 10:48:11.621 WARN Connection Status: Connected to Grin server at 127.0.0.1:13416.
May 25 10:48:11.725 DEBG sending request: {"id":"0","jsonrpc":"2.0","method":"getjobtemplate","params":null}
May 25 10:48:40.086 DEBG Received message: {"id":"Stratum","jsonrpc":"2.0","method":"job","params":"{\"height\":84989,\"difficulty\":1,\"pre_pow\":\"00010000000000014bfdb4d4eed5159ccbca85fe7775771e507f92b4fefedea75fff44970348dc22e83b000000005b07bfbb000000000001cae5affb2ed2b32deac144f4804452c9afa8a77b75ef17d48c1641d7e9b46476cd90e7efb7b9113271c5148dddcaa34d41e71fa71deb4eedbc22e7e2700a7eac5b1de466a275e60d64b347175b1c38a298d292469322848a5237757255ff06483dd08796a603489c65b571ccf191b87298b213303c2eb7a1cab24973f3d1572b3671\"}"}

May 25 10:48:40.086 DEBG Received request type: job
May 25 10:48:40.086 INFO Got a new job: JobTemplate { height: 84989, difficulty: 1, pre_pow: "00010000000000014bfdb4d4eed5159ccbca85fe7775771e507f92b4fefedea75fff44970348dc22e83b000000005b07bfbb000000000001cae5affb2ed2b32deac144f4804452c9afa8a77b75ef17d48c1641d7e9b46476cd90e7efb7b9113271c5148dddcaa34d41e71fa71deb4eedbc22e7e2700a7eac5b1de466a275e60d64b347175b1c38a298d292469322848a5237757255ff06483dd08796a603489c65b571ccf191b87298b213303c2eb7a1cab24973f3d1572b3671" }
May 25 10:48:40.086 DEBG Received message: {"id":"3","jsonrpc":"2.0","method":"getjobtemplate","result":"{\"height\":84989,\"difficulty\":1,\"pre_pow\":\"00010000000000014bfdb4d4eed5159ccbca85fe7775771e507f92b4fefedea75fff44970348dc22e83b000000005b07bfbb000000000001cae5affb2ed2b32deac144f4804452c9afa8a77b75ef17d48c1641d7e9b46476cd90e7efb7b9113271c5148dddcaa34d41e71fa71deb4eedbc22e7e2700a7eac5b1de466a275e60d64b347175b1c38a298d292469322848a5237757255ff06483dd08796a603489c65b571ccf191b87298b213303c2eb7a1cab24973f3d1572b3671\"}","error":null}

May 25 10:48:40.086 DEBG Received response with id: 3
May 25 10:48:40.086 INFO Got a job at height 84989 and difficulty 1
May 25 10:48:40.128 DEBG Miner received message: ReceivedJob(84989, 1, "00010000000000014bfdb4d4eed5159ccbca85fe7775771e507f92b4fefedea75fff44970348dc22e83b000000005b07bfbb000000000001cae5affb2ed2b32deac144f4804452c9afa8a77b75ef17d48c1641d7e9b46476cd90e7efb7b9113271c5148dddcaa34d41e71fa71deb4eedbc22e7e2700a7eac5b1de466a275e60d64b347175b1c38a298d292469322848a5237757255ff06483dd08796a603489c65b571ccf191b87298b213303c2eb7a1cab24973f3d1572b3671")
May 25 10:48:40.128 DEBG Mining Cuckoo30 for height: 84989
May 25 10:48:40.148 INFO Cuckoo plugin 0 - /Users/photis/Projects/mimble/grin-miner/target/release/plugins/mean_cpu_30.cuckooplugin
May 25 10:48:40.148 DEBG Cuckoo Plugin 0: Setting mining parameter NUM_THREADS to 1 on Device 0
May 25 10:48:40.148 DEBG Miner received message: ReceivedJob(84989, 1, "00010000000000014bfdb4d4eed5159ccbca85fe7775771e507f92b4fefedea75fff44970348dc22e83b000000005b07bfbb000000000001cae5affb2ed2b32deac144f4804452c9afa8a77b75ef17d48c1641d7e9b46476cd90e7efb7b9113271c5148dddcaa34d41e71fa71deb4eedbc22e7e2700a7eac5b1de466a275e60d64b347175b1c38a298d292469322848a5237757255ff06483dd08796a603489c65b571ccf191b87298b213303c2eb7a1cab24973f3d1572b3671")
May 25 10:48:40.151 DEBG Mining Cuckoo30 for height: 84989
May 25 10:48:40.151 DEBG Not re-loading plugin or directory.
May 25 10:48:41.014 DEBG sending request: {"id":"0","jsonrpc":"2.0","method":"status","params":null}

I've included the crash report from macOS Console.app: grin-miner_2018-05-25-104842_janus crash.txt
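
For anyone reading the log above: the job data arrives as a JSON document encoded as a string inside the JSON-RPC message, either under "params" (a pushed "job") or under "result" (the reply to "getjobtemplate"). A minimal Python sketch of decoding it, assuming only the message shapes visible in this log; `parse_job` is a hypothetical helper, and the `pre_pow` value in the sample is a shortened placeholder, not real data:

```python
import json


def parse_job(raw: str) -> dict:
    """Decode a stratum message and return the nested job template as a dict."""
    msg = json.loads(raw)
    # Pushed jobs carry the template in "params"; replies carry it in "result".
    payload = msg.get("params") or msg.get("result")
    if payload is None:
        raise ValueError("message carries no job template")
    # The template is JSON encoded as a string, so it needs a second decode.
    return json.loads(payload) if isinstance(payload, str) else payload


# Same shape as the "job" message in the log; pre_pow shortened to a placeholder.
SAMPLE = (
    r'{"id":"Stratum","jsonrpc":"2.0","method":"job",'
    r'"params":"{\"height\":84989,\"difficulty\":1,\"pre_pow\":\"0001\"}"}'
)

if __name__ == "__main__":
    job = parse_job(SAMPLE)
    print(job["height"], job["difficulty"])
```

A request such as the "getjobtemplate" line with "params":null has no template, so the sketch raises `ValueError` for it rather than returning an empty job.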

andybellenie commented 6 years ago

Seems fixed, thanks!

garyyu commented 6 years ago

@andybellenie thanks for the feedback.

garyyu commented 6 years ago

@photis your log looks like a build problem. Could you please try building on the MBP you are running it on? Run cargo clean before cargo build --release.

photis commented 6 years ago

Both executables were built on the running MBP. I hadn't done a cargo clean prior to the cargo build --release; I will do that now and let you know the result.

photis commented 6 years ago

Sorry it took so long; there were some incidents that required my attention elsewhere. I did as you instructed, and running grin-miner with mean_cpu still results in an illegal instruction error. The log file doesn't report this; it's the TUI that shows that message. The log file is below:

May 28 22:12:24.748 INFO This is Grin-Miner version 0.2.0 (git 707ed7d), built for x86_64-apple-darwin by rustc 1.26.0 (a77568041 2018-05-07).
May 28 22:12:24.749 DEBG Built with profile "release", features "" on Mon, 28 May 2018 17:27:25 GMT.
May 28 22:12:25.753 WARN Connection Status: Connected to Grin server at 127.0.0.1:13416.
May 28 22:12:25.858 DEBG sending request: {"id":"0","jsonrpc":"2.0","method":"getjobtemplate","params":null}
May 28 22:12:26.064 DEBG Received message: {"id":"1","jsonrpc":"2.0","method":"getjobtemplate","result":"{\"height\":89081,\"difficulty\":1,\"pre_pow\":\"00010000000000015bf9b3d0c25d217ca4a03284cc63e0c53440113912ef13f17390afe52ab45e50f4f5000000005b0c547e000000000001dfa7a20c11610e0223c225102d768bc75f7ac822c4b62525701ef4f09f74d7d08cb7c7a31b8b85ed19042f24a7271c74ce87592d90d9a7eb9efea18b05283ec5ed2c8e037f605a02eb2885a041e343cf31498d3820b570865958ff87c790df3e6537b15a750103ca95180bb23750215b28707a55f339dd29b5d63155f9f6def2fec2\"}","error":null}

May 28 22:12:26.064 DEBG Received response with id: 1
May 28 22:12:26.064 INFO Got a job at height 89081 and difficulty 1
May 28 22:12:26.097 DEBG Miner received message: ReceivedJob(89081, 1, "00010000000000015bf9b3d0c25d217ca4a03284cc63e0c53440113912ef13f17390afe52ab45e50f4f5000000005b0c547e000000000001dfa7a20c11610e0223c225102d768bc75f7ac822c4b62525701ef4f09f74d7d08cb7c7a31b8b85ed19042f24a7271c74ce87592d90d9a7eb9efea18b05283ec5ed2c8e037f605a02eb2885a041e343cf31498d3820b570865958ff87c790df3e6537b15a750103ca95180bb23750215b28707a55f339dd29b5d63155f9f6def2fec2")
May 28 22:12:26.097 DEBG Mining Cuckoo30 for height: 89081
May 28 22:12:26.117 INFO Cuckoo plugin 0 - /Users/photis/Projects/mimble/grin-miner/target/release/plugins/mean_cpu_30.cuckooplugin
May 28 22:12:26.117 DEBG Cuckoo Plugin 0: Setting mining parameter NUM_THREADS to 1 on Device 0
May 28 22:12:27.049 DEBG Mining: Plugin 0 - Device 0 (CPU) Status: OK : Last Graph time: 0s; Graphs per second: inf - Total Attempts: 0
May 28 22:12:27.049 INFO Mining: Cuckoo30 at 0 gps (graphs per second)

...and it stops just there. No other lines are output.

The log from macOS Console.app is here: grin-miner_2018-05-28-221228_janus.crash.txt

garyyu commented 6 years ago

Could you please try mean_compat_cpu? Thanks.

garyyu commented 6 years ago

Oh, I just went back through the history and saw that you had already tested mean_compat_cpu with NUM_THREADS=1 for almost 12 hours. That means this problem has already been fixed. You can set NUM_THREADS to 4 or 8, up to the number of CPU cores you have.
mean_cpu can only be used on some newer CPUs; your MacBook Pro (Retina, 15-inch, Early 2013) can't use this option. Thanks again for testing. You can close this issue if you have no other questions.
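
A rough Python sketch of the advice above: take NUM_THREADS from the number of logical cores, and fall back to mean_compat_cpu on older CPUs. The AVX2 check is an assumption; the thread only says mean_cpu needs a newer CPU and does not name the required instruction set. The helper names are hypothetical, and the feature string is expected to come from somewhere like `sysctl -n machdep.cpu.leaf7_features` on an Intel Mac or the flags line of /proc/cpuinfo on Linux.

```python
import os


def choose_plugin(cpu_features: str) -> str:
    """Pick a plugin name from a whitespace-separated CPU feature string."""
    flags = {flag.lower() for flag in cpu_features.split()}
    # Assumption: mean_cpu needs AVX2. The thread only says "some new CPU".
    return "mean_cpu" if "avx2" in flags else "mean_compat_cpu"


def choose_num_threads() -> int:
    """Use one thread per logical core; os.cpu_count() may return None."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    # Illustrative feature string only; substitute the output for your machine.
    example_features = "BMI1 AVX2 SMEP BMI2"
    print(choose_plugin(example_features), choose_num_threads())
```

If the assumption about AVX2 is wrong for a given build, the safe choice is still mean_compat_cpu, which ran for almost 12 hours on the Early 2013 machine in this thread.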

photis commented 6 years ago

Alright, glad it helped and thanks for fixing it.