radifier / radiator

ccminer fork for Radiant SHA-512/256 and Novo SHA-256t
GNU General Public License v3.0

corrupted double-linked list #2

Closed ayyo2765 closed 2 years ago

ayyo2765 commented 2 years ago

Kernel: 5.10.0-hiveos #83
Nvidia driver: 510.85.02
Motherboard: M5A78L-M LX PLUS, ASUSTeK Computer INC. (1701 09/11/2014)
CPU: 8 × AMD FX(tm)-8150 Eight-Core Processor, AES
GPU: RTX 3070 Ti × 1, RTX 3090 × 2, RTX 3080 × 1

Starting the miner with all 4 cards crashes with the "corrupted double-linked list" error. If I disable any one of the cards (using the -d option) and run only the other 3, the miner works correctly.

Snippet from miner log:


Based on pooler cpuminer 2.3.2 and the tpruvot@github and KlausT@github forks
CUDA support by Christian Buchner, Christian H. and DJM34
Includes optimizations and additions implemented by sp-hash, tpruvot, tsiv and others.

Compiled with GCC 7.5 using the Nvidia CUDA Toolkit 11.6

[2022-09-13 01:09:05] Starting Stratum on stratum+tcp://us-east.deepfields.io:7086
[2022-09-13 01:09:05] NVML GPU monitoring enabled.
[2022-09-13 01:09:05] 4 miner threads started, using 'rad' algorithm.
[2022-09-13 01:09:05] Stratum difficulty set to 2
corrupted double-linked list
Aborted (core dumped)

I also ran the miner through Valgrind; this is the output:

==20782== Memcheck, a memory error detector
==20782== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==20782== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==20782== Command: ccminer -a rad -o stratum+tcp://us-east.deepfields.io:7086 -u <REDACTED>.Hive1
==20782==
ccminer 0.1.0-Radiator (64bit) for nVidia GPUs

Based on pooler cpuminer 2.3.2 and the tpruvot@github and KlausT@github forks
CUDA support by Christian Buchner, Christian H. and DJM34
Includes optimizations and additions implemented by sp-hash, tpruvot, tsiv and others.

Compiled with GCC 7.5 using the Nvidia CUDA Toolkit 11.6

==20782== Conditional jump or move depends on uninitialised value(s)
==20782==    at 0xC5ACAA0: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC7FBB2E: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC80181E: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC61B70F: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0x550FD6E: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x5511E70: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x52DD906: __pthread_once_slow (pthread_once.c:116)
==20782==    by 0x5552AF8: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x550C58F: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x55158BF: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x553BD87: cudaDriverGetVersion (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x11BCBB: cuda_num_devices() (in /hive/miners/custom/radiator/ccminer)
==20782==  Uninitialised value was created by a heap allocation
==20782==    at 0x4C31A3F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20782==    by 0x4C33D84: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20782==    by 0xC7F3B1D: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC5ACA03: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC7FBB2E: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC80181E: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC61B70F: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0x550FD6E: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x5511E70: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x52DD906: __pthread_once_slow (pthread_once.c:116)
==20782==    by 0x5552AF8: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x550C58F: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==
==20782== Conditional jump or move depends on uninitialised value(s)
==20782==    at 0xC5ACAD5: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC7FBB2E: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC80181E: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC61B70F: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0x550FD6E: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x5511E70: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x52DD906: __pthread_once_slow (pthread_once.c:116)
==20782==    by 0x5552AF8: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x550C58F: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x55158BF: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x553BD87: cudaDriverGetVersion (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x11BCBB: cuda_num_devices() (in /hive/miners/custom/radiator/ccminer)
==20782==  Uninitialised value was created by a heap allocation
==20782==    at 0x4C31A3F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20782==    by 0x4C33D84: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20782==    by 0xC7F3B1D: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC5ACA03: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC7FBB2E: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC80181E: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0xC61B70F: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.510.85.02)
==20782==    by 0x550FD6E: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x5511E70: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x52DD906: __pthread_once_slow (pthread_once.c:116)
==20782==    by 0x5552AF8: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==    by 0x550C58F: ??? (in /hive/lib/libcudart.so.11.2.152)
==20782==
==20782== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==20782==    This could cause spurious value errors to appear.
==20782==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==20782== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==20782==    This could cause spurious value errors to appear.
==20782==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==20782== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
==20782==    This could cause spurious value errors to appear.
==20782==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==20782== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
==20782==    This could cause spurious value errors to appear.
==20782==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==20782== Warning: set address range perms: large range [0x200000000, 0x600200000) (noaccess)
==20782== Warning: set address range perms: large range [0xdc0c000, 0x2dc0b000) (noaccess)
[2022-09-13 01:08:04] Starting Stratum on stratum+tcp://us-east.deepfields.io:7086
[2022-09-13 01:08:05] NVML GPU monitoring enabled.
==20782== Invalid write of size 4
==20782==    at 0x10D123: main (in /hive/miners/custom/radiator/ccminer)
==20782==  Address 0xc207160 is 0 bytes after a block of size 3,136 alloc'd
==20782==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20782==    by 0x10CC49: main (in /hive/miners/custom/radiator/ccminer)
==20782==
==20782== Invalid write of size 8
==20782==    at 0x10D12D: main (in /hive/miners/custom/radiator/ccminer)
==20782==  Address 0xc207170 is 16 bytes after a block of size 3,136 alloc'd
==20782==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20782==    by 0x10CC49: main (in /hive/miners/custom/radiator/ccminer)
==20782==
==20782== Invalid write of size 8
==20782==    at 0x52D5E5B: pthread_create@@GLIBC_2.2.5 (pthread_create.c:739)
==20782==    by 0x10D147: main (in /hive/miners/custom/radiator/ccminer)
==20782==  Address 0xc207168 is 8 bytes after a block of size 3,136 alloc'd
==20782==    at 0x4C33B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20782==    by 0x10CC49: main (in /hive/miners/custom/radiator/ccminer)
==20782==
[2022-09-13 01:08:05] 4 miner threads started, using 'rad' algorithm.
[2022-09-13 01:08:05] Stratum difficulty set to 2
[2022-09-13 01:08:05] Received new rad block header
[2022-09-13 01:08:05] block height 37733, 0 transactions
==20782== Warning: noted but unhandled ioctl 0x19 with no size/direction hints.
==20782==    This could cause spurious value errors to appear.
==20782==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==20782== Warning: noted but unhandled ioctl 0x49 with no size/direction hints.
==20782==    This could cause spurious value errors to appear.
==20782==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==20782== Warning: noted but unhandled ioctl 0x21 with no size/direction hints.
==20782==    This could cause spurious value errors to appear.
==20782==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==20782== Warning: noted but unhandled ioctl 0x1b with no size/direction hints.
==20782==    This could cause spurious value errors to appear.
==20782==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==20782== Warning: noted but unhandled ioctl 0x44 with no size/direction hints.
==20782==    This could cause spurious value errors to appear.
==20782==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==20782== Warning: noted but unhandled ioctl 0x48 with no size/direction hints.
==20782==    This could cause spurious value errors to appear.
==20782==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
[2022-09-13 01:08:09] GPU #3: using default intensity 28
[2022-09-13 01:08:10] GPU #1: using default intensity 28
[2022-09-13 01:08:10] GPU #2: using default intensity 28
[2022-09-13 01:08:10] GPU #0: using default intensity 28
^C==20782==
==20782== Process terminating with default action of signal 2 (SIGINT)
==20782==    at 0x52D6D2D: __pthread_timedjoin_ex (pthread_join_common.c:89)
==20782==    by 0x10D58C: main (in /hive/miners/custom/radiator/ccminer)
==20782==
==20782== HEAP SUMMARY:
==20782==     in use at exit: 12,089,871 bytes in 13,217 blocks
==20782==   total heap usage: 17,194 allocs, 3,977 frees, 66,979,275 bytes allocated
==20782==
==20782== LEAK SUMMARY:
==20782==    definitely lost: 211 bytes in 5 blocks
==20782==    indirectly lost: 0 bytes in 0 blocks
==20782==      possibly lost: 75,680 bytes in 987 blocks
==20782==    still reachable: 12,013,980 bytes in 12,225 blocks
==20782==         suppressed: 0 bytes in 0 blocks
==20782== Rerun with --leak-check=full to see details of leaked memory
==20782==
==20782== For counts of detected and suppressed errors, rerun with: -v
==20782== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)
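
The three "Invalid write" records in the output above can be read together with a little arithmetic. The following is a minimal Python sketch, not part of the miner; the helper name `parse_invalid_writes` and the dictionary keys are made up for illustration. It recomputes the bounds of the calloc'd block from each reported address and lists where each write lands relative to the block's end.

```python
import re

# The three "Invalid write" records, copied from the Valgrind output above
# (stack-frame lines omitted).
REPORT = """\
==20782== Invalid write of size 4
==20782==  Address 0xc207160 is 0 bytes after a block of size 3,136 alloc'd
==20782== Invalid write of size 8
==20782==  Address 0xc207170 is 16 bytes after a block of size 3,136 alloc'd
==20782== Invalid write of size 8
==20782==  Address 0xc207168 is 8 bytes after a block of size 3,136 alloc'd
"""

WRITE_RE = re.compile(r"Invalid write of size (\d+)")
ADDR_RE = re.compile(
    r"Address (0x[0-9a-f]+) is (\d+) bytes after a block of size ([\d,]+)"
)


def parse_invalid_writes(text):
    """Return one dict per 'Invalid write' record found in text (hypothetical helper)."""
    records = []
    width = None
    for line in text.splitlines():
        m = WRITE_RE.search(line)
        if m:
            width = int(m.group(1))  # size of the offending write, in bytes
            continue
        m = ADDR_RE.search(line)
        if m and width is not None:
            addr = int(m.group(1), 16)
            past_end = int(m.group(2))  # distance beyond the end of the block
            block_size = int(m.group(3).replace(",", ""))
            block_end = addr - past_end
            records.append({
                "width": width,
                "past_end": past_end,
                "block_start": block_end - block_size,
                "block_end": block_end,
            })
            width = None
    return records


writes = sorted(parse_invalid_writes(REPORT), key=lambda r: r["past_end"])
for w in writes:
    print(f"{w['width']}-byte write at block end + {w['past_end']} "
          f"(block 0x{w['block_start']:x}..0x{w['block_end']:x})")
```

All three records resolve to the same 3,136-byte block allocated in main, and the writes touch bytes 0-3, 8-15 and 16-23 immediately past its end, one of them made by pthread_create. That pattern would be consistent with filling in one array element beyond the last allocated one while starting threads, but this is only a reading of the report; the thread does not state the actual cause or what changed in v1.0.0.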
radifier commented 2 years ago

Resolved in v1.0.0