trexminer / T-Rex

T-Rex NVIDIA GPU miner with web control monitoring page

t-rex does not work under WSL #663

Open stevekstevek opened 3 years ago

stevekstevek commented 3 years ago

Hi -- this is actually the same issue described in issue #269

I'm trying to run the miner directly from the command-line, and get the same memory dump.

As requested in issue #269 here's how the libraries look under WSL:

~/m/t-rex-0.23.2$ ls -l /usr/lib/wsl/lib/|grep cuda
-r-xr-xr-x 1 root root 141464 Aug 27 09:54 libcuda.so
-r-xr-xr-x 1 root root 141464 Aug 27 09:54 libcuda.so.1
-r-xr-xr-x 1 root root 141464 Aug 27 09:54 libcuda.so.1.1

~/m/t-rex-0.23.2$ ls -lah /usr/lib/x86_64-linux-gnu/ | grep cuda
lrwxrwxrwx 1 root root 18 Mar 16 2020 libicudata.so.60 -> libicudata.so.60.2
-rw-r--r-- 1 root root 26M Mar 16 2020 libicudata.so.60.2

note: /usr/lib/wsl/lib is in the ld configs, so unless you guys are doing something weird to load libraries, you should be finding it.
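
One way to double-check that claim is to ask the loader cache directly rather than listing directories. The sketch below is only an illustration: `cached_libs` is a hypothetical helper name, and it assumes the usual `ldconfig -p` line format (`name (flags) => path`). Matching on `libcuda.so` instead of `cuda` also avoids the unrelated `libicudata` hits shown above.

```shell
# Hypothetical helper: read `ldconfig -p`-style text on stdin and print the
# resolved path of every cache entry whose line contains the given name.
cached_libs() {
    grep -F -- "$1" | awk -F' => ' 'NF == 2 { print $2 }'
}

# Intended use on the WSL machine (not run here):
#   ldconfig -p | cached_libs libcuda.so
```

If this prints nothing for `libcuda.so`, the directory is probably not in the loader cache, whatever the config files say.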

stevekstevek commented 2 years ago

This is still an issue.

Here's the details of my rig/libraries:
~/m/t-rex-0.24.5$ nvidia-smi
Mon Nov 1 17:32:46 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 496.13       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:2E:00.0 Off |                  N/A |
| 70%   68C    P2   168W / 200W |   3029MiB /  8192MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Trying to start, I get:
$ ./t-rex -P --log-path trex.log
20211101 17:33:43 ERROR: Can't start T-Rex, CUDA initialize error, memory dump: (XXX)
20211101 17:33:44 T-Rex finished.

I've tried to run it with strace, to see what it's trying to do to initialize CUDA. I suspect it may be looking for older libraries than I have, or maybe looking for the device in a place it isn't. Unfortunately, t-rex seems to have some code in it that causes it to go down a different code path when run under strace, so it's very difficult to diagnose.

stevekstevek commented 2 years ago

I was able to get it running. I seem to have needed to explicitly pass along library paths in order to do so, however:

LD_LIBRARY_PATH=/usr/lib/wsl/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs ./t-rex

Also, it can't find libnvidia-ml.so, so I needed to create a symlink with that name pointing to libnvidia-ml.so.1.
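
For anyone repeating this, the symlink step might look roughly like the sketch below. It is not the exact command used above: `link_nvml` is a hypothetical helper, and both the location of libnvidia-ml.so.1 (assumed here to be /usr/lib/wsl/lib) and the writable link directory are assumptions to adjust for your system.

```shell
# Hypothetical helper: create libnvidia-ml.so -> libnvidia-ml.so.1.
#   $1 = directory that contains libnvidia-ml.so.1 (assumed: /usr/lib/wsl/lib)
#   $2 = writable directory to hold the link (any path you choose)
link_nvml() {
    src_dir=$1
    link_dir=$2
    if [ ! -e "$src_dir/libnvidia-ml.so.1" ]; then
        echo "libnvidia-ml.so.1 not found in $src_dir" >&2
        return 1
    fi
    mkdir -p "$link_dir" || return 1
    # -s symbolic, -f replace an existing link, -n do not descend into an old link
    ln -sfn "$src_dir/libnvidia-ml.so.1" "$link_dir/libnvidia-ml.so"
}

# Intended use (not run here), with the link directory added to the search path:
#   link_nvml /usr/lib/wsl/lib "$HOME/.local/lib/t-rex" &&
#   LD_LIBRARY_PATH="$HOME/.local/lib/t-rex:/usr/lib/wsl/lib" ./t-rex
```

Putting the link in a separate writable directory avoids having to modify the directory that holds the driver libraries.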

Now it starts/runs:
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
20211109 19:18:19 T-Rex NVIDIA GPU miner v0.24.5 - [Linux]
20211109 19:18:19 r.3ed63f02e8cb
20211109 19:18:19
20211109 19:18:20 WARN: can't get vendor and device identifiers for GPU id=ffe43a9f634889a1aa022d8a419f93eb
20211109 19:18:36
20211109 19:18:36 NVIDIA Driver v496.13
20211109 19:18:36
20211109 19:18:36 + GPU #0: [00:2e.0|0] GeForce RTX 3060 Ti, 8191 MB
20211109 19:18:36
20211109 19:18:36 WARN: DevFee 1% (ethash)
20211109 19:18:36

But it doesn't seem to be performing well:

----------20211109 19:21:44 ----------
Mining at localhost:2020, diff: 8.73 G
GPU #0: RTX 3060 Ti - 1.82 MH/s, [T:63C, P:89W, F:51%, E:20.4kH/W]
Shares/min: 0
Uptime: 3 mins 25 secs | Algo: ethash | T-Rex v0.24.5

20211109 19:21:50 ethash epoch: 452, block: 13586089, diff: 8.73 G
20211109 19:21:51 ethash epoch: 452, block: 13586090, diff: 8.73 G
20211109 19:22:03 ethash epoch: 452, block: 13586091, diff: 8.73 G

Next is to figure that out. I am able to do ~94 MH/s using the open-source autolykos client under the same conditions, so I hope there's just some tuning to work out.

philsward commented 2 years ago

1) Check whether /dev/dxg exists. If it doesn't, get a newer WSL kernel by running wsl --update from the command line (for example, v5.10.60.1+ works).
2) You need a fairly recent version of the NVIDIA driver installed; 511.x, for example, is known to work.
3) Check whether WSL can see the driver by running the nvidia-smi command.
4) Edit your .bashrc file and add at the bottom:

export LD_LIBRARY_PATH="/usr/lib/wsl/lib/:$LD_LIBRARY_PATH"

This should get rid of the memory dump error. However, if you have certain items overclocked, you will now get:

ERROR: Can't start T-Rex, GPU memory clock control is not supported on Linux

(Most T-Rex overclocking apparently only works on Windows.)

This is as far as I've got so far. The internet suggests using nvidia-smi to overclock manually, but I haven't dug into it yet.
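
The checks in steps 1, 3 and 4 could be bundled into a small preflight script along these lines. This is a rough sketch, not something shipped with T-Rex: `path_contains` and `wsl_gpu_preflight` are hypothetical names, and the script only reports what it finds without changing anything.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in the
# colon-separated list $2 (with or without a trailing slash).
path_contains() {
    case ":$2:" in
        *":$1:"*|*":$1/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical preflight: report which of the prerequisites look unmet.
wsl_gpu_preflight() {
    status=0
    if [ ! -e /dev/dxg ]; then
        echo "missing /dev/dxg: try 'wsl --update' from the Windows side"
        status=1
    fi
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found: the driver may not be visible to WSL"
        status=1
    fi
    if ! path_contains /usr/lib/wsl/lib "${LD_LIBRARY_PATH:-}"; then
        echo "/usr/lib/wsl/lib is not in LD_LIBRARY_PATH"
        status=1
    fi
    return "$status"
}

# Intended use (not run here):
#   wsl_gpu_preflight && ./t-rex
```

It does not check the driver version from step 2, since the thread only gives examples of working versions rather than a minimum.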