Closed (mably closed this issue 5 years ago)
What is the status of the latest C31 plugins on your cards?
It's OK now, so I'll close the issue.
@tromp Sorry to reopen the issue, I just updated my C31 GPS for the V100 here: https://github.com/mimblewimble/docs/wiki/GPU-Mining-Stats
Does 0.18 GPS on C31 seem right for the Tesla V100? (That is 1/10th of the RTX 2080, even though both get the same GPS on C29.)
Is it possible my plugins are configured incorrectly? I couldn't get any C31 plugin to work except the ocl and lean-cuda ones (the rtx and gtx plugins errored).
0.18 GPS on C31 seems reasonable for cuckatoo_lean_cuda_31
But why couldn't you get cuckatoo_mean_cuda_gtx_31 to work? What was the error?
Thank you for following up @tromp!
Error when attempting the cuckatoo_mean_cuda_gtx_31 plugin is:
ERRO Plugin cuckatoo_mean_cuda_gtx_31 has errored, device: Tesla V100-SXM2-16GB. Reason: Device 0 GPUassert: out of memory /home/travis/build/mimblewimble/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 429
Here's the rest of the grin-miner.log:
Jan 09 10:39:49.313 DEBG sending request: {"id":"0","jsonrpc":"2.0","method":"getjobtemplate","params":null}
Jan 09 10:39:50.001 DEBG Received message: {"id":"0","jsonrpc":"2.0","method":"login","result":"ok"}
Jan 09 10:39:50.001 DEBG Received response with id: 0
Jan 09 10:39:50.001 DEBG Received message: {"id":"0","jsonrpc":"2.0","method":"getjobtemplate","result":{"difficulty":1,"height":16654,"job_id":1,"pre_pow":"0001000000000000410e000000005c35cf70006e16d49719bc759cfd53c46fb7e796700308048712ed8fabc374d64ec39552c3ddcf62df7b0f6f5c81d84df278cb9498b0250e54fb1ebce5f3ecb281b95a874e9ed224a7ac4a96520a0e6a20548e37b4b63604d23cc128e75371a3f0f5e8cacceac3e537bc45674fe19969ace0b2c8f7e807987da18393b5e3d6cd6dceff7b2b07553f087aef10ccae7b5515e48ca7a6c32b814daa4a4dfffe30830df50662ac2486491e1629e70feba2aa17f73573f4b1b3f7f4c85931c7620cfacf16386f0000000000017dbf000000000001008800000000620174db0000049a"}}
Jan 09 10:39:50.001 DEBG Received response with id: 0
Jan 09 10:39:50.001 INFO Got a job at height 16654 and difficulty 1
Jan 09 10:39:50.043 DEBG Miner received message: ReceivedJob(16654, 1, 1, "0001000000000000410e000000005c35cf70006e16d49719bc759cfd53c46fb7e796700308048712ed8fabc374d64ec39552c3ddcf62df7b0f6f5c81d84df278cb9498b0250e54fb1ebce5f3ecb281b95a874e9ed224a7ac4a96520a0e6a20548e37b4b63604d23cc128e75371a3f0f5e8cacceac3e537bc45674fe19969ace0b2c8f7e807987da18393b5e3d6cd6dceff7b2b07553f087aef10ccae7b5515e48ca7a6c32b814daa4a4dfffe30830df50662ac2486491e1629e70feba2aa17f73573f4b1b3f7f4c85931c7620cfacf16386f0000000000017dbf000000000001008800000000620174db0000049a")
Jan 09 10:39:50.043 DEBG Pause message sent
Jan 09 10:39:50.043 DEBG Resume message sent
Jan 09 10:39:50.043 DEBG solver_thread - solver_loop_rx got msg: Pause
Jan 09 10:39:50.044 DEBG solver_thread - solver_loop_rx got msg: Resume
Jan 09 10:39:50.044 ERRO Plugin cuckatoo_mean_cuda_gtx_31 has errored, device: Tesla V100-SXM2-16GB. Reason: Device 0 GPUassert: out of memory /home/travis/build/mimblewimble/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 429
Jan 09 10:39:51.045 DEBG Mining: Plugin 0 - Device 0 (Tesla V100-SXM2-16GB) Has ERRORED! Reason: Device 0 GPUassert: out of memory /home/travis/build/mimblewimble/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 429
Jan 09 10:39:51.045 INFO Mining: Cuck(at)oo at 0 gps (graphs per second)
Jan 09 10:39:54.048 DEBG Mining: Plugin 0 - Device 0 (Tesla V100-SXM2-16GB) Has ERRORED! Reason: Device 0 GPUassert: out of memory /home/travis/build/mimblewimble/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 429
Jan 09 10:39:54.048 INFO Mining: Cuck(at)oo at 0 gps (graphs per second)
Jan 09 10:39:57.052 DEBG Mining: Plugin 0 - Device 0 (Tesla V100-SXM2-16GB) Has ERRORED! Reason: Device 0 GPUassert: out of memory /home/travis/build/mimblewimble/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 429
Jan 09 10:39:57.052 INFO Mining: Cuck(at)oo at 0 gps (graphs per second)
Jan 09 10:40:00.000 DEBG Received message: {"id":"Stratum","jsonrpc":"2.0","method":"job","params":{"difficulty":1,"height":16654,"job_id":2,"pre_pow":"0001000000000000410e000000005c35cf7f006e16d49719bc759cfd53c46fb7e796700308048712ed8fabc374d64ec39552c3ddcf62df7b0f6f5c81d84df278cb9498b0250e54fb1ebce5f3ecb281b95a87cee302884f7a6b397320439ea0ed3eaa2904dfbca41e2ae3ef855a227b764a8dedcbd08f1076c3cb82c1270629019c7fad17aeda3acdb6eb146b3e4c10b6bf3b95e1c2e479f4ff8a3f2a25a7f2bdfbc7b286dbcdf2b3b75965152e79ddf0762db03f47977eeaf1c80d992cd316bc04b6e56943abad4d649c725a5b715faea25f0000000000017dc2000000000001008900000000620174db0000049a"}}
Jan 09 10:40:00.000 DEBG Received request type: job
Jan 09 10:40:00.000 INFO Got a new job: JobTemplate { height: 16654, job_id: 2, difficulty: 1, pre_pow: "0001000000000000410e000000005c35cf7f006e16d49719bc759cfd53c46fb7e796700308048712ed8fabc374d64ec39552c3ddcf62df7b0f6f5c81d84df278cb9498b0250e54fb1ebce5f3ecb281b95a87cee302884f7a6b397320439ea0ed3eaa2904dfbca41e2ae3ef855a227b764a8dedcbd08f1076c3cb82c1270629019c7fad17aeda3acdb6eb146b3e4c10b6bf3b95e1c2e479f4ff8a3f2a25a7f2bdfbc7b286dbcdf2b3b75965152e79ddf0762db03f47977eeaf1c80d992cd316bc04b6e56943abad4d649c725a5b715faea25f0000000000017dc2000000000001008900000000620174db0000049a" }
Jan 09 10:40:00.056 DEBG Miner received message: ReceivedJob(16654, 2, 1, "0001000000000000410e000000005c35cf7f006e16d49719bc759cfd53c46fb7e796700308048712ed8fabc374d64ec39552c3ddcf62df7b0f6f5c81d84df278cb9498b0250e54fb1ebce5f3ecb281b95a87cee302884f7a6b397320439ea0ed3eaa2904dfbca41e2ae3ef855a227b764a8dedcbd08f1076c3cb82c1270629019c7fad17aeda3acdb6eb146b3e4c10b6bf3b95e1c2e479f4ff8a3f2a25a7f2bdfbc7b286dbcdf2b3b75965152e79ddf0762db03f47977eeaf1c80d992cd316bc04b6e56943abad4d649c725a5b715faea25f0000000000017dc2000000000001008900000000620174db0000049a")
Jan 09 10:40:00.056 DEBG Mining: Plugin 0 - Device 0 (Tesla V100-SXM2-16GB) Has ERRORED! Reason: Device 0 GPUassert: out of memory /home/travis/build/mimblewimble/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 429
Jan 09 10:40:00.056 INFO Mining: Cuck(at)oo at 0 gps (graphs per second)
Jan 09 10:40:03.059 DEBG Mining: Plugin 0 - Device 0 (Tesla V100-SXM2-16GB) Has ERRORED! Reason: Device 0 GPUassert: out of memory /home/travis/build/mimblewimble/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 429
Jan 09 10:40:03.059 INFO Mining: Cuck(at)oo at 0 gps (graphs per second)
Jan 09 10:40:06.063 DEBG Mining: Plugin 0 - Device 0 (Tesla V100-SXM2-16GB) Has ERRORED! Reason: Device 0 GPUassert: out of memory /home/travis/build/mimblewimble/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 429
Jan 09 10:40:06.063 INFO Mining: Cuck(at)oo at 0 gps (graphs per second)
Jan 09 10:40:09.067 DEBG Mining: Plugin 0 - Device 0 (Tesla V100-SXM2-16GB) Has ERRORED! Reason: Device 0 GPUassert: out of memory /home/travis/build/mimblewimble/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 429
Jan 09 10:40:09.067 INFO Mining: Cuck(at)oo at 0 gps (graphs per second)
Jan 09 10:40:12.070 DEBG Mining: Plugin 0 - Device 0 (Tesla V100-SXM2-16GB) Has ERRORED! Reason: Device 0 GPUassert: out of memory /home/travis/build/mimblewimble/grin-miner/cuckoo-miner/src/cuckoo_sys/plugins/cuckoo/src/cuckatoo/mean.cu 429
Jan 09 10:40:12.070 INFO Mining: Cuck(at)oo at 0 gps (graphs per second)
Jan 09 10:40:14.025 DEBG Client received message: Shutdown
Jan 09 10:40:14.025 DEBG Shutting down client controller
Jan 09 10:40:14.073 DEBG Miner received message: Shutdown
Jan 09 10:40:14.073 DEBG Stopping jobs and Shutting down mining controller
Jan 09 10:40:14.073 DEBG Stop message sent
Jan 09 10:40:14.189 DEBG Solver stopped: 0
Not sure why it runs out of memory. nvidia-smi shows 16 GB, none of which is in use after the miner is shut down.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48 Driver Version: 410.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:00:1E.0 Off | 0 |
| N/A 31C P0 36W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Please let me know if you need me to run anything else, and thanks again.
My repo has a little utility called cumal.cu in src/cuckatoo. Can you run make cumal and then run it on your device to see the maximum amount of memory you can allocate?
I'm on it.
Just to clarify (sorry, inexperienced):
git clone https://github.com/tromp/cuckoo.git
cd cuckoo/src/cuckatoo
make cumal
And then run ./cumal?
> And then run ./cumal?
That's what you normally do with executables:-)
Roger :) Is this right then? Thought it likely I'm screwing something up:
ubuntu@ip-172-31-4-191:~$ git clone https://github.com/tromp/cuckoo.git
Cloning into 'cuckoo'...
remote: Enumerating objects: 120, done.
remote: Counting objects: 100% (120/120), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 4083 (delta 86), reused 83 (delta 54), pack-reused 3963
Receiving objects: 100% (4083/4083), 12.79 MiB | 32.81 MiB/s, done.
Resolving deltas: 100% (2846/2846), done.
ubuntu@ip-172-31-4-191:~$ cd cuckoo/src/cuckatoo
ubuntu@ip-172-31-4-191:~/cuckoo/src/cuckatoo$ make cumal
nvcc -std=c++11 -o cumal cumal.cu
cumal.cu: In function ‘int main(int, char**)’:
cumal.cu:21:67: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘size_t {aka long unsigned int}’ [-Wformat=]
if (ret) printf("cudaMalloc(%d MB) returned %d\n", bufferMB, ret);
^
cumal.cu:24:52: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘size_t {aka long unsigned int}’ [-Wformat=]
printf("cudaMalloc(%d MB) succeeded %d\n", bufferMB);
^
cumal.cu:24:52: warning: format ‘%d’ expects a matching ‘int’ argument [-Wformat=]
ubuntu@ip-172-31-4-191:~/cuckoo/src/cuckatoo$ ./cumal
cumal: cumal.cu:11: int main(int, char**): Assertion `device < nDevices' failed.
Aborted (core dumped)
ubuntu@ip-172-31-4-191:~/cuckoo/src/cuckatoo$
Okay! The issue was fixed by using the cuckatoo_mean_cuda_gtx_31 plugin with expand = 2 uncommented. Thank you sincerely @tromp for spending the past few hours looking at this.
You can also edit the NEPS_A and NEPS_B values to 133 and 88 respectively in cuckoo-miner/src/cuckoo_sys/plugins/CMakeLists.txt. This gives a significant GPS increase by eliminating the slight loss in solutions that was needed to fit in 11 GB, as follows:
build_cuda_target("${AT_MEAN_CUDA_SRC}" cuckatoo_mean_cuda_gtx_31 "-DNEPS_A=133 -DNEPS_B=88 -DPART_BITS=1 -DEDGEBITS=31")
See here: https://github.com/mimblewimble/docs/wiki/GPU-Mining-Stats
Here are the logs we get when running the cuda31 utility program with the -E 2 -r 100 parameters:
Tesla P100: https://pastebin.com/M3Gnpi7t
Tesla V100: https://pastebin.com/HLUngKbQ