trexminer / T-Rex

T-Rex NVIDIA GPU miner with web control monitoring page

No disable fan check option on Tesla Nvidia card mining. #35

Closed (mechanator closed this issue 5 years ago)

mechanator commented 5 years ago

There are several Tesla cards, such as the M2050, M2070, K10, and K20, that are passively cooled in a rack or have their own fans attached. They don't respond to the fan speed query. When T-Rex is launched with CUDA 9 or 10 on Windows 10 x64, the driver returns nothing for the fan check, and mining stops because the miner can't read the fan RPM.

However, other miner applications have an option to ignore this, on the assumption that the user understands they have to monitor their own cards. Some of the older Fermi Tesla cards don't even report temperature. I checked whether --temperature-limit 0 works, but the miner never gets that far because it wants an RPM count from nvsmi first. It would be nice to be able to disable that check as well. Some of us already run these cards in a wind tunnel or an oil bath, with external hardware-based controls that are outside the reach of any software call. I also tried the --nowatchdog and --nonvml options; they didn't make a difference.

THX-Jedi commented 5 years ago

I hit the same problem yesterday.

trexminer commented 5 years ago

Could you please attach a screenshot / log file that shows the error?

THX-Jedi commented 5 years ago

t-rex-[1567838329].log

TRex Screen 1

mechanator commented 5 years ago

I tried with the watchdog both on and off, and with and without NVML. Devices 1 and 2 are a Tesla K10 card, which has 2 GPUs with 4 GB of RAM each.

trex_with_no_nvml

trexminer commented 5 years ago

Okay, looks like the problem lies elsewhere. T-Rex supports video cards with Compute Capability 5.0 and above, whereas the M2050, M2070, K10, and K20 are from older generations: https://developer.nvidia.com/cuda-gpus
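
For readers following along, the rule described above amounts to comparing each card's compute capability against a 5.0 minimum. The snippet below is a minimal sketch of that comparison only; it is not T-Rex's code. The names `MIN_COMPUTE_CAPABILITY`, `KNOWN_CARDS`, `is_supported`, and `unsupported_cards` are made up for illustration, and the capability values are the ones NVIDIA publishes on the CUDA GPUs page linked above.

```python
# Illustrative sketch only: this is NOT how T-Rex is implemented.
# All names below are hypothetical.

# Minimum compute capability stated by the maintainer in this thread.
MIN_COMPUTE_CAPABILITY = (5, 0)

# (major, minor) compute capabilities as published on NVIDIA's CUDA GPUs list.
KNOWN_CARDS = {
    "Tesla M2050": (2, 0),
    "Tesla M2070": (2, 0),
    "Tesla K10": (3, 0),
    "Tesla K20": (3, 5),
    "Tesla P100": (6, 0),
}


def is_supported(cc, minimum=MIN_COMPUTE_CAPABILITY):
    """Return True if a (major, minor) compute capability meets the minimum."""
    # Tuples compare element by element, so (3, 5) < (5, 0) < (6, 0).
    return tuple(cc) >= tuple(minimum)


def unsupported_cards(cards, minimum=MIN_COMPUTE_CAPABILITY):
    """Return the sorted names of cards that fall below the minimum."""
    return sorted(name for name, cc in cards.items() if not is_supported(cc, minimum))


if __name__ == "__main__":
    for name, cc in KNOWN_CARDS.items():
        status = "supported" if is_supported(cc) else "unsupported"
        print(f"{name}: CC {cc[0]}.{cc[1]} -> {status}")
```

Under this check, every Fermi and Kepler card named in the thread falls below the minimum, while the P100 passes.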

mechanator commented 5 years ago

OK, that is understandable. The K10 is compute 3.0 and the M2050/M2070 are compute 2.0 engines. However, Nvidia might release a new card that doesn't report fan speed, say a passively cooled RTX-series server card, and it would still fail a fan speed check and stop the miner application. So maybe this issue should be changed to a feature request for disabling the fan speed check, either per card or across the board.
I just checked the P100 series of cards: they are passively cooled and won't report a fan speed when queried. See page 5 here: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/tesla-product-literature/NV-tesla-p100-pcie-PB-08248-001-v01.pdf

THX-Jedi commented 5 years ago

But if newer cards don't report fan speeds, does T-Rex require a fan speed reading to initialize the cards?

If yes, we need an option to ignore the missing fan speeds.

On a side note, would it be too much to add support from (for example) compute 3.0 onwards, given that the original GTX Titan cards were released with compute 3.5?

trexminer commented 5 years ago

Fan speed reporting doesn't affect mining; it's the unsupported compute capability that causes the miner to shut down. Some of the newer RTX cards and the P100s, as you mentioned above, indeed don't report fan speeds, and the miner works fine with them.
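
To make the distinction concrete, here is a small sketch of a startup check in which a missing fan reading is only informational and the compute capability is the sole hard requirement. This is an assumption about how such logic could look, not T-Rex's actual implementation. `FanSpeedUnavailable`, `read_fan_speed`, and `startup_report` are hypothetical names, and the fan query is passed in as a plain callable so the example runs without a GPU or driver.

```python
# Illustrative sketch only: NOT T-Rex's implementation. All names are hypothetical.
from typing import Callable, Optional, Tuple


class FanSpeedUnavailable(Exception):
    """Hypothetical error raised when a GPU has no fan reading to report."""


def read_fan_speed(query: Callable[[], int]) -> Optional[int]:
    """Return the fan speed in percent, or None if the card does not report one."""
    try:
        return query()
    except FanSpeedUnavailable:
        # Passively cooled cards land here; treat it as "no data", not a failure.
        return None


def startup_report(name: str, cc: Tuple[int, int], fan_query: Callable[[], int],
                   minimum: Tuple[int, int] = (5, 0)) -> dict:
    """Summarise one GPU: the fan reading is optional, compute capability decides."""
    fan = read_fan_speed(fan_query)
    return {
        "gpu": name,
        "fan": "n/a" if fan is None else f"{fan}%",
        "can_mine": tuple(cc) >= tuple(minimum),
    }


def passive_card_query() -> int:
    """Stand-in for a driver query on a card without a fan."""
    raise FanSpeedUnavailable()


if __name__ == "__main__":
    print(startup_report("Tesla P100", (6, 0), passive_card_query))
    print(startup_report("Tesla K20", (3, 5), passive_card_query))
```

In this sketch the P100 is reported as able to mine despite having no fan reading, while the K20 is rejected purely on compute capability, which matches the behaviour described in the comment above.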

I'm afraid adding CC 3.5+ support would be a lot of effort, as most of the algorithms supported by T-Rex are implemented using newer instructions that are not available on older architectures.

mechanator commented 5 years ago

OK, close it and let the vintage tech get recycled eventually.