nicehash / excavator

NiceHash's proprietary low-level CUDA miner
https://www.nicehash.com

on_quit does not work when working with NHM #312

Closed EvgeniyKorepov closed 3 years ago

EvgeniyKorepov commented 3 years ago

It looks like on_quit does not work when Excavator is run under NHM.

I set up my 3070 cards in NHM with a custom config (my own overclocking). On exit, Excavator is supposed to restore an overclock suitable for other algorithms. But when NHM switches algorithms, for example to Octopus, the overclock from on_quit is not applied.

Is it possible that NHM terminates Excavator incorrectly, i.e. just kills the process?

Here is my config:

[
    {"time":0,"commands":[
        {"id":1,"method":"subscribe","params":["nhmp-ssl.eu-west.nicehash.com:443","382bkNMNYWRvWKMQ4K25XMraJcybRf8qsv$0-Rzuh7s5UO161g0nP93ic0g"]}
    ]},
    {"time":1,"commands":[
        {"id":1,"method":"algorithm.add","params":["daggerhashimoto"]}
    ]},
    {"time":2,"commands":[
        {"id":3,"method":"worker.add","params":["daggerhashimoto","GPU-f8467b26-8af1-714d-b4e6-c149281c1795"]},
        {"id":3,"method":"worker.add","params":["daggerhashimoto","GPU-c8c1d3d3-8fda-935d-c409-d863f56ffc71"]}
    ]},
    {"time":1,"commands":[
        {"id":1,"method":"device.set.oc_profile","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","-500","1400","125"]},
        {"id":1,"method":"device.set.oc_profile","params":["GPU-c8c1d3d3-8fda-935d-c409-d863f56ffc71","-500","1300","125"]},
        {"id":1,"method":"workers.reset","params":["0","1"]}
    ]},
    {"time":15,"loop":10,"commands":[
        {"id":1,"method":"worker.print.efficiencies","params":[]},
        {"id":1,"method":"algorithm.print.speeds","params":[]},
        {"id":1,"method":"worker.reset","params":["0"]},
        {"id":1,"method":"worker.reset","params":["1"]}
    ]},
    {"event":"on_quit","commands":[
        {"id":1,"method":"device.set.oc_profile","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","200","1200","190"]},
        {"id":1,"method":"device.set.oc_profile","params":["GPU-c8c1d3d3-8fda-935d-c409-d863f56ffc71","200","1200","190"]}
    ]}
]

Of course, the right thing would be to turn on Excavator logging and see what is happening, but I cannot find a suitable debug level: Excavator either writes almost nothing, or writes a huge volume of logs that is difficult to analyze.

Or is it better to address this question to NHM developers?

nicehashdev commented 3 years ago

Possibly. I would suggest you open this issue in NiceHash Miner repository.

EvgeniyKorepov commented 3 years ago

> Possibly. I would suggest you open this issue in NiceHash Miner repository.

https://github.com/nicehash/NiceHashMiner/issues/2380

EvgeniyKorepov commented 3 years ago

The NHM developers have done an excellent job. Unfortunately, the problem turned out to be in Excavator after all.

config cmd_3.json:

[
  {
    "time": 0,
    "commands": [
      {
        "id": 1,
        "method": "subscribe",
        "params": [
          "nhmp-ssl.eu-west.nicehash.com:443",
          "382bkNMNYWRvWKMQ4K25XMraJcybRf8qsv$0-Rzuh7s5UO161g0nP93ic0g"
        ]
      },
      {
        "id": 2,
        "method": "algorithm.add",
        "params": [
          "daggerhashimoto"
        ]
      },
      {
        "id": 3,
        "method": "worker.add",
        "params": [
          "daggerhashimoto",
          "GPU-e9846d6a-d476-d4ea-cc55-4e34a887970c"
        ]
      }
    ]
  },
  {
    "time": 1,
    "commands": [
      {
        "id": 3455632,
        "method": "device.set.oc_profile",
        "params": [
          "GPU-e9846d6a-d476-d4ea-cc55-4e34a887970c",
          "12",
          "900",
          "165"
        ]
      },
      {
        "id": 1,
        "method": "worker.reset.device",
        "params": [
          "GPU-e9846d6a-d476-d4ea-cc55-4e34a887970c"
        ]
      }
    ]
  },
  {
    "time": 15,
    "loop": 10,
    "commands": [
      {
        "id": 1,
        "method": "worker.print.efficiencies",
        "params": []
      },
      {
        "id": 1,
        "method": "algorithm.print.speeds",
        "params": []
      },
      {
        "id": 1,
        "method": "worker.reset.device",
        "params": [
          "GPU-e9846d6a-d476-d4ea-cc55-4e34a887970c"
        ]
      }
    ]
  },
  {
    "event": "on_quit",
    "commands": [
      {
        "id": 3455632,
        "method": "device.set.oc_profile",
        "params": [
          "GPU-e9846d6a-d476-d4ea-cc55-4e34a887970c",
          "12",
          "910",
          "170"
        ]
      }
    ]
  }
]

command: excavator.exe -fn C:\Mining\NHM3\logs\excavator.log -f 2 -c cmd_3.json

When Excavator exits, the log shows:

[[2021-03-28 13:11:25.543765] [thread=0x00002788] [info]]
    core | Quit requested (app close)
[[2021-03-28 13:11:25.544779] [thread=0x00001f44] [info]]
    Shutting down
[[2021-03-28 13:11:25.548845] [thread=0x00001f44] [info]]
    device #1 | TDP set to 67%
[[2021-03-28 13:11:25.741724] [thread=0x00001f44] [info]]
    device #1 | Memory clock delta set to 910
[[2021-03-28 13:11:25.804233] [thread=0x00001f44] [info]]
    http | Closing
[[2021-03-28 13:11:25.996805] [thread=0x00001d10] [info]]
    device #1 | TDP set to 65%
[[2021-03-28 13:11:26.151502] [thread=0x00001d10] [info]]
    device #1 | Memory clock delta set to 0
[[2021-03-28 13:11:26.377329] [thread=0x00001d10] [info]]
    device #1 | Core clock delta set to 0

The last values applied are:

    device #1 | Memory clock delta set to 0
    device #1 | Core clock delta set to 0

instead of the values from on_quit:

    device #1 | Memory clock delta set to 910
    device #1 | Core clock delta set to 12

I have tried several recent Excavator builds, including the timing build; the behavior on exit is the same in all of them.

EvgeniyKorepov commented 3 years ago

It seems that at the very end some kind of debug stub is triggered that resets the clock deltas to zero:

[[2021-03-28 13:11:26.151502] [thread=0x00001d10] [info]]
    device #1 | Memory clock delta set to 0
[[2021-03-28 13:11:26.377329] [thread=0x00001d10] [info]]
    device #1 | Core clock delta set to 0

nicehashdev commented 3 years ago

If you set clocks with the profile method, they are reset back to their "pre-profile" values. Set OC without using profiles if you wish to keep the OC values after exiting Excavator.

EvgeniyKorepov commented 3 years ago

So I should apply the overclock with device.set.tdp, device.set.core_delta and device.set.memory_delta instead?

But then won't I lose the behavior where "memory is reverted back to 0 delta memory clock for the time of DAG generation"?

pre-profile: image

Do the parameters return to 0, or to the overclock that was in effect before Excavator started?

EvgeniyKorepov commented 3 years ago

I tried the following (with this method there really is no unwanted reset of the overclock to zero):

{"id":1,"method":"device.set.core_delta","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","-500"]},
{"id":1,"method":"device.set.memory_delta","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","1400"]},
{"id":1,"method":"device.set.power_limit","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","125"]},

instead of

{"id":1,"method":"device.set.oc_profile","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","-500","1400","125"]},

but unfortunately I get an error: cuda-daggerhashimoto Device #1 | Invalid DAG generated. Is your memory OC too high?

And what about the undocumented method device.set.oc_profile2?

EvgeniyKorepov commented 3 years ago

The release https://github.com/nicehash/excavator/releases/tag/v1.6.11a seems to fix the problem with "Invalid DAG generated. Is your memory OC too high?"

nicehashdev commented 3 years ago

Considering you are using a 2070 SUPER, I would suggest device.set.oc_profile2, which takes one parameter fewer (the first is the GPU ID, the second is the max core clock and the third is the absolute memory clock). Check how OCTune uses it; the code is in JavaScript and easy to understand. I haven't had enough time to write docs for everything yet.
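
For illustration only, a call in the same command-file format as the configs above might look like the sketch below. The parameter order follows the description in this comment; the two clock entries are placeholders rather than real values, since the thread states neither units nor valid ranges, and the GPU ID is simply reused from the first config. Check OCTune before relying on it.

```json
[
    {"time":1,"commands":[
        {"id":1,"method":"device.set.oc_profile2","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","<max core clock>","<absolute memory clock>"]}
    ]}
]
```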

But if you wish to preserve OC when Excavator exits, you should not use a profile (or your OC should be set BEFORE applying the profile, so that it remembers the previous OC state); use the single methods instead, as you already figured out.
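
As a rough sketch of the single-method approach, the OC-related entries of the first config in this thread could be rewritten as below. It uses only the method names and values that already appear in this thread (device.set.core_delta, device.set.memory_delta, device.set.power_limit) and shows one GPU. The subscribe, algorithm.add and worker.add entries are assumed to stay as they were, and the distinct id values are an arbitrary choice.

```json
[
    {"time":1,"commands":[
        {"id":1,"method":"device.set.core_delta","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","-500"]},
        {"id":2,"method":"device.set.memory_delta","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","1400"]},
        {"id":3,"method":"device.set.power_limit","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","125"]}
    ]},
    {"event":"on_quit","commands":[
        {"id":4,"method":"device.set.core_delta","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","200"]},
        {"id":5,"method":"device.set.memory_delta","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","1200"]},
        {"id":6,"method":"device.set.power_limit","params":["GPU-f8467b26-8af1-714d-b4e6-c149281c1795","190"]}
    ]}
]
```

Because no profile is involved, there should be no "pre-profile" state to revert to on exit, so the on_quit values would be expected to remain in effect.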

Regarding the invalid DAG: that is exactly the problem. Memory OC should not be applied while the DAG is being generated, because there is a high chance of data corruption if the memory OC is not fully stable. If Excavator is run as Administrator (which it should be, if you want to apply clocks), it automatically reduces OC to 0 during the DAG generation phase and then restores it afterwards. You can disable this feature with the -g command line switch.