Multiple Jobs Stuck on Same Device

gordan-bobic commented 6 years ago

More than one job still sometimes gets stuck running on the same device. A hard "highlander" rule (there can be only one job per device) is required to prevent this.
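A minimal sketch of what such a one-job-per-device rule could look like. This is not minotaur's actual code: `DeviceJobRegistry`, `DeviceBusyError` and the method names are hypothetical, and the registry only tracks ownership in memory, it does not talk to any miner.

```python
import threading


class DeviceBusyError(Exception):
    """Raised when a device already has a job assigned."""


class DeviceJobRegistry:
    """Hypothetical helper: records at most one job per device id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}  # device_id -> name of the job that owns the device

    def claim(self, device_id, job):
        # Refuse a second job until the current owner has released the device.
        with self._lock:
            if device_id in self._jobs:
                raise DeviceBusyError(
                    "device %s already runs %s" % (device_id, self._jobs[device_id])
                )
            self._jobs[device_id] = job

    def release(self, device_id, job):
        # Only the current owner can release; returns True if it did.
        with self._lock:
            if self._jobs.get(device_id) == job:
                del self._jobs[device_id]
                return True
            return False

    def owner(self, device_id):
        # Name of the job currently holding the device, or None.
        with self._lock:
            return self._jobs.get(device_id)
```

With something like this in place, a new miner would only be started after `claim()` succeeds, and `release()` would only be called once the old miner is confirmed stopped.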

gordan-bobic commented 6 years ago

2018-03-25 17:58:50: [info] device 8 [1080tiasus]: changing algorithm to x11gost with ccminer2 [pool=nicehash] [profile=1080tiasus] [region=eu]
2018-03-25 17:58:54: [error] ethminer: failed to execute method {'params': ['14'], 'id': 1, 'method': 'algorithm.remove'}: algorithm not found
2018-03-25 17:58:54: [error] ethminer: failed to stop on device 8
2018-03-25 17:59:36: [info] device 8 [1080tiasus]: most profitable is now: ethermine/daggerhashimoto in region: None using ethminer
2018-03-25 18:00:20: [info] device 8 [1080tiasus]: ethermine/daggerhashimoto[ethminer]: 13.96 MH/s
2018-03-25 18:00:20: [warning] device 8 [1080tiasus]: hashrate 1% below calibrated rate [13.96 MH/s < 34.08 MH/s]

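So the problem is that the call to stop the running miner fails (here, ethminer on device 8), so the old job apparently keeps running when the next one is started. Possibly an excavataur bug.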

gordan-bobic commented 6 years ago

Doesn't look like an excavataur bug. It looks like excavataur never received/acknowledged the command to tear down the worker. So this seems like a key bit from the log:

2018-03-25 17:58:54: [error] ethminer: failed to execute method {'params': ['14'], 'id': 1, 'method': 'algorithm.remove'}: algorithm not found

gordan-bobic commented 6 years ago

This issue seems to be particularly evident with excavator, much more so than when using excavataur's external miners.

There really needs to be much more thorough policing, serialization and retries when tearing down and creating excavator jobs.
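As a rough illustration of the serialization and retries meant here, the sketch below takes a per-device lock, retries the stop, and refuses to start the new job until the old one is confirmed gone. Everything in it is an assumption for illustration: `switch_job`, `SwitchFailed` and the three callables (`stop_old`, `is_stopped`, `start_new`) are hypothetical and do not correspond to existing minotaur or excavator functions.

```python
import threading
import time


class SwitchFailed(Exception):
    """Raised when the old job could not be confirmed stopped."""


_device_locks = {}
_locks_guard = threading.Lock()


def _lock_for(device_id):
    # One lock per device so that switches on the same device are serialized.
    with _locks_guard:
        return _device_locks.setdefault(device_id, threading.Lock())


def switch_job(device_id, stop_old, is_stopped, start_new, retries=3, delay=1.0):
    """Stop the old job, verify it is gone, and only then start the new one.

    stop_old, is_stopped and start_new are caller-supplied callables
    (hypothetical). Returns the number of attempts that were needed.
    """
    with _lock_for(device_id):
        for attempt in range(retries):
            try:
                stop_old()
            except Exception:
                # A failed stop call is not fatal by itself;
                # is_stopped() decides whether the device is free.
                pass
            if is_stopped():
                start_new()
                return attempt + 1
            time.sleep(delay)
        # Never start a second job on a device that is still busy.
        raise SwitchFailed("device %s: old job still running" % device_id)
```

The point of the design is that an error from the stop call (such as the "algorithm not found" above) neither starts a second job blindly nor blocks the switch forever: the independent `is_stopped` check decides.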

Excavator is also much more of an IRQ hog that interferes with nvidia device probing.

m4rkw / minotaur
