m4rkw / minotaur

MIT License

Multiple Jobs Stuck on Same Device #20

Open gordan-bobic opened 6 years ago

gordan-bobic commented 6 years ago

It still happens that more than one job gets stuck running on the same device. A hard highlander solution (there can be only one) is required to prevent this.
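
One way to read the "there can be only one" requirement is a per-device ownership table: a new miner can only claim a device once the previous one has been confirmed gone. The following is a minimal sketch under that assumption. The names `DeviceRegistry`, `DeviceBusyError`, `claim`, `release` and `owner` are hypothetical and are not taken from minotaur's code.

```python
import threading


class DeviceBusyError(Exception):
    """Raised when a device already has a job assigned."""


class DeviceRegistry:
    """Hypothetical per-device ownership table (not minotaur's actual code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}  # device id -> name of the job that owns it

    def claim(self, device_id, job):
        # Refuse to register a second job on a device that is still owned.
        with self._lock:
            current = self._owners.get(device_id)
            if current is not None:
                raise DeviceBusyError(
                    "device %s still runs %s" % (device_id, current))
            self._owners[device_id] = job

    def release(self, device_id, job):
        # Intended to be called only once the miner is confirmed stopped.
        # A release from a job that does not own the device is ignored.
        with self._lock:
            if self._owners.get(device_id) == job:
                del self._owners[device_id]

    def owner(self, device_id):
        with self._lock:
            return self._owners.get(device_id)
```

With something like this in place, a switch on device 8 would fail loudly at `claim` while the old miner is still registered, rather than silently starting a second job on the same device.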

gordan-bobic commented 6 years ago

2018-03-25 17:58:50: [info] device 8 [1080tiasus]: changing algorithm to x11gost with ccminer2 [pool=nicehash] [profile=1080tiasus] [region=eu]
2018-03-25 17:58:54: [error] ethminer: failed to execute method {'params': ['14'], 'id': 1, 'method': 'algorithm.remove'}: algorithm not found
2018-03-25 17:58:54: [error] ethminer: failed to stop on device 8
2018-03-25 17:59:36: [info] device 8 [1080tiasus]: most profitable is now: ethermine/daggerhashimoto in region: None using ethminer
2018-03-25 18:00:20: [info] device 8 [1080tiasus]: ethermine/daggerhashimoto[ethminer]: 13.96 MH/s
2018-03-25 18:00:20: [warning] device 8 [1080tiasus]: hashrate 1% below calibrated rate [13.96 MH/s < 34.08 MH/s]

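So the problem is that the call to terminate a miner fails: algorithm.remove returns "algorithm not found" and ethminer is then reported as failing to stop on device 8. Possibly an excavataur bug.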

gordan-bobic commented 6 years ago

Doesn't look like an excavataur bug. It looks like excavataur never received or acknowledged the command to tear down the worker. This seems to be the key line from the log:

2018-03-25 17:58:54: [error] ethminer: failed to execute method {'params': ['14'], 'id': 1, 'method': 'algorithm.remove'}: algorithm not found

gordan-bobic commented 6 years ago

This issue seems to be particularly evident with excavator, much more so than when using excavataur's external miners.

There really needs to be much more thorough policing, serialization and retries when tearing down and creating excavator jobs.
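
As a rough illustration of what serialization plus retries could look like, here is a sketch that holds a per-device lock for the whole switch, retries the stop, and only starts the new job once the old one is verified gone. `switch_miner` and the `stop`, `is_running` and `start` callables are hypothetical placeholders for whatever calls the real code makes. Nothing here is taken from minotaur or from the excavator API.

```python
import threading
import time
from collections import defaultdict

# One lock per device so stop/start sequences on a device never interleave.
_device_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def _lock_for(device_id):
    # Guard the dict itself so two threads cannot create two locks at once.
    with _locks_guard:
        return _device_locks[device_id]


def switch_miner(device_id, stop, is_running, start,
                 retries=3, delay=1.0, sleep=time.sleep):
    """Stop the old job, verify it is gone, then start the new one.

    stop, is_running and start are hypothetical callables supplied by the
    caller. Returns True if the new job was started, False if the old job
    was still running after all retries.
    """
    with _lock_for(device_id):
        for attempt in range(retries):
            try:
                stop(device_id)
            except Exception:
                # Do not trust the error; the state check below decides.
                pass
            if not is_running(device_id):
                start(device_id)
                return True
            sleep(delay * (attempt + 1))  # simple linear backoff
        return False
```

The design choice is to trust the observed state rather than the return value of the stop call: an error such as "algorithm not found" neither aborts the switch nor counts as success. If the teardown never succeeds, the function returns `False` and no second job is started on the device.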

Excavator is also much more of an IRQ hog, which interferes with NVIDIA device probing.