Open StefanOberhumer opened 6 years ago
Could you send me a trace? Run ./grm -t -m -1 -p -1 -r 1 and just let it run for 1 min.
Here it is:
./grm -t -m -1 -p -1 -r 1
INFO: 2018/07/02 19:26:59 main.go:47: Starting rig-monitor version 3.0.d.1 ...
INFO: 2018/07/02 19:26:59 main.go:59: Tracing is enabled!
INFO: 2018/07/02 19:26:59 main.go:63: Commmand line arguments: -config
INFO: 2018/07/02 19:26:59 main.go:67: No config file specified. Using default config.toml file: config.toml
INFO: 2018/07/02 19:26:59 config.go:61: Reading configuration file...
TRACE: 2018/07/02 19:26:59 config.go:169: Config file: &config.Config{Grafana:config.GrafanaStruct{Username:"grafana", Password:"grafana"}, Influxdb:storage.InfluxDbStruct{Hostname:"http://localhost:8086", Database:"rigdata", Username:"grafana", Password:"grafana", WriteInterval:0x1e, InfluxWriteBufferSize:0x3e8}, Dynu:config.DynuStruct{Enabled:false, Username:"dynu", Password:"dynu", Hostname:"host.dynu.com", UpdateInterval:1800}, Main:config.Mainstruct{PoolWorkers:3, RigWorkers:5, PoolPollingInterval:300, RigPollingInterval:60, Rig:[]string{"rig01,ethminer,label_ethermine,,//192.168.XXX.YYY:3333,8,183,0,noplug,,640,70"}, Pool:[]string{"label_ethermine,ethermine,eth,https://api.ethermine.org,,MYWALLET,1"}, PowerRules:[]string{}}, Profitability:config.Profitabilitystruct{SmartPlugPollingInterval:60, MarketPollingInterval:300, QuoteCurrency:"EUR", PowerCostKwh:0.17, PowerRatioDualMining:0.3}, PoolList:[]pool.PoolConfig{pool.PoolConfig{Label:"label_ethermine_company", Type:"ETHERMINE", Crypto:"eth", URL:"https://api.ethermine.org", Token:"", Wallet:"MYWALLET", CorrectionFactor:1}}, RigList:[]miner.RigConfig{miner.RigConfig{RigName:"rig01", Miner:"ETHMINER", PoolLabel:"label_ethermine_company", PoolLabel2:"", URL:"//192.168.XXX.YYY:3333", InstalledGpus:8, TargetHashRate:183, TargetHashRate2:0, SmartPlugType:"", SmartPlugIP:"", PSUMaxPower:0, TargetTemperature:70}}, SmartPlugList:[]power.SmartPlugConfig{power.SmartPlugConfig{RigName:"rig01", Miner:"ETHMINER", PoolLabel:"label_ethermine_company", PoolLabel2:"", SmartPlugType:"NOPLUG", SmartPlugIP:"", PSUMaxPower:640, RuleList:[]power.PowerMgmtRuleStruct(nil), LastReset:0}}}
INFO: 2018/07/02 19:26:59 main.go:87: Commmand line arguments: -m -1
INFO: 2018/07/02 19:26:59 main.go:101: Commmand line arguments: -p -1
INFO: 2018/07/02 19:26:59 main.go:112: Commmand line arguments: -r 1
TRACE: 2018/07/02 19:26:59 main.go:125: List of rigs: []miner.RigConfig{miner.RigConfig{RigName:"rig01", Miner:"ETHMINER", PoolLabel:"label_ethermine_company", PoolLabel2:"", URL:"//192.168.XXX.YYY:3333", InstalledGpus:8, TargetHashRate:183, TargetHashRate2:0, SmartPlugType:"", SmartPlugIP:"", PSUMaxPower:0, TargetTemperature:70}}
TRACE: 2018/07/02 19:26:59 main.go:126: List of pools: []pool.PoolConfig{}
INFO: 2018/07/02 19:26:59 influxdb.go:26: Starting DBDaemon routine...
INFO: 2018/07/02 19:26:59 influxdb.go:94: Creating influxDB connection to http://localhost:8086 ...
INFO: 2018/07/02 19:27:00 main.go:151: Launching rig monitor worker: 0
INFO: 2018/07/02 19:27:00 main.go:151: Launching rig monitor worker: 1
INFO: 2018/07/02 19:27:00 main.go:151: Launching rig monitor worker: 2
INFO: 2018/07/02 19:27:00 main.go:151: Launching rig monitor worker: 3
INFO: 2018/07/02 19:27:00 main.go:151: Launching rig monitor worker: 4
INFO: 2018/07/02 19:27:00 main.go:160: Launching power monitor worker: 0
INFO: 2018/07/02 19:27:00 main.go:162: Launching power rules manager worker: 0
INFO: 2018/07/02 19:27:00 main.go:160: Launching power monitor worker: 1
INFO: 2018/07/02 19:27:00 main.go:162: Launching power rules manager worker: 1
INFO: 2018/07/02 19:27:00 main.go:160: Launching power monitor worker: 2
INFO: 2018/07/02 19:27:00 main.go:162: Launching power rules manager worker: 2
INFO: 2018/07/02 19:27:00 main.go:160: Launching power monitor worker: 3
INFO: 2018/07/02 19:27:00 main.go:162: Launching power rules manager worker: 3
INFO: 2018/07/02 19:27:00 main.go:160: Launching power monitor worker: 4
INFO: 2018/07/02 19:27:00 main.go:162: Launching power rules manager worker: 4
INFO: 2018/07/02 19:27:00 influxdb.go:94: Creating influxDB connection to http://localhost:8086 ...
INFO: 2018/07/02 19:27:00 power-monitor.go:25: New power monitor job received: rig01
INFO: 2018/07/02 19:27:00 influxdb.go:94: Creating influxDB connection to http://localhost:8086 ...
INFO: 2018/07/02 19:27:00 influxdb.go:94: Creating influxDB connection to http://localhost:8086 ...
INFO: 2018/07/02 19:27:00 influxdb.go:94: Creating influxDB connection to http://localhost:8086 ...
INFO: 2018/07/02 19:27:00 rig-monitor.go:21: New rig monitor job received: rig01
TRACE: 2018/07/02 19:27:00 rig-monitor.go:23: New rig monitor job received: &miner.RigConfig{RigName:"rig01", Miner:"ETHMINER", PoolLabel:"label_ethermine_company", PoolLabel2:"", URL:"//192.168.XXX.YYY:3333", InstalledGpus:8, TargetHashRate:183, TargetHashRate2:0, SmartPlugType:"", SmartPlugIP:"", PSUMaxPower:0, TargetTemperature:70}
INFO: 2018/07/02 19:27:00 influxdb.go:94: Creating influxDB connection to http://localhost:8086 ...
TRACE: 2018/07/02 19:27:00 power-monitor.go:27: New power monitor job received: &power.SmartPlugConfig{RigName:"rig01", Miner:"ETHMINER", PoolLabel:"label_ethermine_company", PoolLabel2:"", SmartPlugType:"NOPLUG", SmartPlugIP:"", PSUMaxPower:640, RuleList:[]power.PowerMgmtRuleStruct(nil), LastReset:0}
INFO: 2018/07/02 19:27:00 rig-monitor.go:82: Connection to rig rig01 OK.
TRACE: 2018/07/02 19:27:00 influxdb.go:54: New record received by DBDaemon: env_data map[label:label_ethermine_company plug_type:NOPLUG rig_id:rig01] map[max_power:640 power_usage:640] <nil> 2018-07-02 19:27:00.480742548 +0200 CEST m=+1.028741029
TRACE: 2018/07/02 19:27:20 functions.go:139: Alloc = 1 MiB TotalAlloc = 1 MiB Sys = 5 MiB NumGC = 0
TRACE: 2018/07/02 19:27:20 functions.go:147: Number of Go routines running: 20
TRACE: 2018/07/02 19:27:20 main.go:230: Number of items in rigJobQueue: 0
TRACE: 2018/07/02 19:27:20 main.go:231: Number of items in powerJobQueue: 0
TRACE: 2018/07/02 19:27:20 main.go:232: Number of items in poolJobQueue: 0
TRACE: 2018/07/02 19:27:20 main.go:233: Number of items in marketJobQueue: 1
TRACE: 2018/07/02 19:27:20 main.go:234: Number of items in rulesManagerJobQueue: 0
TRACE: 2018/07/02 19:27:20 main.go:235: Number of items in recordQueue(DB): 0
INFO: 2018/07/02 19:27:29 influxdb.go:46: DBDaemon ticker expired (30s). Data points (1) saved to influxDB
TRACE: 2018/07/02 19:27:40 functions.go:139: Alloc = 1 MiB TotalAlloc = 1 MiB Sys = 5 MiB NumGC = 0
TRACE: 2018/07/02 19:27:40 functions.go:147: Number of Go routines running: 22
TRACE: 2018/07/02 19:27:40 main.go:230: Number of items in rigJobQueue: 0
TRACE: 2018/07/02 19:27:40 main.go:231: Number of items in powerJobQueue: 0
TRACE: 2018/07/02 19:27:40 main.go:232: Number of items in poolJobQueue: 0
TRACE: 2018/07/02 19:27:40 main.go:233: Number of items in marketJobQueue: 1
TRACE: 2018/07/02 19:27:40 main.go:234: Number of items in rulesManagerJobQueue: 0
TRACE: 2018/07/02 19:27:40 main.go:235: Number of items in recordQueue(DB): 0
INFO: 2018/07/02 19:27:59 influxdb.go:46: DBDaemon ticker expired (30s). Data points (0) saved to influxDB
TRACE: 2018/07/02 19:28:00 functions.go:139: Alloc = 1 MiB TotalAlloc = 1 MiB Sys = 5 MiB NumGC = 0
TRACE: 2018/07/02 19:28:00 functions.go:147: Number of Go routines running: 22
TRACE: 2018/07/02 19:28:00 main.go:230: Number of items in rigJobQueue: 0
TRACE: 2018/07/02 19:28:00 main.go:231: Number of items in powerJobQueue: 0
TRACE: 2018/07/02 19:28:00 main.go:232: Number of items in poolJobQueue: 0
TRACE: 2018/07/02 19:28:00 main.go:233: Number of items in marketJobQueue: 1
TRACE: 2018/07/02 19:28:00 main.go:234: Number of items in rulesManagerJobQueue: 0
TRACE: 2018/07/02 19:28:00 main.go:235: Number of items in recordQueue(DB): 0
INFO: 2018/07/02 19:28:00 power-monitor.go:25: New power monitor job received: rig01
INFO: 2018/07/02 19:28:00 rig-monitor.go:21: New rig monitor job received: rig01
TRACE: 2018/07/02 19:28:00 rig-monitor.go:23: New rig monitor job received: &miner.RigConfig{RigName:"rig01", Miner:"ETHMINER", PoolLabel:"label_ethermine", PoolLabel2:"", URL:"//192.168.XXX.YYY:3333", InstalledGpus:8, TargetHashRate:183, TargetHashRate2:0, SmartPlugType:"", SmartPlugIP:"", PSUMaxPower:0, TargetTemperature:70}
TRACE: 2018/07/02 19:28:00 power-monitor.go:27: New power monitor job received: &power.SmartPlugConfig{RigName:"rig01", Miner:"ETHMINER", PoolLabel:"label_ethermine", PoolLabel2:"", SmartPlugType:"NOPLUG", SmartPlugIP:"", PSUMaxPower:640, RuleList:[]power.PowerMgmtRuleStruct(nil), LastReset:0}
TRACE: 2018/07/02 19:28:00 influxdb.go:54: New record received by DBDaemon: env_data map[label:label_ethermine plug_type:NOPLUG rig_id:rig01] map[max_power:640 power_usage:640] <nil> 2018-07-02 19:28:00.483485491 +0200 CEST m=+61.031484531
INFO: 2018/07/02 19:28:00 rig-monitor.go:82: Connection to rig rig01 OK.
Do you get an ERROR message after the "INFO: 2018/07/02 19:28:00 rig-monitor.go:82: Connection to rig rig01 OK." ?
No, I'm not getting any error.
See the trace at 19:27:
INFO: 2018/07/02 19:27:00 rig-monitor.go:82: Connection to rig rig01 OK.
Ok, so I assume there's no timeout. Right now I have coded the ethminer parser to wait for a null (0x00) character. Do you know which termination character ethminer's response includes? If you don't know, I'll set up my test rig tomorrow and debug it (currently I am using a simulator that serves the output from a text file).
I also ran tcpdump -n dst host 192.168.XXX.YYY (where 192.168.XXX.YYY is my rig address) and nothing was logged, so it seems nothing is being sent to the rig.
Also, there is no ".... Api New api session from 192.168.XXX.AAA:...." line in the ethminer log, so it seems you don't connect to the rig.
OK. I will test it tomorrow and get back to you asap.
Ok, so I assume there's no timeout. Right now I have coded the ethminer parser to wait for a null (0x00) character. Do you know which termination character ethminer's response includes?
I think the JSON request must be terminated by a "\n". I tested this using the following script and found that the response is also terminated by a "\n" (0x0a):
#! /usr/bin/env python
import os, sys
import socket

HOST = sys.argv[1]
PORT = int(sys.argv[2])  # The same port as used by the server

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(10.0)
try:
    s.connect((HOST, PORT))
    # JSON-RPC request; the trailing "\n" terminates it
    a = b"{\"id\":0,\"jsonrpc\":\"2.0\",\"method\":\"miner_getstat1\"}\n"
    s.sendall(a)
    data = s.recv(2048)
    s.close()
    # dump the raw response so the terminating byte can be inspected
    open("resp.bin", "wb").write(data)
except:
    pass
sys.exit(0)
That was it. I modified the TCP call to accept a miner-specific terminator. I'll include the fix in the 3.0 stable release, which I'll publish within the hour.
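For illustration, here is a minimal Python 3 sketch of reading a reply up to a miner-specific terminator. It is not grm's actual code (grm is written in Go). `read_until` and `TERMINATORS` are hypothetical names, and the only terminators taken from this thread are "\n" for ethminer and the null byte the parser was waiting for before.

```python
import socket

# Hypothetical lookup table. Only the ethminer entry comes from this thread;
# the default is the null byte the parser originally waited for.
TERMINATORS = {"ETHMINER": b"\n"}
DEFAULT_TERMINATOR = b"\x00"


def read_until(sock, terminator, max_bytes=65536):
    """Read from sock until terminator is seen or the peer closes.

    Returns the bytes received before the terminator (or everything
    received if the peer closed without sending one).
    """
    buf = bytearray()
    while terminator not in buf:
        if len(buf) >= max_bytes:
            raise ValueError("no terminator within max_bytes")
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection
            break
        buf.extend(chunk)
    end = buf.find(terminator)
    return bytes(buf if end == -1 else buf[:end])


if __name__ == "__main__":
    # Local demo with a connected socket pair instead of a real rig.
    rig_side, monitor_side = socket.socketpair()
    rig_side.sendall(b'{"id":0,"jsonrpc":"2.0","result":[]}\n')
    terminator = TERMINATORS.get("ETHMINER", DEFAULT_TERMINATOR)
    print(read_until(monitor_side, terminator))
    rig_side.close()
    monitor_side.close()
```

Note that this sketch still blocks forever if the peer keeps the connection open and never sends the terminator, so it needs a read timeout on top.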
Please close it once you have successfully tested it.
Thank you very much, it works. But: when grm can connect to the rig but does not get any response (the connection stays open), all the stats go wrong, because (I think) one rig data collector task stays blocked waiting for the response.
So maybe you can add a timeout when waiting for responses.
Not sure whether this only affects ethminer?
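To make the suggestion concrete, a minimal Python 3 sketch of a request with a read timeout follows. It is only an illustration of the idea, not grm's implementation; `query_rig` and its parameters are hypothetical names. The timeout applies to each individual socket operation, so it is not a hard deadline for the whole exchange.

```python
import socket


def query_rig(host, port, request, terminator=b"\n", timeout=10.0):
    """Send request and return the raw reply (terminator included).

    Returns None if the rig cannot be reached or stays silent for
    longer than `timeout` seconds, so the caller is never blocked
    indefinitely by a frozen miner.
    """
    try:
        # The timeout set here also applies to later sendall/recv calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request)
            buf = bytearray()
            while terminator not in buf:
                chunk = sock.recv(4096)  # raises socket.timeout when silent
                if not chunk:  # peer closed the connection
                    break
                buf.extend(chunk)
            return bytes(buf)
    except OSError:  # socket.timeout is a subclass of OSError
        return None
```

A caller could then treat a `None` result as "rig not responding" and move on to the next rig instead of waiting.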
Damn. I only let it run for 1 round. Let me test it properly for a few minutes.
To test: write a small TCP/IP listener/acceptor that stands in for one of your rigs, does not respond to the request, and keeps the connection alive.
Thank you very much, it works. But: when grm can connect to the rig but does not get any response (the connection stays open), all the stats go wrong, because (I think) one rig data collector task stays blocked waiting for the response.
@StefanOberhumer, are you getting this error with all your miners running ethminer? I just tested it on my rig and I cannot replicate this problem. Is this an issue that occurs because the miner is "freezing"? Sorry, but it's not clear to me when and how this issue occurs.
I'm trying to reproduce....
Run the following script (it simulates an 8-card rig that freezes) and add it to the grm config:
#! /usr/bin/env python
## vim:set ts=4 sw=4 et: -*- coding: utf-8 -*-

# simulate API response of a 8 card rig

import os, sys, time
import threading
import socket, json

class globs:
    HOST = ''
    PORT = 3333
    error_after_x_requests = 2
    threads = []

class RigApiThread(threading.Thread):
    def __init__(self, ip, port, socket):
        threading.Thread.__init__(self)
        self.ip = ip
        self.unprocessed_read = ""
        self.port = port
        self.socket = socket
        self.do_run = True

    def receive_till(self, till, max_buffer=500):
        r = ""
        p = -1
        while p == -1 and self.do_run:
            if len(r) >= max_buffer:
                # seems we got some scrambled data - throw it away and try a fresh start
                r = ""
            r += self.socket.recv(1024)
            if not r:
                return r
            p = r.find(till)
        return r

    def cancel(self):
        self.do_run = False
        try:
            self.socket.close()
        except:
            pass
        self.socket = None

    def run(self):
        while self.do_run:
            request_s = self.receive_till("\n")
            if not request_s:
                break
            try:
                request = json.loads(request_s)
            except:
                continue # skip invalid requests
            method = None
            response = {}
            try:
                response["id"] = request["id"]
                method = request["method"]
            except:
                continue # skip invalid requests (id and method are required)
            print "==>%s" % request_s.replace("\n","\\n")
            if method == "miner_getstat1":
                result_strings = []
                result_strings.append("ethminer-xx.yy") # version
                result_strings.append("60") # runtime in minutes
                result_strings.append("185128;6981;0") # hashrate ETH [kH/s] /shares/..
                result_strings.append("22264;22990;23312;23232;23635;22990;23070;23635") # hashrate per gpu in kH/s
                result_strings.append("0;0;0")
                result_strings.append("off;off;off;off;off;off;off;off")
                result_strings.append("63;98;63;79;63;43;63;62;63;42;63;73;63;84;63;52") # fan and temps
                result_strings.append("testpool.xyz:1234")
                result_strings.append("0;0;0;0")
                response["jsonrpc"] = "2.0"
                response["result"] = result_strings
                response_s = json.dumps(response) + "\n"
                globs.error_after_x_requests -= 1
                if globs.error_after_x_requests > 0:
                    self.socket.send(response_s)
                    print "<==%s" % response_s.replace("\n","\\n")
                else:
                    print "Simulating frozen miner !"
                    while self.do_run:
                        time.sleep(1) # now do nothing, keep connection open
                    return
        try:
            self.socket.close()
        except:
            pass
        print "Connection closed"

def main(argv):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((globs.HOST, globs.PORT))
    s.listen(1)
    print "Listening on port %d" % globs.PORT
    while True:
        try:
            (sock, (ip, port)) = s.accept()
            print "Connected by %s:%d" % (ip, port)
            newthread = RigApiThread(ip, port, sock)
            globs.threads.append(newthread)
            newthread.start()
        except KeyboardInterrupt:
            break
    for t in globs.threads:
        t.cancel()
    for t in globs.threads:
        t.join()
    s.close()

if __name__ == "__main__":
    sys.exit(main(sys.argv))
Thanks for this! I will look into it tomorrow.
As it seems ethminer is not working
configured as follows:
1.) I got no data in grafana
2.) I see no message in ethminer which looks like
i 18:41:13 Api New api session from 192.168.XXX.AAA:55940
So I think it's not working.
Reference to Issue #26