regel / loudml

Loud ML is the first open-source AI solution for ICT and IoT automation

[Feature Request] Prepend Timestamp and JobID to the output logs. #387

Open toni-moreno opened 4 years ago

toni-moreno commented 4 years ago

Hello @regel .

It would be great if the output log lines carried a timestamp so we can see when each event happened.

I have deployed Loud ML in a swarm and it has been restarted several times in the last few hours.

$ docker service logs   test-loudml_loudml 2>&1 | grep -i 8077
test-loudml_loudml.1.16k9yzo4l85m@worker3    | INFO:root:starting Loud ML server on 0.0.0.0:8077
test-loudml_loudml.1.34x1qmyeemmc@worker3    | INFO:root:starting Loud ML server on 0.0.0.0:8077
test-loudml_loudml.1.co1yypn2f0pw@worker3    | INFO:root:starting Loud ML server on 0.0.0.0:8077
test-loudml_loudml.1.9s3b9crj4w32@worker3    | INFO:root:starting Loud ML server on 0.0.0.0:8077

The output log doesn't tell us when each reboot happened; in this context it is difficult to correlate Loud ML events with external problems in the platform.

Thank you very much.

toni-moreno commented 4 years ago

It would also be great if each log line included the job ID that generated it, so we can look at errors/stacktraces and identify which process is crashing.

In the following lines there are two jobs, one training and the other predicting; for people who don't know the code, there is no way to tell which one is responsible for the stack trace at the end.

INFO:root:job[9d29c201-4ad1-4d4b-9139-790a011591fd] starting, nice=5
INFO:root:job[b8cf6a90-9726-45f2-b004-5114599b60aa] starting, nice=0
INFO:root:connecting to influxdb on influxdb:8086, using database 'loudml'
INFO:root:predict(swarm@cpu@95percentile@usage_active@host_worker2_cpu_cpu-total@time@5m) range=2020-08-06T12:40:00.000Z-2020-08-06T12:45:00.000Z
INFO:root:train(swarm@cpu@mean@usage_active@host_worker2_cpu_cpu-total@time@5m) range=2020-08-05T12:40:00.000Z-2020-08-06T12:45:00.000Z train_size=0.670000 batch_size=64 epochs=100)
INFO:root:connecting to influxdb on influxdb:8086, using database 'swarm'
INFO:root:missing data: field 'usage_active', metric 'mean', bucket: 2020-08-05T12:40:00Z
INFO:root:missing data: field 'usage_active', metric 'mean', bucket: 2020-08-05T19:15:00Z
INFO:root:found 289 time periods
INFO:hyperopt.tpe:tpe_transform took 0.004817 seconds
INFO:hyperopt.tpe:TPE using 0 trials
WARNING:root:iteration failed: insufficient validation data
INFO:hyperopt.tpe:tpe_transform took 0.003048 seconds
INFO:hyperopt.tpe:TPE using 1/1 trials with best loss inf
WARNING:root:iteration failed: insufficient validation data
INFO:hyperopt.tpe:tpe_transform took 0.002605 seconds
INFO:hyperopt.tpe:TPE using 2/2 trials with best loss inf
WARNING:root:iteration failed: insufficient validation data
INFO:hyperopt.tpe:tpe_transform took 0.002599 seconds
INFO:hyperopt.tpe:TPE using 3/3 trials with best loss inf
WARNING:root:iteration failed: insufficient validation data
INFO:hyperopt.tpe:tpe_transform took 0.002649 seconds
INFO:hyperopt.tpe:TPE using 4/4 trials with best loss inf
WARNING:root:iteration failed: insufficient validation data
INFO:hyperopt.tpe:tpe_transform took 0.002580 seconds
INFO:hyperopt.tpe:TPE using 5/5 trials with best loss inf
WARNING:root:iteration failed: insufficient validation data
INFO:hyperopt.tpe:tpe_transform took 0.002577 seconds
INFO:hyperopt.tpe:TPE using 6/6 trials with best loss inf
WARNING:root:iteration failed: insufficient validation data
INFO:hyperopt.tpe:tpe_transform took 0.002516 seconds
INFO:hyperopt.tpe:TPE using 7/7 trials with best loss inf
WARNING:root:iteration failed: insufficient validation data
INFO:hyperopt.tpe:tpe_transform took 0.002593 seconds
INFO:hyperopt.tpe:TPE using 8/8 trials with best loss inf
WARNING:root:iteration failed: insufficient validation data
INFO:hyperopt.tpe:tpe_transform took 0.002615 seconds
INFO:hyperopt.tpe:TPE using 9/9 trials with best loss inf
WARNING:root:iteration failed: insufficient validation data
ERROR:root:
Traceback (most recent call last):
  File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 53, in run
    res = getattr(self, func_name)(*args, **kwargs)
  File "/opt/venv/lib/python3.7/site-packages/loudml/worker.py", line 101, in train
    **kwargs
  File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 1091, in train
    abnormal=abnormal,
  File "/opt/venv/lib/python3.7/site-packages/loudml/donut.py", line 843, in _train_on_dataset
    rstate=fmin_state,
  File "/opt/venv/lib/python3.7/site-packages/hyperopt/fmin.py", line 403, in fmin
    show_progressbar=show_progressbar,
  File "/opt/venv/lib/python3.7/site-packages/hyperopt/base.py", line 651, in fmin
    show_progressbar=show_progressbar)
  File "/opt/venv/lib/python3.7/site-packages/hyperopt/fmin.py", line 426, in fmin
    return trials.argmin
  File "/opt/venv/lib/python3.7/site-packages/hyperopt/base.py", line 600, in argmin
    best_trial = self.best_trial
  File "/opt/venv/lib/python3.7/site-packages/hyperopt/base.py", line 591, in best_trial
    raise AllTrialsFailed
hyperopt.exceptions.AllTrialsFailed
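One way to implement the request above is a `logging.LoggerAdapter` that stamps every record with the ID of the job that emitted it. This is only a sketch; the `job_logger` helper and the format string are illustrative, and the UUID is copied from the log excerpt above:

```python
import logging

def job_logger(job_id):
    """Return a logger whose every record carries the given job id.

    LoggerAdapter merges the extra dict into each record, so the
    %(job_id)s placeholder in the handler's format resolves per line.
    """
    return logging.LoggerAdapter(logging.getLogger("loudml"), {"job_id": job_id})

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s job[%(job_id)s] %(message)s")
)
logging.getLogger("loudml").addHandler(handler)
logging.getLogger("loudml").setLevel(logging.INFO)

# Each worker would create its own adapter when it picks up a job:
log = job_logger("b8cf6a90-9726-45f2-b004-5114599b60aa")
log.info("connecting to influxdb on influxdb:8086, using database 'loudml'")
log.warning("iteration failed: insufficient validation data")
```

With this, the "iteration failed" warnings and the final traceback would each carry the job UUID, making it obvious that they belong to the training job rather than the predict job. Note that a format containing `%(job_id)s` requires every record to provide that field; lines logged outside a job would need a `logging.Filter` that injects a default value.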