nginyc / rafiki

Rafiki is a distributed system that supports training and deployment of machine learning models using AutoML, built with ease-of-use in mind.
Apache License 2.0

Errors might be related to Kafka #169

Open wild-flame opened 5 years ago

wild-flame commented 5 years ago

Sometimes, when running an inference job (prediction) with the TfVGG model, we encounter an error. The relevant worker logs are below.

service-0d85583a-1488-4ef8-8fc2-ebb572e1f8e5-worker-e3fdaf8e8c79.log

2019-09-02 06:44:40,393 rafiki.utils.service INFO Starting worker "e3fdaf8e8c79" for service of ID "0d85583a-1488-4ef8-8fc2-ebb572e1f8e5"...
2019-09-02 06:44:40,393 __main__ INFO Starting global predictor...
2019-09-02 06:44:40,553 rafiki.predictor.predictor INFO Reading job info from meta store...
2019-09-02 06:44:40,571 rafiki.predictor.predictor INFO Using ensemble method: <function ensemble_probabilities at 0x7fe162a3a950>...
2019-09-02 06:44:40,579 rafiki.redis.redis INFO Connecting to Redis at namespace INFERENCE:b872bba9-e391-4f8d-8abc-1fdce6fb1cd5...
2019-09-02 06:44:40,583 kafka.conn INFO <BrokerConnection node_id=bootstrap-0 host=rafiki_kafka:9092 <connecting> [IPv4 ('10.0.0.149', 9092)]>: connecting to rafiki_kafka:9092 [('10.0.0.149', 9092) IPv4]
2019-09-02 06:44:40,583 kafka.conn INFO Probing node bootstrap-0 broker version
2019-09-02 06:44:40,592 kafka.conn INFO <BrokerConnection node_id=bootstrap-0 host=rafiki_kafka:9092 <connecting> [IPv4 ('10.0.0.149', 9092)]>: Connection complete.
2019-09-02 06:44:40,702 kafka.conn INFO Broker version identifed as 1.0.0
2019-09-02 06:44:40,702 kafka.conn INFO Set configuration api_version=(1, 0, 0) to skip auto check_version requests on startup
2019-09-02 06:44:40,704 rafiki.predictor.predictor INFO Initialized predictor for inference job "b872bba9-e391-4f8d-8abc-1fdce6fb1cd5"
2019-09-02 06:44:41,254 werkzeug INFO  * Running on http://0.0.0.0:3003/ (Press CTRL+C to quit)
2019-09-02 06:44:45,502 rafiki.utils.service WARNING Terminal signal received: 15, <frame object at 0x55556c4070c8>
2019-09-02 06:44:45,895 kafka.producer.kafka INFO Closing the Kafka producer with 9223372036.0 secs timeout.
2019-09-02 06:44:45,896 kafka.conn INFO <BrokerConnection node_id=bootstrap-0 host=rafiki_kafka:9092 <connected> [IPv4 ('10.0.0.149', 9092)]>: Closing connection. 
2019-09-02 06:44:45,896 kafka.producer.kafka INFO Kafka producer closed

service-7d986259-a0a0-498e-9777-cf0cf5dd78cc-worker-d6b15c22403c.log

2019-08-31 08:56:18,634 rafiki.utils.service INFO Starting worker "d6b15c22403c" for service of ID "7d986259-a0a0-498e-9777-cf0cf5dd78cc"...
2019-08-31 08:56:18,634 __main__ INFO Starting global predictor...
2019-08-31 08:56:18,746 rafiki.predictor.predictor INFO Reading job info from meta store...
2019-08-31 08:56:18,771 rafiki.predictor.predictor INFO Using ensemble method: <function ensemble_probabilities at 0x7f9a01f0f950>...
2019-08-31 08:56:18,784 rafiki.redis.redis INFO Connecting to Redis at namespace INFERENCE:23420b87-ecf5-4e50-b929-935ea64a4e68...
2019-08-31 08:56:18,787 kafka.conn INFO <BrokerConnection node_id=bootstrap-0 host=rafiki_kafka:9092 <connecting> [IPv4 ('10.0.0.149', 9092)]>: connecting to rafiki_kafka:9092 [('10.0.0.149', 9092) IPv4]
2019-08-31 08:56:18,788 kafka.conn INFO Probing node bootstrap-0 broker version
2019-08-31 08:56:18,791 kafka.conn INFO <BrokerConnection node_id=bootstrap-0 host=rafiki_kafka:9092 <connecting> [IPv4 ('10.0.0.149', 9092)]>: Connection complete.
2019-08-31 08:56:18,900 kafka.conn INFO Broker version identifed as 1.0.0
2019-08-31 08:56:18,900 kafka.conn INFO Set configuration api_version=(1, 0, 0) to skip auto check_version requests on startup
2019-08-31 08:56:18,903 rafiki.predictor.predictor INFO Initialized predictor for inference job "23420b87-ecf5-4e50-b929-935ea64a4e68"
2019-08-31 08:56:19,431 werkzeug INFO  * Running on http://0.0.0.0:3003/ (Press CTRL+C to quit)
2019-08-31 09:01:18,993 kafka.conn INFO <BrokerConnection node_id=1001 host=rafiki_kafka:9092 <connecting> [IPv4 ('10.0.0.149', 9092)]>: connecting to rafiki_kafka:9092 [('10.0.0.149', 9092) IPv4]
2019-08-31 09:01:18,994 kafka.conn INFO <BrokerConnection node_id=1001 host=rafiki_kafka:9092 <connecting> [IPv4 ('10.0.0.149', 9092)]>: Connection complete.
2019-08-31 09:01:18,994 kafka.conn INFO <BrokerConnection node_id=bootstrap-0 host=rafiki_kafka:9092 <connected> [IPv4 ('10.0.0.149', 9092)]>: Closing connection. 
2019-08-31 09:50:55,883 rafiki.utils.service WARNING Terminal signal received: 15, <frame object at 0x5638549279c8>
2019-08-31 09:50:56,329 kafka.producer.kafka INFO Closing the Kafka producer with 9223372036.0 secs timeout.
2019-08-31 09:50:56,330 kafka.conn INFO <BrokerConnection node_id=1001 host=rafiki_kafka:9092 <connected> [IPv4 ('10.0.0.149', 9092)]>: Closing connection. 
2019-08-31 09:50:56,332 kafka.producer.kafka INFO Kafka producer closed
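
To compare the two predictor logs above, it may help to measure how long each worker ran before it received signal 15 (SIGTERM on Linux). The following is a minimal sketch, assuming every line follows the `timestamp logger LEVEL message` layout seen in these excerpts. The helper names `parse_line` and `seconds_until_sigterm` are hypothetical and not part of Rafiki.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpts above:
# "YYYY-MM-DD HH:MM:SS,mmm <logger> <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<logger>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, logger, level, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, match.group("logger"), match.group("level"), match.group("msg")


def seconds_until_sigterm(lines):
    """Seconds between the first parsable line and the first 'signal 15' warning.

    Returns None if no such warning is found.
    """
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _logger, level, msg = parsed
        if start is None:
            start = ts
        if level == "WARNING" and msg.startswith("Terminal signal received: 15"):
            return (ts - start).total_seconds()
    return None


# Lines copied from the first log above (worker e3fdaf8e8c79).
first_log = [
    '2019-09-02 06:44:40,393 __main__ INFO Starting global predictor...',
    '2019-09-02 06:44:41,254 werkzeug INFO  * Running on http://0.0.0.0:3003/ (Press CTRL+C to quit)',
    '2019-09-02 06:44:45,502 rafiki.utils.service WARNING Terminal signal received: 15, <frame object at 0x55556c4070c8>',
]

# Lines copied from the second log above (worker d6b15c22403c).
second_log = [
    '2019-08-31 08:56:18,634 __main__ INFO Starting global predictor...',
    '2019-08-31 09:50:55,883 rafiki.utils.service WARNING Terminal signal received: 15, <frame object at 0x5638549279c8>',
]

print(seconds_until_sigterm(first_log))   # 5.109
print(seconds_until_sigterm(second_log))  # 3277.249
```

On these excerpts, the first worker is terminated about five seconds after starting, while the second runs for roughly 55 minutes. The script only measures the interval. It does not show what sent the signal or whether Kafka is involved.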

service-a2770390-0379-47b0-bee6-35f0b373f83b-worker-fbe5905b2fdb.log

2019-08-31 09:50:45,566 kafka.coordinator INFO Stopping heartbeat thread