ucbrise / clipper

A low-latency prediction-serving system
http://clipper.ai
Apache License 2.0

TensorFlow model container stops responding after predict function throws an exception #467

Open kbalka opened 6 years ago

kbalka commented 6 years ago

After I pass incorrect input to the predict function, I have to scale the model down and back up before it will serve predictions again.

I think exceptions raised by the predict function should be caught and logged, but they should not break the prediction loop.
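A minimal sketch of the behaviour proposed above, assuming a simplified request loop. `handle_request`, `serve`, the logger name, and the `{"ok": ..., "error": ...}` response shape are all hypothetical and are not Clipper's actual `rpc.py` API. The point is that each request is wrapped in its own `try`/`except`, so a failure is logged with its traceback and turned into an error reply instead of unwinding the whole loop.

```python
import logging

logger = logging.getLogger("model_container")  # hypothetical logger name


def handle_request(predict_fn, inputs):
    """Run one prediction request without letting a failure escape.

    On success returns {"ok": True, "outputs": ...}; on failure logs the
    full traceback and returns {"ok": False, "error": ...} so the caller
    still gets a reply and the loop can move on to the next request.
    """
    try:
        return {"ok": True, "outputs": predict_fn(inputs)}
    except Exception as e:  # not BaseException: Ctrl-C / SystemExit still stop the process
        logger.exception("predict function raised; returning an error response")
        return {"ok": False, "error": "%s: %s" % (type(e).__name__, e)}


def serve(predict_fn, requests):
    """Simplified serving loop: one bad request does not end the loop."""
    return [handle_request(predict_fn, inputs) for inputs in requests]
```

For example, `serve(lambda xs: [sum(x) for x in xs], [[[1.0, 2.0]], None, [[3.0]]])` logs a `TypeError` for the middle request and still answers the third one. Catching `Exception` rather than using a bare `except` keeps `KeyboardInterrupt` and `SystemExit` able to stop the process.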

Logs from model container:

Sent heartbeat!
Received heartbeat!
Sent heartbeat!
Received heartbeat!
Sent heartbeat!
Received heartbeat!
Sent heartbeat!
Received heartbeat!
Sent heartbeat!
Received heartbeat!
Got start of message 16
Traceback (most recent call last):
  File "/container/tf_container.py", line 110, in <module>
    rpc_service.start(model, ip, port, model_name, model_version, input_type)
  File "/container/rpc.py", line 517, in start
    self.server.run(parent_conn)
  File "/container/rpc.py", line 309, in run
    prediction_request)
  File "/container/rpc.py", line 136, in handle_prediction_request
    outputs = predict_fn(prediction_request.inputs)
  File "/container/tf_container.py", line 52, in predict_floats
    preds = self.predict_func(self.sess, inputs)
  File "deploy_resnet.py", line 11, in predict
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1113, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 10) for Tensor u'input_tensor:0', which has shape '(1, 224, 224, 3)'

After this, the container cannot serve predictions anymore.
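The traceback shows the root cause: a 10-value input was fed to `input_tensor:0`, which expects shape `(1, 224, 224, 3)`. Alongside fixing the loop, a model author could reject such inputs before they reach the session. The sketch below assumes each input arrives as a flattened 1-D vector (so a valid image has 224 × 224 × 3 = 150528 values) and that the predict function returns one string per input; `check_input`, `guarded_predict`, and `run_model` are hypothetical names, not part of `deploy_resnet.py` or Clipper.

```python
EXPECTED_SHAPE = (224, 224, 3)  # assumption: taken from the traceback's input_tensor shape
EXPECTED_SIZE = 224 * 224 * 3   # 150528 values per flattened image


def check_input(x):
    """Return None if x has the expected number of values, else an error string."""
    if len(x) != EXPECTED_SIZE:
        return "error: got %d values, expected %d (a flattened %s image)" % (
            len(x), EXPECTED_SIZE, EXPECTED_SHAPE)
    return None


def guarded_predict(run_model, inputs):
    """Validate every input, run the model only on the valid ones, and
    return one string per input in the original order.

    run_model is a hypothetical callable that takes a list of valid inputs
    and returns one string per input (e.g. a wrapper around sess.run).
    """
    errors = [check_input(x) for x in inputs]
    valid = [x for x, err in zip(inputs, errors) if err is None]
    results = iter(run_model(valid)) if valid else iter(())
    # Invalid inputs keep their error message; valid ones take the next model result.
    return [err if err is not None else next(results) for err in errors]
```

With this guard, a batch mixing a 10-value input and a correctly sized one returns an error string for the first and the model's output for the second, and the session never sees the mismatched shape.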

dcrankshaw commented 6 years ago

Ahh this is an excellent point. Thanks for bringing this to our attention.

santi81 commented 6 years ago

@dcrankshaw, do you think it makes sense to link this to readiness probes?
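One possible reading of the readiness-probe idea, as a sketch only: if the serving loop runs in a background thread, a probe handler can report whether that thread is still alive. `start_serving_loop` and `readiness_status` are hypothetical helpers; the Clipper container does not necessarily structure its loop this way, and the HTTP endpoint that would return this status is left out.

```python
import threading


def start_serving_loop(loop_fn):
    """Run the (hypothetical) serving loop in a background daemon thread."""
    t = threading.Thread(target=loop_fn, name="serving-loop", daemon=True)
    t.start()
    return t


def readiness_status(serving_thread):
    """HTTP status a readiness-probe handler could return:
    200 while the serving loop is running, 503 once it has exited
    (for example, because an uncaught exception ended it)."""
    return 200 if serving_thread.is_alive() else 503
```

In Kubernetes, a failing readiness probe only removes the pod from Service endpoints. A liveness probe is what triggers a container restart, which would replace the manual scale-down/scale-up described above.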

rkooo567 commented 5 years ago

@simon-mo, has this been addressed?