opendatahub-io / caikit-tgis-serving

Apache License 2.0

Synchronization issue when the model has just been launched #170

Open kpouget opened 1 year ago

kpouget commented 1 year ago

Describe the bug

There is a synchronization issue when the Pod is launched with the current images. The first prediction requests fail, and only later ones succeed:

ERROR: Code: Internal Message: Unhandled exception during prediction
ERROR: Code: Internal Message: Unhandled exception during prediction
ERROR: Code: Internal Message: Unhandled exception during prediction
ERROR: Code: Internal Message: Unhandled exception during prediction
ERROR: Code: Internal Message: Unhandled exception during prediction
ERROR: Code: Internal Message: Unhandled exception during prediction
ERROR: Code: Internal Message: Unhandled exception during prediction
{
  "generated_text": "74 degrees F.C., a temperature of 74 degrees F.C., a temperature of ",
  "generated_tokens": "25",
  "finish_reason": "MAX_TOKENS",
  "producer_id": {
    "name": "Text Generation",
    "version": "0.1.0"
  },
  "input_token_count": "10"
}
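The pattern above (several failures, then a success) suggests a possible client-side workaround until the startup ordering is fixed: retry failed predictions with exponential backoff. Below is a minimal Python sketch. It assumes a hypothetical zero-argument callable wrapping whatever client sends the prediction. `call_with_retries` and its parameters are illustrative and are not part of caikit:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 10,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Call `func`, retrying with exponential backoff whenever it raises.

    Hypothetical workaround: while the TGIS backend is still starting,
    predictions fail, so the caller waits and tries again.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:  # in real code, narrow this to the error type you observe
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, capped at max_delay
    raise AssertionError("unreachable")
```

Retrying only masks the race on the client side. The requests still fail until the backend is up, so this is a stopgap rather than a fix.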


In the `transformer-container` logs, we can see this error:

{
  "channel": "GP-SERVICR-I",
  "exception": null,
  "level": "warning",
  "log_code": "",
  "message": "<_InactiveRpcError of RPC that terminated with: \tstatus = StatusCode.UNAVAILABLE \tdetails = \"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8033: Failed to connect to remote host: Connection refused\" \tdebug_error_string = \"UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8033: Failed to connect to remote host: Connection refused {created_time:\"2023-10-24T11:48:51.016344787+00:00\", grpc_status:14}\"",
  "model_id": "flan-t5-small-caikit",
  "num_indent": 0,
  "stack_trace": "Traceback (most recent call last): File \"/caikit/lib/python3.9/site-packages/caikit/runtime/servicers/global_predict_servicer.py\", line 283, in _handle_predict_exceptions yield File \"/caikit/lib/python3.9/site-packages/caikit/runtime/servicers/global_predict_servicer.py\", line 260, in predict_model response = work.do() File \"/caikit/lib/python3.9/site-packages/caikit/runtime/work_management/abortable_action.py\", line 118, in do return self.work_thread.get_or_throw() File \"/caikit/lib/python3.9/site-packages/caikit/core/toolkit/destroyable_thread.py\", line 188, in get_or_throw raise self.runnable_exception File \"/caikit/lib/python3.9/site-packages/caikit/core/toolkit/destroyable_thread.py\", line 124, in run self.runnable_result = self.runnable_func( File \"/caikit/lib/python3.9/site-packages/caikit_nlp/modules/text_generation/text_generation_tgis.py\", line 237, in run return self.tgis_generation_client.unary_generate( File \"/caikit/lib/python3.9/site-packages/caikit_nlp/toolkit/text_generation/tgis_utils.py\", line 315, in unary_generate batch_response = self.tgis_client.Generate(request) File \"/caikit/lib64/python3.9/site-packages/grpc/_channel.py\", line 1161, in __call__ return _end_unary_response_blocking(state, call, False, None) File \"/caikit/lib64/python3.9/site-packages/grpc/_channel.py\", line 1004, in _end_unary_response_blocking raise _InactiveRpcError(state) # pytype: disable=not-instantiable grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: \tstatus = StatusCode.UNAVAILABLE \tdetails = \"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8033: Failed to connect to remote host: Connection refused\" \tdebug_error_string = \"UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8033: Failed to connect to remote host: Connection refused {created_time:\"2023-10-24T11:48:51.016344787+00:00\", grpc_status:14}\"",
  "thread_id": 140123215742720,
  "timestamp": "2023-10-24T11:48:51.017178"
}

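In this log, the transformer container tries to reach TGIS at `127.0.0.1:8033` before anything is listening there (`Connection refused`). One way to avoid the race is a readiness gate that waits for the port to accept TCP connections before forwarding requests. Below is a minimal sketch using only the standard library. `wait_for_port` is a hypothetical helper and is not provided by caikit or TGIS. Also note that an open port is a weaker signal than TGIS itself reporting that the model is loaded:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical readiness gate: don't send predictions to the backend
    until its gRPC port accepts connections.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Succeeds as soon as the TCP handshake completes; the socket is closed right away
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # ConnectionRefusedError (the error in the log) is a subclass of OSError
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 8033)` could run before the runtime starts serving. In Kubernetes, a startup or readiness probe on the same port could play a similar role.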
Platform

Sample Code

Attached files: `caikit_tgit_config.yaml.log`, `inference_service.yaml.log`, `serving_runtime.yaml.log`

dtrifiro commented 1 year ago

Are there any errors in the tgis container (kserve-container)?

kpouget commented 1 year ago

No, there are no error messages in this container.


dtrifiro commented 11 months ago

This will be fixed when https://github.com/opendatahub-io/caikit-tgis-serving/issues/156 is closed