jina-ai / serve

☁️ Build multimodal AI applications with cloud-native stack
https://jina.ai/serve
Apache License 2.0

`StableLM` example from the homepage doesn't work properly. #6127

Closed: codetalker7 closed this issue 11 months ago

codetalker7 commented 11 months ago

I was going through the small example on the homepage of the docs, and it fails with the following error:

WARNI… gateway@6246 Getting endpoints failed: failed to connect to all           [12/09/23 07:58:35]
       addresses. Waiting for another trial                                                         
WARNI… gateway@6246 Getting endpoints failed: failed to connect to all           [12/09/23 07:59:16]
       addresses. Waiting for another trial                                                         
WARNI… gateway@6246 Getting endpoints failed: failed to connect to all           [12/09/23 08:03:15]
       addresses. Waiting for another trial                                                         
WARNI… gateway@6166 <jina.orchestrate.pods.Pod object at 0x7e03081072e0> timeout [12/09/23 08:08:30]
       after waiting for 600000ms, if your executor takes time to load, you may                     
       increase --timeout-ready                                                                     
WARNI… gateway@6246 Getting endpoints failed: failed to connect to all           [12/09/23 08:11:47]
       addresses. Waiting for another trial                                                         
INFO   gateway@6246 start server bound to 0.0.0.0:12345                          [12/09/23 08:11:48]
Traceback (most recent call last):
  File "/content/deployment.py", line 6, in <module>
    with dep:
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/orchestrator.py", line 14, in __enter__
    return self.start()
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/deployments/__init__.py", line 1157, in start
    self._wait_until_all_ready()
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/deployments/__init__.py", line 1095, in _wait_until_all_ready
    asyncio.get_event_loop().run_until_complete(wait_for_ready_coro)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/deployments/__init__.py", line 1212, in async_wait_start_success
    await asyncio.gather(*coros)
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/pods/__init__.py", line 221, in async_wait_start_success
    self._fail_start_timeout(_timeout)
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/pods/__init__.py", line 140, in _fail_start_timeout
    raise TimeoutError(
TimeoutError: jina.orchestrate.pods.Pod:gateway can not be initialized after 600000.0ms

For reference, here is the code for the executor.py and deployment.py scripts:

executor.py:

from jina import Executor, requests
from docarray import DocList, BaseDoc

from transformers import pipeline

class Prompt(BaseDoc):
    text: str

class Generation(BaseDoc):
    prompt: str
    text: str

class StableLM(Executor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        generations = DocList[Generation]()
        prompts = docs.text
        llm_outputs = self.generator(prompts)
        for prompt, output in zip(prompts, llm_outputs):
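            # For a list of prompts, the text-generation pipeline returns one
            # list of dicts per prompt; the generated string is under 'generated_text'.
            generations.append(Generation(prompt=prompt, text=output[0]['generated_text']))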
        return generations

deployment.py:

from jina import Deployment
from executor import StableLM

dep = Deployment(uses=StableLM, timeout_ready=-1, port=12345)

with dep:
    dep.block()

And I'm running the deployment script simply by doing:

python3 deployment.py

Am I missing something or does this example need to be updated?

JoanFM commented 11 months ago

It seems that your model is being downloaded. Can you run this in a separate script first and then try again?

from transformers import pipeline

# Download and cache the model weights before starting the Deployment.
pipeline('text-generation', model='stabilityai/stablelm-base-alpha-3b')

codetalker7 commented 11 months ago

Hi @JoanFM. I tried running the new script, and it downloaded the model just fine. But the same error still persists.

Here is the output of the download script:

config.json: 100%
708/708 [00:00<00:00, 35.6kB/s]
pytorch_model.bin.index.json: 100%
21.1k/21.1k [00:00<00:00, 1.07MB/s]
Downloading shards: 100%
2/2 [05:30<00:00, 154.68s/it]
pytorch_model-00001-of-00002.bin: 100%
10.2G/10.2G [03:45<00:00, 44.1MB/s]
pytorch_model-00002-of-00002.bin: 100%
4.66G/4.66G [01:44<00:00, 41.3MB/s]
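The shard sizes in this output also give a rough lower bound on how much RAM is needed just to hold the weights. A minimal sketch, assuming the weights are loaded at the same precision they are stored in (actual peak usage while loading is typically higher, and the "G" units from the progress bars are treated as approximate GB):

```python
import os

# Checkpoint shard sizes, as reported by the download progress bars above.
SHARD_SIZES_GB = [10.2, 4.66]


def weights_lower_bound_gb(shard_sizes_gb):
    """Sum the checkpoint shards: a rough lower bound on the RAM needed to hold
    the weights, assuming they are loaded at the precision they are stored in."""
    return sum(shard_sizes_gb)


def total_ram_gb():
    """Total physical RAM in GB (1e9 bytes), or None if os.sysconf cannot
    report it on this platform (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
    except (AttributeError, ValueError, OSError):
        return None


needed = weights_lower_bound_gb(SHARD_SIZES_GB)
print(f"weights alone: ~{needed:.2f} GB")  # prints "weights alone: ~14.86 GB"
ram = total_ram_gb()
if ram is not None:
    print(f"physical RAM:  ~{ram:.2f} GB")
```

If the weights alone come close to the machine's total RAM, a process being killed while loading the model is a plausible outcome.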

And here is the output of python3 deployment.py, which again leads to the same error:

WARNI… gateway@2969 Getting endpoints failed: failed to connect to all           [12/09/23 09:28:00]
       addresses. Waiting for another trial                                                         
WARNI… gateway@2969 Getting endpoints failed: failed to connect to all           [12/09/23 09:28:39]
       addresses. Waiting for another trial                                                         
WARNI… gateway@2969 Getting endpoints failed: failed to connect to all           [12/09/23 09:33:11]
       addresses. Waiting for another trial                                                         
WARNI… gateway@2889 <jina.orchestrate.pods.Pod object at 0x7e068c21bfa0> timeout [12/09/23 09:37:55]
       after waiting for 600000ms, if your executor takes time to load, you may                     
       increase --timeout-ready                                                                     
WARNI… gateway@2969 Getting endpoints failed: failed to connect to all           [12/09/23 09:41:08]
       addresses. Waiting for another trial                                                         
INFO   gateway@2969 start server bound to 0.0.0.0:12345                          [12/09/23 09:41:09]
Traceback (most recent call last):
  File "/content/deployment.py", line 6, in <module>
    with dep:
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/orchestrator.py", line 14, in __enter__
    return self.start()
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/deployments/__init__.py", line 1157, in start
    self._wait_until_all_ready()
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/deployments/__init__.py", line 1095, in _wait_until_all_ready
    asyncio.get_event_loop().run_until_complete(wait_for_ready_coro)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/deployments/__init__.py", line 1212, in async_wait_start_success
    await asyncio.gather(*coros)
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/pods/__init__.py", line 221, in async_wait_start_success
    self._fail_start_timeout(_timeout)
  File "/usr/local/lib/python3.10/dist-packages/jina/orchestrate/pods/__init__.py", line 140, in _fail_start_timeout
    raise TimeoutError(
TimeoutError: jina.orchestrate.pods.Pod:gateway can not be initialized after 600000.0ms
JoanFM commented 11 months ago

What versions of Jina and docarray do you have installed?

codetalker7 commented 11 months ago

@JoanFM, I just installed jina from pip, so it should be the most recent PyPI version. Here's the output of python3 -m pip show jina docarray:

Name: jina
Version: 3.23.1
Summary: Multimodal AI services & pipelines with cloud-native stack: gRPC, Kubernetes, Docker, OpenTelemetry, Prometheus, Jaeger, etc.
Home-page: https://github.com/jina-ai/jina/
Author: Jina AI
Author-email: hello@jina.ai
License: Apache 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: aiofiles, aiohttp, docarray, docker, fastapi, filelock, grpcio, grpcio-health-checking, grpcio-reflection, jcloud, jina-hubble-sdk, numpy, opentelemetry-api, opentelemetry-exporter-otlp, opentelemetry-exporter-otlp-proto-grpc, opentelemetry-exporter-prometheus, opentelemetry-instrumentation-aiohttp-client, opentelemetry-instrumentation-fastapi, opentelemetry-instrumentation-grpc, opentelemetry-sdk, packaging, pathspec, prometheus-client, protobuf, pydantic, python-multipart, pyyaml, requests, urllib3, uvicorn, uvloop, websockets
Required-by: 
---
Name: docarray
Version: 0.39.1
Summary: The data structure for multimodal data
Home-page: https://docs.docarray.org/
Author: DocArray
Author-email: 
License: Apache 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: numpy, orjson, pydantic, rich, types-requests, typing-inspect
Required-by: jina
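For bug reports, the same version information can also be collected from inside the Python environment that actually runs the Deployment, using the standard-library importlib.metadata module (Python 3.8+). A small sketch; the package list is simply the set discussed in this thread:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string,
    or None if it is not installed in the current environment."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["jina", "docarray", "transformers", "torch"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it with the same interpreter as deployment.py avoids mixing up versions from different environments (the two tracebacks in this thread come from different Python installs).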
JoanFM commented 11 months ago

Can you run it with the JINA_LOG_LEVEL=DEBUG environment variable set?

codetalker7 commented 11 months ago

Hi @JoanFM, sure, here's the output of JINA_LOG_LEVEL=DEBUG python3 -m deployment

DEBUG  executor-replica-set@136297 Waiting for ReplicaSet to start successfully                                                                            [12/10/23 18:15:02]
DEBUG  executor/rep-0@136310 Setting signal handlers                                                                                                       [12/10/23 18:15:02]
DEBUG  executor/rep-0@136310 Signal handlers already set
DEBUG  gateway@136311 Setting signal handlers                                                                                                              [12/10/23 18:15:02]
DEBUG  gateway@136311 Signal handlers already set
DEBUG  gateway@136311 adding connection for deployment executor/heads/0 to grpc://0.0.0.0:63378                                                            [12/10/23 18:15:02]
DEBUG  gateway@136311 create_connection connection for executor to grpc://0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to grpc://0.0.0.0:63378
DEBUG  gateway@136311 connection for deployment executor/heads/0 to grpc://0.0.0.0:63378 added
DEBUG  gateway@136311 Setting up GRPC server
DEBUG  gateway@136311 Get all endpoints from TopologyGraph
DEBUG  gateway@136311 Getting Endpoints data from executor
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212302.342733487","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212302.342731834…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 1th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:15:03]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212303.342263328","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212303.342262457…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 2th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:15:05]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212305.177361743","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212305.177360250…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 3th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:15:07]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212307.272224650","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212307.272223268…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 4th time.
DEBUG  gateway@136311 gRPC call for executor failed, retries exhausted
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
WARNI… gateway@136311 Getting endpoints failed: failed to connect to all addresses. Waiting for another trial
DEBUG  gateway@136311 Getting Endpoints data from executor                                                                                                 [12/10/23 18:15:08]
DEBUG  gateway@134873 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:15:08]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212308.594345522","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212308.594344169…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 2th time.
DEBUG  gateway@134873 resetting connection for executor to 0.0.0.0:58784
DEBUG  gateway@134873 create_connection connection for executor to 0.0.0.0:58784
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:15:11]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212311.258205136","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212311.258203573…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 1th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:15:17]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212317.184591800","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212317.184590177…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 2th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:15:26]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212326.060864501","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212326.060863649…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 3th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:15:40]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212340.367356535","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212340.367354442…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 4th time.
DEBUG  gateway@136311 gRPC call for executor failed, retries exhausted
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
WARNI… gateway@136311 Getting endpoints failed: failed to connect to all addresses. Waiting for another trial
DEBUG  gateway@136311 Getting Endpoints data from executor                                                                                                 [12/10/23 18:15:41]
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:16:10]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212370.642886827","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212370.642885264…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 1th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
DEBUG  gateway@134873 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:16:13]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212373.436257502","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212373.436255969…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 3th time.
DEBUG  gateway@134873 resetting connection for executor to 0.0.0.0:58784
DEBUG  gateway@134873 create_connection connection for executor to 0.0.0.0:58784
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:16:58]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212418.856391611","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212418.856390118…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 2th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
DEBUG  gateway@134873 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:17:42]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212462.414183164","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212462.414181501…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 4th time.
DEBUG  gateway@134873 gRPC call for executor failed, retries exhausted
DEBUG  gateway@134873 resetting connection for executor to 0.0.0.0:58784
DEBUG  gateway@134873 create_connection connection for executor to 0.0.0.0:58784
WARNI… gateway@134873 Getting endpoints failed: failed to connect to all addresses. Waiting for another trial
DEBUG  gateway@134873 cancel get all endpoints                                                                                                             [12/10/23 18:17:43]
DEBUG  gateway@134873 Got all endpoints from TopologyGraph None
INFO   gateway@134873 start server bound to 0.0.0.0:12345
DEBUG  gateway@134873 server bound to 0.0.0.0:12345 started
DEBUG  gateway@134873 GRPC server setup successful
DEBUG  gateway@134873 process terminated
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:18:15]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212495.708559314","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212495.708557841…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 3th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:19:57]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212597.124311780","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212597.124309956…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 4th time.
DEBUG  gateway@136311 gRPC call for executor failed, retries exhausted
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
WARNI… gateway@136311 Getting endpoints failed: failed to connect to all addresses. Waiting for another trial
DEBUG  gateway@136311 Getting Endpoints data from executor                                                                                                 [12/10/23 18:19:58]
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:22:13]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212733.848546614","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212733.848544320…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 1th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:24:08]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212848.345446446","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212848.345445024…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 2th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
WARNI… gateway@136297 <jina.orchestrate.pods.Pod object at 0x7f1e507a2790> timeout after waiting for 600000ms, if your executor takes time to load, you    [12/10/23 18:25:02]
       may increase --timeout-ready
DEBUG  gateway@136297 waiting for ready or shutdown signal from runtime
DEBUG  gateway@136297 Runtime was never started. Runtime will end gracefully on its own
DEBUG  gateway@136297 terminating the runtime process
DEBUG  gateway@136297 runtime process properly terminated
DEBUG  gateway@136297 terminated
DEBUG  gateway@136311 Received signal SIGTERM                                                                                                              [12/10/23 18:25:02]
DEBUG  gateway@136297 waiting for ready or shutdown signal from runtime
DEBUG  gateway@136297 shutdown is already set. Runtime will end gracefully on its own
DEBUG  gateway@136297 terminating the runtime process
DEBUG  gateway@136297 runtime process properly terminated
DEBUG  gateway@136297 terminated
DEBUG  executor/rep-0@136297 waiting for ready or shutdown signal from runtime                                                                             [12/10/23 18:25:02]
DEBUG  gateway@136311 Received signal SIGTERM
DEBUG  executor/rep-0@136297 Runtime was never started. Runtime will end gracefully on its own
DEBUG  executor/rep-0@136297 terminating the runtime process
DEBUG  executor/rep-0@136297 runtime process properly terminated
DEBUG  executor/rep-0@136297 terminated
DEBUG  executor/rep-0@136297 joining the process
DEBUG  executor/rep-0@136297 successfully joined the process
DEBUG  gateway@136297 joining the process
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:25:50]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702212950.244549389","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702212950.244547996…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 3th time.
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 gRPC call to executor for EndpointDiscovery errored, with error <AioRpcError of RPC that terminated with:                            [12/10/23 18:27:29]
               status = StatusCode.UNAVAILABLE
               details = "failed to connect to all addresses"
               debug_error_string = "{"created":"@1702213049.752896158","description":"Failed to pick
       subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3260,"referenced_errors":[{"created":"@1702213049.752894715…
       to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}]}"
       > and for the 4th time.
DEBUG  gateway@136311 gRPC call for executor failed, retries exhausted
DEBUG  gateway@136311 resetting connection for executor to 0.0.0.0:63378
DEBUG  gateway@136311 create_connection connection for executor to 0.0.0.0:63378
WARNI… gateway@136311 Getting endpoints failed: failed to connect to all addresses. Waiting for another trial
DEBUG  gateway@136311 cancel get all endpoints                                                                                                             [12/10/23 18:27:30]
DEBUG  gateway@136311 Got all endpoints from TopologyGraph None
INFO   gateway@136311 start server bound to 0.0.0.0:12345
DEBUG  gateway@136311 server bound to 0.0.0.0:12345 started
DEBUG  gateway@136311 GRPC server setup successful
DEBUG  gateway@136311 process terminated
DEBUG  gateway@136297 successfully joined the process                                                                                                      [12/10/23 18:27:30]
DEBUG  gateway@136297 joining the process
DEBUG  gateway@136297 successfully joined the process
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/codetalker7/jinaAI/getting_started/deployment.py", line 6, in <module>
    with dep:
  File "/home/codetalker7/jinaAI/venv/lib/python3.8/site-packages/jina/orchestrate/orchestrator.py", line 14, in __enter__
    return self.start()
  File "/home/codetalker7/jinaAI/venv/lib/python3.8/site-packages/jina/orchestrate/deployments/__init__.py", line 1157, in start
    self._wait_until_all_ready()
  File "/home/codetalker7/jinaAI/venv/lib/python3.8/site-packages/jina/orchestrate/deployments/__init__.py", line 1095, in _wait_until_all_ready
    asyncio.get_event_loop().run_until_complete(wait_for_ready_coro)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/codetalker7/jinaAI/venv/lib/python3.8/site-packages/jina/orchestrate/deployments/__init__.py", line 1212, in async_wait_start_success
    await asyncio.gather(*coros)
  File "/home/codetalker7/jinaAI/venv/lib/python3.8/site-packages/jina/orchestrate/pods/__init__.py", line 221, in async_wait_start_success
    self._fail_start_timeout(_timeout)
  File "/home/codetalker7/jinaAI/venv/lib/python3.8/site-packages/jina/orchestrate/pods/__init__.py", line 140, in _fail_start_timeout
    raise TimeoutError(
TimeoutError: jina.orchestrate.pods.Pod:gateway can not be initialized after 600000.0ms
JoanFM commented 11 months ago

Just to check, can you try moving the transformers import inside the Executor's __init__ method?

codetalker7 commented 11 months ago

@JoanFM tried this out, but it gives me the same error. Just to be sure, here is the new code for the Executor:

from jina import Executor, requests
from docarray import DocList, BaseDoc

class Prompt(BaseDoc):
    text: str

class Generation(BaseDoc):
    prompt: str
    text: str

class StableLM(Executor):
    def __init__(self, **kwargs):
        from transformers import pipeline
        super().__init__(**kwargs)
        self.generator = pipeline(
            'text-generation', model='stabilityai/stablelm-base-alpha-3b'
        )

    @requests
    def generate(self, docs: DocList[Prompt], **kwargs) -> DocList[Generation]:
        generations = DocList[Generation]()
        prompts = docs.text
        llm_outputs = self.generator(prompts)
        for prompt, output in zip(prompts, llm_outputs):
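            # For a list of prompts, the text-generation pipeline returns one
            # list of dicts per prompt; the generated string is under 'generated_text'.
            generations.append(Generation(prompt=prompt, text=output[0]['generated_text']))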
        return generations

But it still gives me the same error as before.

JoanFM commented 11 months ago

Which version of the transformers library are you using?

codetalker7 commented 11 months ago

@JoanFM, I installed transformers from PyPI, so it should be the latest version there. Here's the version:

Name: transformers
Version: 4.35.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /home/codetalker7/jinaAI/venv/lib/python3.8/site-packages
Requires: tokenizers, huggingface-hub, pyyaml, tqdm, packaging, regex, filelock, safetensors, numpy, requests
JoanFM commented 11 months ago

do you have torch or tensorflow installed?

codetalker7 commented 11 months ago

do you have torch or transformers installed?

Yes, I have both torch and transformers installed; torch is version 2.1.1.

JoanFM commented 11 months ago

Ok, I think what is happening is that you may not have enough memory, and your OS has killed the Executor service.

Can you try:

Once this works, I believe the example will work too.

codetalker7 commented 11 months ago
  • rm -rf ~/.cache/huggingface/hub/models--stabilityai--stablelm-base-alpha-3

@JoanFM yes, I think memory was the issue. I tried gpt2 instead of StableLM and it works fine. Thanks a lot for the help!
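A silent kill like the one suspected here can be told apart from an ordinary crash by loading the model in a child process and inspecting how it exits: on POSIX systems, subprocess reports a process terminated by signal N with return code -N, and the Linux OOM killer sends SIGKILL. A minimal sketch under those assumptions; a SIGKILL is consistent with, but does not prove, an out-of-memory kill (the kernel log, e.g. dmesg, is more direct evidence), and diagnose is a hypothetical helper name:

```python
import signal
import subprocess
import sys

# The model-loading snippet suggested earlier in this thread.
LOAD_MODEL = (
    "from transformers import pipeline\n"
    "pipeline('text-generation', model='stabilityai/stablelm-base-alpha-3b')\n"
)


def diagnose(code: str) -> str:
    """Run `code` in a fresh Python process and describe how it exited (POSIX only)."""
    proc = subprocess.run([sys.executable, "-c", code])
    if proc.returncode == -signal.SIGKILL:
        # SIGKILL is what the Linux OOM killer sends, though other causes exist.
        return "killed by SIGKILL (possibly out of memory)"
    if proc.returncode < 0:
        return f"killed by signal {-proc.returncode}"
    if proc.returncode != 0:
        return f"exited with code {proc.returncode}"
    return "ok"
```

Usage: print(diagnose(LOAD_MODEL)). A normal Python exception in the child shows up as "exited with code 1" together with its traceback, whereas an OOM kill typically leaves no traceback at all, which matches the Executor disappearing without an error in the logs above.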