jina-ai / serve

☁️ Build multimodal AI applications with cloud-native stack
Apache License 2.0

Gateway error: received an empty stream from the client #4593

Closed Jackal1586 closed 2 years ago

Jackal1586 commented 2 years ago


I encountered an error in the flow.post method while running python app.py -t index in multires-lyrics-search, after cloning the examples repository. The error message was: gateway@61291[E]:receive an empty stream from the client! please check your client's inputs, you can use "Client.check_input(inputs)".


JoanFM commented 2 years ago

My guess is that the paths are not set correctly, so nothing is being sent through the client.

Jackal1586 commented 2 years ago

Can you please specify which paths you meant here?

hanxiao commented 2 years ago

jina is probably too old?

Jackal1586 commented 2 years ago

I tried this one:


Error: ModuleNotFoundError: Dependencies listed in requirements.txt are not all installed locally, this Executor may not run as expect. To install dependencies, add --install-requirements or set install_requirements = True

My modified requirements.txt

click==8.0.1
jina[standard]~=2.0
kaggle==1.5.12
docker
git+https://github.com/jina-ai/jina-commons@v0.0.6

Couldn't try jina~=3.0 as it didn't satisfy jina-commons's requirement.

Am I missing something here?

JoanFM commented 2 years ago

Hey @Jackal1586 ,

As you can see in the requirements, multires-lyrics-search requires jina[standard]==2.0.18. Have you tried with this version?

JoanFM commented 2 years ago

But most importantly, have you run the preliminary steps to make sure that the dataset is downloaded correctly and placed in the expected path?

Jackal1586 commented 2 years ago

Yes, I followed the instructions of multires-lyrics-search and failed at the indexing part. I didn't receive any error from get_data.sh, so I assumed everything was OK with the dataset. Initially I tried with jina[standard]==2.0.18. Then, when @hanxiao mentioned that jina might be too old, I tried other versions to check whether they work. They still failed at the same step.

JoanFM commented 2 years ago

Hey @Jackal1586 , could you add a print statement at this line https://github.com/jina-ai/examples/blob/d8f903278597254bd96b1ed64fe8c9feefaa265c/multires-lyrics-search/helper.py#L11 and check which path it is searching for? And validate that there is indeed data in that folder?

Jackal1586 commented 2 years ago

I added several debug statements; my input_generator function now looks as follows:

def input_generator(num_docs: int):
    lyrics_file = os.environ.setdefault(
        "JINA_DATA_FILE", "lyrics-data/lyrics-toy-data1000.csv"
    )
    print("\n"*20, "debug", "\n"*5)
    print("lyrics_file: ", lyrics_file)
    print("os.path.exists: ", os.path.exists(lyrics_file))
    print("from os.system(f\"tail -5 \{lyrics_file\}\")")
    print(os.system(f"tail -5 {lyrics_file}"))
    print("\n"*5, "debug end", "\n"*20)
    with open(lyrics_file, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        for row in it.islice(reader, num_docs):
            if row[-1] == "ENGLISH":
                d = Document(text=row[3])
                d.tags["ALink"] = row[0]
                d.tags["SName"] = row[1]
                d.tags["SLink"] = row[2]
                yield d

And the corresponding output:


lyrics_file:  lyrics-data/lyrics-data.csv
os.path.exists:  True
from os.system(f"tail -5 \{lyrics_file\}")
Yash' imizi yobada
(The homes of my fathers are burning)


 debug end 

Please let me know if I need to provide any other information.

JoanFM commented 2 years ago

Can you make sure that at least one document is being generated? Does this generator even yield once?

Jackal1586 commented 2 years ago

I put a counter there, it yielded nothing. The counter value is 0.
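When a generator like this yields nothing, one way to see why is to tally the values in the last CSV column before any filtering is applied. The sketch below is illustrative only: `count_language_values` is a hypothetical helper, and the sample rows (including the header names of the last two columns) are made up to mimic the layout implied by input_generator, where the language sits in the last column.

```python
import csv
import io
from collections import Counter


def count_language_values(csv_file, limit=None):
    """Tally the values found in the last column of each CSV row."""
    counts = Counter()
    for i, row in enumerate(csv.reader(csv_file)):
        if limit is not None and i >= limit:
            break
        if row:  # skip completely empty lines
            counts[row[-1]] += 1
    return counts


# Made-up sample data; the real file in this thread is lyrics-data/lyrics-data.csv.
sample = io.StringIO(
    "ALink,SName,SLink,Lyric,Idiom\n"
    "/artist-a/,Song A,/artist-a/song-a.html,Hello world,en\n"
    "/artist-b/,Song B,/artist-b/song-b.html,Hola mundo,es\n"
    "/artist-c/,Song C,/artist-c/song-c.html,Good night,en\n"
)
counts = count_language_values(sample)
print(counts.most_common())
```

If the value the filter compares against (here "ENGLISH") never shows up in the tally, the filter is the reason nothing is yielded.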

Jackal1586 commented 2 years ago

I found that row[-1] uses shorthand codes such as en, es, etc. I replaced en with ENGLISH. Now indexing worked, but I received these errors:

     segmenter@5765[I]:      segmenter@ 1[E]:KeyError('@r')
   root_indexer@5960[I]:   root_indexer@ 1[E]:IndexError("do not support this index type builtins.list: ['r']")

Should I do something more to fix this one?
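For the language-code mismatch described in this comment, the comparison can also be made tolerant of both spellings instead of editing the data. A minimal sketch, assuming only that English is marked either as ENGLISH (what the example expected) or as en (what was found in the file); `is_english` is a hypothetical helper and not part of the examples repository.

```python
# Markers treated as "English"; both spellings come from this thread.
ENGLISH_MARKERS = {"english", "en"}


def is_english(language_field: str) -> bool:
    """Return True for either spelling, ignoring case and surrounding spaces."""
    return language_field.strip().lower() in ENGLISH_MARKERS


# Made-up rows in the column order used by input_generator.
rows = [
    ["/artist-a/", "Song A", "/artist-a/song-a.html", "Hello world", "en"],
    ["/artist-b/", "Song B", "/artist-b/song-b.html", "Hola mundo", "es"],
    ["/artist-c/", "Song C", "/artist-c/song-c.html", "Good night", "ENGLISH"],
]
english_rows = [row for row in rows if is_english(row[-1])]
print(len(english_rows), "of", len(rows), "rows kept")
```

In input_generator, the check `row[-1] == "ENGLISH"` could then be replaced by `is_english(row[-1])`.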

Jackal1586 commented 2 years ago

Query is failing with the following error:

      segmenter@6491[I]:      segmenter@ 1[E]:KeyError('@r')
      segmenter@6491[I]:add "--quiet-error" to suppress the exception details
      segmenter@6491[I]:Traceback (most recent call last):
      segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 248, in _msg_callback
      segmenter@6491[I]:processed_msg = self._callback(msg)
      segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 234, in _callback
      segmenter@6491[I]:msg = self._post_hook(self._handle(self._pre_hook(msg)))
      segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 201, in _handle
      segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/request_handlers/data_request_handler.py", line 161, in handle
      segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/executors/__init__.py", line 190, in __call__
      segmenter@6491[I]:self, **kwargs
      segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/executors/decorators.py", line 103, in arg_wrapper
      segmenter@6491[I]:return fn(*args, **kwargs)
      segmenter@6491[I]:File "/workspace/sentencizer.py", line 97, in segment
      segmenter@6491[I]:flat_docs = docs[traversal_path]
      segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/types/arrays/document.py", line 213, in __getitem__
      segmenter@6491[I]:return self[self._id_to_index[item]]
      segmenter@6491[I]:KeyError: '@r'
        indexer@6640[W]:no documents are indexed. searching empty docs. returning.
         ranker@6676[I]:         ranker@ 1[E]:KeyError('@r')
         ranker@6676[I]:add "--quiet-error" to suppress the exception details
         ranker@6676[I]:Traceback (most recent call last):
         ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 248, in _msg_callback
         ranker@6676[I]:processed_msg = self._callback(msg)
         ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 234, in _callback
         ranker@6676[I]:msg = self._post_hook(self._handle(self._pre_hook(msg)))
         ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 196, in _handle
         ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/request_handlers/data_request_handler.py", line 140, in handle
         ranker@6676[I]:r_docs = self._executor(
         ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/executors/__init__.py", line 185, in __call__
         ranker@6676[I]:return self.requests[req_endpoint](
         ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/executors/decorators.py", line 103, in arg_wrapper
         ranker@6676[I]:return fn(*args, **kwargs)
         ranker@6676[I]:File "/workspace/simpleranker.py", line 51, in rank
         ranker@6676[I]:for doc in docs[traversal_paths]:
         ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/types/arrays/document.py", line 213, in __getitem__
         ranker@6676[I]:return self[self._id_to_index[item]]
         ranker@6676[I]:KeyError: '@r'
   root_indexer@6775[I]:   root_indexer@ 1[E]:IndexError("do not support this index type builtins.list: ['m']")
   root_indexer@6775[I]:add "--quiet-error" to suppress the exception details
   root_indexer@6775[I]:Traceback (most recent call last):
   root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 248, in _msg_callback
   root_indexer@6775[I]:processed_msg = self._callback(msg)
   root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 234, in _callback
   root_indexer@6775[I]:msg = self._post_hook(self._handle(self._pre_hook(msg)))
   root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 196, in _handle
   root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/request_handlers/data_request_handler.py", line 140, in handle
   root_indexer@6775[I]:r_docs = self._executor(
   root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/executors/__init__.py", line 185, in __call__
   root_indexer@6775[I]:return self.requests[req_endpoint](
   root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/executors/decorators.py", line 103, in arg_wrapper
   root_indexer@6775[I]:return fn(*args, **kwargs)
   root_indexer@6775[I]:File "/workspace/executor/lmdb_storage.py", line 168, in search
   root_indexer@6775[I]:docs_to_get = docs[traversal_paths]
   root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/types/arrays/document.py", line 217, in __getitem__
   root_indexer@6775[I]:raise IndexError(f'do not support this index type {typename(item)}: {item}')
   root_indexer@6775[I]:IndexError: do not support this index type builtins.list: ['m']

I didn't change anything except for the ones mentioned here.

JoanFM commented 2 years ago

This is now related to the version problem.

JoanFM commented 2 years ago

You are working with a version of Jina that does not match the one expected by the Executor

Jackal1586 commented 2 years ago

I see. Thank you for the guidance. Which version should I go for now? Any version above 2.6.4 gave me an error saying that jina-commons's dependency could not be resolved.

JoanFM commented 2 years ago

I believe what you need is to pin every Hub Executor to an older version that is not on 3.0.

Jackal1586 commented 2 years ago

OK, let me try this. I will post an update on this issue after making this change.

Jackal1586 commented 2 years ago

After a lot of trial and error, I did the following.

I tried to change the Hub Executor versions in flows/index.yml according to these instructions. I cloned Sentencizer, TransformerTorchEncoder, and SimpleIndexer, and reverted each to the commit before the "migrating to 3.0" related commits.

The flows/index.yml now looks as follows:

jtype: Flow # We define the flow used for indexing here
version: '1' # yml version
with: # Parameters for the flow
    workspace: $JINA_WORKSPACE # Workspace folder
executors: # Now, define all the executors that are used
    - name: segmenter # The first executor splits the input text into sentences which are stored as chunks in the original documents
      uses: MySentencizer # The type of the executor is Sentencizer, we download it from the hub as a docker container
      py_modules: /home/zarzis/code/python/examples/multires-lyrics-search/deps/executors/jinahub/segmenters/Sentencizer/sentencizer.py
    - name: encoder # Then, compute the embeddings of the sentences in this executor
      uses: TransformerTorchEncoder #'/data/sbmaruf/zarzis/examples/multires-lyrics-search/executor-text-transformers-torch-encoder'          # We use a TransformerTorchEncoder from the hub
      py_modules: /home/zarzis/code/python/examples/multires-lyrics-search/deps/executor-text-transformers-torch-encoder/transform_encoder.py
      volumes: '~/.cache/huggingface:/root/.cache/huggingface' # Mount the huggingface cache into the docker container
      uses_with: # Override some parameters for the executor
          pooling_strategy: 'cls' # This is the pooling strategy that is used by the encoder
          pretrained_model_name_or_path: distilbert-base-cased # The ML model that is used
          max_length: 96 # Max length argument for the tokenizer
          device: 'cpu' # Run the executor on CPU - For GPU, we would have to use another container!
          default_traversal_paths: ['c'] # Compute the embeddings on the chunk level - the sentences created before
    - name: indexer # Now, index the sentences and store them to disk.
      uses: SimpleIndexer # We use a simple indexer for that purpose (not in docker, but using source codes - there are some bugs with docker for this executor)
      py_modules: /home/zarzis/code/python/examples/multires-lyrics-search/deps/executor-simpleindexer/executor.py
      uses_metas: # Set some meta arguments for this executor
          workspace: $JINA_WORKSPACE # Define the workspace folder for the executor
      uses_with: # Override parameters for the executor
          default_traversal_paths: ['c'] # Store the sentences on disk - this means on chunk level
    - name: root_indexer # Additionally to the sentences, we also need to store the original songs which are not split into sentences
      uses: 'jinahub+docker://LMDBStorage' # Therefore, we use a LMDBStorage indexer
      volumes: $JINA_WORKSPACE_MOUNT # Again, mount the workspace
      uses_with: # Override some parameters for the LMDBStorage
          default_traversal_paths: ['r'] # Now, we store the root documents, not the sentence chunks
      needs: [gateway] # We can start this at the beginning - in parallel to the sentence flow
    - name: wait_both # Now, we wait for both the root indexing and the sentence path to finish
      needs: [indexer, root_indexer] # Continue once these two executor are finished

I then had to install transformers and torch, and also jina==2.6.4 to work around the error: ImportError: 'DocumentArrayMemmap' not found in 'jina'. After that, indexing started without any errors.

During indexing, the following appeared:

⠋ Working... ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 estimating... Task exception was never retrieved
future: <Task finished name='Task-9' coro=<BaseStreamer._stream_requests.<locals>.iterate_requests() done, defined at /home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/peapods/stream/base.py:132> exception=BadRequestType('fail to construct a <class \'jina.types.routing.table.RoutingTable\'> object from {\n  "active_pod": "start-gateway",\n  "pods": {\n    "encoder": {\n      "expected_parts": 1,\n      "host": "",\n      "out_edges": [\n        {\n          "pod": "indexer"\n        }\n      ],\n      "port": 48585,\n      "port_out": 35639,\n      "target_identity": "15ee9637e0294f819c46bf4a32e34108"\n    },\n    "end-gateway": {\n      "expected_parts": 1,\n      "host": "",\n      "port": 35313,\n      "port_out": 47053,\n      "target_identity": "384e6622f9c94f89ba4e33aee1c6eda0"\n    },\n    "indexer": {\n      "expected_parts": 1,\n      "host": "",\n      "out_edges": [\n        {\n          "pod": "wait_both"\n        }\n      ],\n      "port": 54523,\n      "port_out": 44047,\n      "target_identity": "790bbe84f4bd416398e98869ba2d4dbe"\n    },\n    "root_indexer": {\n      "expected_parts": 1,\n      "host": "",\n      "out_edges": [\n        {\n          "pod": "wait_both"\n        }\n      ],\n      "port": 42339,\n      "port_out": 35519,\n      "target_identity": "2429dd24ed504e7d9dd6e07e4c4289bb"\n    },\n    "segmenter": {\n      "expected_parts": 1,\n      "host": "",\n      "out_edges": [\n        {\n          "pod": "encoder"\n        }\n      ],\n      "port": 47817,\n      "port_out": 33495,\n      "target_identity": "7dc61fa137dd487eac722986a4027235"\n    },\n    "start-gateway": {\n      "host": "",\n      "out_edges": [\n        {\n          "pod": "segmenter"\n        },\n        {\n          "pod": "root_indexer"\n        }\n      ],\n      "port": 35313,\n      "port_out": 47053,\n      "target_identity": "384e6622f9c94f89ba4e33aee1c6eda0"\n    },\n    "wait_both": {\n      "expected_parts": 2,\n      "host": "",\n      
"out_edges": [\n        {\n          "pod": "end-gateway"\n        }\n      ],\n      "port": 48669,\n      "port_out": 59563,\n      "target_identity": "750ea7525fff4c19a7d982e70969473c"\n    }\n  }\n}')>
Traceback (most recent call last):
  File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/types/routing/table.py", line 134, in __init__
    json_format.Parse(graph, self._pb_body)
  File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/google/protobuf/json_format.py", line 436, in Parse
    return ParseDict(js, message, ignore_unknown_fields, descriptor_pool,
  File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/google/protobuf/json_format.py", line 461, in ParseDict
    parser.ConvertMessage(js_dict, message, '')
TypeError: ConvertMessage() takes 3 positional arguments but 4 were given

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/peapods/stream/base.py", line 142, in iterate_requests
    future: 'asyncio.Future' = self._handle_request(request=request)
  File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/peapods/stream/base.py", line 75, in _handle_request
  File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/peapods/stream/gateway.py", line 24, in _convert_to_message
    return Message(None, request, 'gateway', **vars(self.args))
  File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/types/message/__init__.py", line 68, in __init__
    self.envelope = self._add_envelope(*args, **kwargs)
  File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/types/message/__init__.py", line 235, in _add_envelope
  File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/types/routing/table.py", line 144, in __init__
    raise BadRequestType(
jina.excepts.BadRequestType: fail to construct a <class 'jina.types.routing.table.RoutingTable'> object from {
  "active_pod": "start-gateway",
  "pods": {
    "encoder": {
      "expected_parts": 1,
      "host": "",
      "out_edges": [
        {
          "pod": "indexer"
        }
      ],
      "port": 48585,
      "port_out": 35639,
      "target_identity": "15ee9637e0294f819c46bf4a32e34108"
    },
    "end-gateway": {
      "expected_parts": 1,
      "host": "",
      "port": 35313,
      "port_out": 47053,
      "target_identity": "384e6622f9c94f89ba4e33aee1c6eda0"
    },
    "indexer": {
      "expected_parts": 1,
      "host": "",
      "out_edges": [
        {
          "pod": "wait_both"
        }
      ],
      "port": 54523,
      "port_out": 44047,
      "target_identity": "790bbe84f4bd416398e98869ba2d4dbe"
    },
    "root_indexer": {
      "expected_parts": 1,
      "host": "",
      "out_edges": [
        {
          "pod": "wait_both"
        }
      ],
      "port": 42339,
      "port_out": 35519,
      "target_identity": "2429dd24ed504e7d9dd6e07e4c4289bb"
    },
    "segmenter": {
      "expected_parts": 1,
      "host": "",
      "out_edges": [
        {
          "pod": "encoder"
        }
      ],
      "port": 47817,
      "port_out": 33495,
      "target_identity": "7dc61fa137dd487eac722986a4027235"
    },
    "start-gateway": {
      "host": "",
      "out_edges": [
        {
          "pod": "segmenter"
        },
        {
          "pod": "root_indexer"
        }
      ],
      "port": 35313,
      "port_out": 47053,
      "target_identity": "384e6622f9c94f89ba4e33aee1c6eda0"
    },
    "wait_both": {
      "expected_parts": 2,
      "host": "",
      "out_edges": [
        {
          "pod": "end-gateway"
        }
      ],
      "port": 48669,
      "port_out": 59563,
      "target_identity": "750ea7525fff4c19a7d982e70969473c"
    }
  }
}

It was stuck here for 2 hours. I am at a loss as to how to fix it. Can you please advise on my next course of action, @JoanFM?

JoanFM commented 2 years ago

Hey @Jackal1586 ,

Which version of protobuf are you using? Can you try installing protobuf==3.13.0?

Jackal1586 commented 2 years ago

The protobuf version was 3.20.1. After changing it to protobuf==3.13.0, it worked and indexing completed. Then I tried to run the simple frontend provided in the static directory, and also ran python app.py -t query. But I am unable to enable CORS anywhere: I tried in flows/query.yml, in app.py, and in the Flow library too. How should I enable CORS in this case? Note that same-origin requests do get responses.

For reference, I am providing my current environment here.
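The two pins that ended up working in this thread (jina==2.6.4 and protobuf==3.13.0) can be checked against the active environment with a few lines of standard-library code. This is only a sketch: `find_mismatches` is a hypothetical helper, the `EXPECTED` mapping simply restates the versions mentioned above, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib import metadata

# Versions reported as working in this thread; adjust for other setups.
EXPECTED = {"jina": "2.6.4", "protobuf": "3.13.0"}


def find_mismatches(expected, lookup=metadata.version):
    """Return {package: (wanted, found)} for every package that differs.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


for name, (wanted, found) in find_mismatches(EXPECTED).items():
    print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

The script prints nothing when every installed version matches its pin. The `lookup` parameter exists only so the check can be exercised without touching the real environment.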

JoanFM commented 2 years ago

Have you checked this part of the documentation?


Jackal1586 commented 2 years ago

I only read the parts related to CORS and experimented accordingly, for example passing it through Flow's constructor and setting cors: True in the yml. It didn't work for me.

JoanFM commented 2 years ago

Ah well, this is because CORS was only properly enabled from 3.x onward, not in 2.6.4.
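To make concrete what the browser is missing in the cross-origin case, here is a standard-library sketch of an HTTP endpoint that sends CORS headers. It is independent of Jina and does not show how the Jina gateway implements CORS; `CorsHandler`, `start_server`, the `/search` path and the wildcard origin are all assumptions chosen for illustration.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

ALLOWED_ORIGIN = "*"  # assumption: allow any origin, fine for local experiments only


class CorsHandler(BaseHTTPRequestHandler):
    def _send_cors_headers(self):
        # Headers a browser looks for before exposing a cross-origin response.
        self.send_header("Access-Control-Allow-Origin", ALLOWED_ORIGIN)
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):
        # Preflight request sent by the browser before a cross-origin POST.
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        payload = json.dumps({"received_bytes": len(body)}).encode("utf-8")
        self.send_response(200)
        self._send_cors_headers()
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the demo server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), CorsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Send one preflight request and show the header the browser checks.
server = start_server()
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
conn.request("OPTIONS", "/search", headers={"Origin": "http://localhost:8000"})
response = conn.getresponse()
print(response.status, response.getheader("Access-Control-Allow-Origin"))
conn.close()
server.shutdown()
server.server_close()
```

Sending the same preflight request to the real gateway and checking whether Access-Control-Allow-Origin comes back is a quick way to confirm whether CORS is active.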

Jackal1586 commented 2 years ago

Then what should be my next course of action if I want to run the full cycle as shown in multires-lyrics-search? I mean I need to enable cors at some point, right?

JoanFM commented 2 years ago

You should migrate your Flow to 3.3.x, using Executor versions compatible with 3.x, and adapt the client code as well.

Jackal1586 commented 2 years ago

OK, let me study the migration documentation and experiment. I will post my results.

xiaoxiongfeng commented 2 years ago

Hey @Jackal1586 ,

Which version of protobuf are you using? Can you try installing protobuf==3.13.0?

I solved the same problem this way, thanks.