jina-ai / executor-faissindexer

A similarity search indexer based on Faiss. https://hub.jina.ai/executor/8gsd0tts

Failure when I try to use it as a substitute indexer in Jina-examples #11

Closed TITC closed 2 years ago

TITC commented 2 years ago

Here is an excerpt from this example:

It is then recommended to use more advanced indexers like the FaissIndexer.

But when I replace this part, following the instructions on Jina Hub

from jina import Flow

f = Flow().add(uses='jinahub+docker://FaissIndexer/v0.1')

from

  - name: indexer                                     # Now, index the text documents with the embeddings
    uses: 'jinahub://SimpleIndexer/old'                   # We use the SimpleIndexer for this purpose

to

  - name: indexer                                     # Now, index the text documents with the embeddings
    uses: 'jinahub+docker://FaissIndexer/v0.1'                   # We use the FaissIndexer for this purpose

There are no error messages, but it does not work. What's wrong with it? What's the proper way to use it?

numb3r3 commented 2 years ago

@TITC Thanks for your feedback. Could you test the latest one, jinahub+docker://FaissIndexer/v0.2?

numb3r3 commented 2 years ago

There are no error messages, but it does not work.

BTW, what do you mean by "does not work"? Does it fail to return any results?

TITC commented 2 years ago

There are no error messages, but it does not work.

BTW, what do you mean by "does not work"? Does it fail to return any results?

Thanks for your response. I have tried both the v0.1 and v0.2 versions.

  1. Queries return no results.
  2. The indexer does not generate any serialization files, such as xxx.bin or a folder named workspace.
  3. It works normally with 'jinahub://SimpleIndexer/old', which does generate a folder named workspace.

numb3r3 commented 2 years ago

Gotcha. jinahub+docker does not work right now. I will fix it ASAP. BTW, can you test whether jinahub://FaissIndexer/v0.2 works?

numb3r3 commented 2 years ago

@TITC

  1. Queries return no results.

Does the index procedure complete? BTW, if you are using macOS, please check the memory limit of each docker container. FaissIndexer consumes more memory than SimpleIndexer.
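One way to check the memory limit mentioned above is to read `HostConfig.Memory` from the output of `docker inspect`, where a value of 0 means that no limit is set. The following is a minimal sketch and not part of Jina; the helper name `memory_limit_bytes` and the sample JSON are hypothetical and only serve as an illustration.

```python
import json


def memory_limit_bytes(inspect_output: str) -> list:
    """Return (container name, memory limit in bytes) pairs.

    `inspect_output` is the JSON text printed by `docker inspect <container>`.
    Docker reports the limit under HostConfig.Memory; 0 means "no limit".
    """
    containers = json.loads(inspect_output)
    return [
        (c.get("Name", ""), c.get("HostConfig", {}).get("Memory", 0))
        for c in containers
    ]


# Hypothetical, trimmed-down sample of `docker inspect` output.
sample = '[{"Name": "/indexer", "HostConfig": {"Memory": 2147483648}}]'
for name, limit in memory_limit_bytes(sample):
    print(name, "unlimited" if limit == 0 else f"{limit / 2**30:.1f} GiB")
```

In practice the input would come from running `docker inspect` on the container ID shown by `docker ps`.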

  2. The indexer does not generate any serialization files, such as xxx.bin or a folder named workspace.

Because you are using jinahub+docker, the workspace folder is located inside the Docker container, not on the local host machine. You can use jinahub://FaissIndexer/v0.2 instead. Then you will find the serialization files in the workspace folder.
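The difference between the two `uses` schemes can be sketched in a few lines of plain Python. This is only an illustration: `to_local_uses` and `workspace_exists` are hypothetical helpers, not Jina APIs, and the folder name `workspace` is taken from the discussion above.

```python
from pathlib import Path

DOCKER_SCHEME = "jinahub+docker://"
LOCAL_SCHEME = "jinahub://"


def to_local_uses(uses: str) -> str:
    """Rewrite a jinahub+docker:// URI to the non-Docker jinahub:// form."""
    if uses.startswith(DOCKER_SCHEME):
        return LOCAL_SCHEME + uses[len(DOCKER_SCHEME):]
    return uses  # already local (or some other scheme): leave untouched


def workspace_exists(root: str = ".", name: str = "workspace") -> bool:
    """Check whether a workspace folder is present under `root` on the host."""
    return (Path(root) / name).is_dir()


print(to_local_uses("jinahub+docker://FaissIndexer/v0.2"))
print(workspace_exists())
```

Running `workspace_exists()` from the example directory after indexing is a quick way to confirm that the files landed on the host rather than inside a container.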

TITC commented 2 years ago

  1. I am not sure which message indicates that the procedure has completed. There are no error messages and everything seems normal.

    Does the index procedure complete?

  2. My OS is Ubuntu 18.04 with 73 GB of free memory on the host, and no memory limit is set for any Docker container.
  3. jinahub://FaissIndexer/v0.2 does generate a workspace folder, but it times out, even through a proxy. I will give it another try later.

TITC commented 2 years ago

Maybe I need to dig deeper into Jina, for example by reading the source code. So far it doesn't work even with jinahub://FaissIndexer/v0.2. Could you give any advice about the error below?

(jina-demo) user@gpu3090:~/user/draft/Lab/Retrieval/jina/demo/examples/wikipedia-sentences-v2$ python app.py -t index
Pulling from jinahub/u9pqs8eb
Digest: sha256:f16f15b8ee60915bf72a578571769482c499ae416572831d15cb1e2d6ef3d0a8
Status: Image is up to date for jinahub/u9pqs8eb:v34-2.0.18

   FaissIndexer@4189[I]:Using "lmdb" as the storage backend
^CProcess Process-2:
    transformer@3689[W]:Pea is being closed before being ready. Most likely some other Pea in the Flow or Pod failed to start
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/peapods/peas/__init__.py", line 74, in run
    runtime = runtime_cls(
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/peapods/runtimes/container/__init__.py", line 42, in __init__
    while self._is_container_alive and not self.is_ready:
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/peapods/runtimes/container/__init__.py", line 271, in is_ready
    status = self.status
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/peapods/runtimes/container/__init__.py", line 260, in status
    return send_ctrl_message(
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/peapods/zmq/__init__.py", line 624, in send_ctrl_message
    r = recv_message(sock, timeout)
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/peapods/zmq/__init__.py", line 735, in recv_message
    msg_data = sock.recv_multipart()
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/zmq/sugar/socket.py", line 625, in recv_multipart
    parts = [self.recv(flags, copy=copy, track=track)]
  File "zmq/backend/cython/socket.pyx", line 781, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 817, in zmq.backend.cython.socket.Socket.recv
  File "zmq/backend/cython/socket.pyx", line 186, in zmq.backend.cython.socket._recv_copy
  File "zmq/backend/cython/checkrc.pxd", line 13, in zmq.backend.cython.checkrc._check_rc
KeyboardInterrupt

Aborted!
(jina-demo) user@gpu3090:~/user/draft/Lab/Retrieval/jina/demo/examples/wikipedia-sentences-v2$ docker ps -a |grep jina
3a8e66690aca   jinahub/u9pqs8eb:v34-2.0.18                     "jina executor --use…"   6 minutes ago   Up 6 minutes                                                                                                                          transformerContainerRuntime
(jina-demo) user@gpu3090:~/user/draft/Lab/Retrieval/jina/demo/examples/wikipedia-sentences-v2$ docker stop3a8e66690aca
docker: 'stop3a8e66690aca' is not a docker command.
See 'docker --help'
(jina-demo) user@gpu3090:~/user/draft/Lab/Retrieval/jina/demo/examples/wikipedia-sentences-v2$ docker stop 3a8e66690aca
3a8e66690aca
(jina-demo) user@gpu3090:~/user/draft/Lab/Retrieval/jina/demo/examples/wikipedia-sentences-v2$ tmux a -t jina
no sessions
(jina-demo) user@gpu3090:~/user/draft/Lab/Retrieval/jina/demo/examples/wikipedia-sentences-v2$ tmux new -s jina
[detached (from session jina)]
(jina-demo) user@gpu3090:~/user/draft/Lab/Retrieval/jina/demo/examples/wikipedia-sentences-v2$ tmux a -t jina
[detached (from session jina)]
(jina-demo) user@gpu3090:~/user/draft/Lab/Retrieval/jina/demo/examples/wikipedia-sentences-v2$ tmux a -t jina
[detached (from session jina)]
(jina-demo) user@gpu3090:~/user/draft/Lab/Retrieval/jina/demo/examples/wikipedia-sentences-v2$ tmux a -t jina

    transformer@18139[I]:f = func(self, *args, **kwargs)
    transformer@18139[I]:File "/workspace/transform_encoder.py", line 65, in __init__
    transformer@18139[I]:self.model = AutoModel.from_pretrained(
    transformer@18139[I]:File "/usr/local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 359, in from_pretrained
    transformer@18139[I]:return cls._model_mapping[type(config)].from_pretrained(
    transformer@18139[I]:File "/usr/local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1047, in from_pretrained
    transformer@18139[I]:raise EnvironmentError(msg)
    transformer@18139[I]:OSError: Can't load weights for 'sentence-transformers/distilbert-base-nli-stsb-mean-tokens'. Make sure that:
    transformer@18139[I]:
    transformer@18139[I]:- 'sentence-transformers/distilbert-base-nli-stsb-mean-tokens' is a correct model identifier listed on 'https://huggingface.co/models'
    transformer@18139[I]:
    transformer@18139[I]:- or 'sentence-transformers/distilbert-base-nli-stsb-mean-tokens' is the correct path to a directory containing a file named one of pytorch_model.bin, tf_model.h5, model.ckpt.
    transformer@18139[I]:
    transformer@18139[I]:
    transformer@18139[I]:
    transformer@18139[I]:The above exception was the direct cause of the following exception:
    transformer@18139[I]:
    transformer@18139[I]:Traceback (most recent call last):
    transformer@18139[I]:File "/usr/local/bin/jina", line 8, in <module>
    transformer@18139[I]:sys.exit(main())
    transformer@18139[I]:File "/usr/local/lib/python3.8/site-packages/cli/__init__.py", line 119, in main
    transformer@18139[I]:getattr(api, args.cli.replace('-', '_'))(args)
    transformer@18139[I]:File "/usr/local/lib/python3.8/site-packages/cli/api.py", line 43, in zed_runtime
    transformer@18139[I]:with ZEDRuntime(args) as runtime:
    transformer@18139[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 52, in __init__
    transformer@18139[I]:self._data_request_handler = DataRequestHandler(self.args, self.logger)
    transformer@18139[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/request_handlers/data_request_handler.py", line 68, in __init__
    transformer@18139[I]:self._load_executor()
    transformer@18139[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/request_handlers/data_request_handler.py", line 98, in _load_executor
    transformer@18139[I]:raise ExecutorFailToLoad from ex
    transformer@18139[I]:jina.excepts.ExecutorFailToLoad
    transformer@18139[E]:Exception('the container fails to start, check the arguments or entrypoint') during <class 'jina.peapods.runtimes.container.ContainerRuntime'> initialization
 add "--quiet-error" to suppress the exception details
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/peapods/peas/__init__.py", line 74, in run
    runtime = runtime_cls(
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/peapods/runtimes/container/__init__.py", line 48, in __init__
    raise Exception(
Exception: the container fails to start, check the arguments or entrypoint
           Flow@17738[E]:transformer:<jina.peapods.pods.Pod object at 0x7f160f680940> can not be started due to TimeoutError('jina.peapods.peas.BasePea:transformer can not be initialized after 600000.0ms'), Flow is aborted
Traceback (most recent call last):
  File "app.py", line 93, in <module>
    main()
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "app.py", line 87, in main
    index(num_docs)
  File "app.py", line 49, in index
    with flow:
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/flow/base.py", line 930, in __enter__
    return self.start()
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/flow/base.py", line 975, in start
    v.wait_start_success()
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/peapods/pods/__init__.py", line 517, in wait_start_success
    p.wait_start_success()
  File "/home/user/anaconda3/envs/jina-demo/lib/python3.8/site-packages/jina/peapods/peas/__init__.py", line 278, in wait_start_success
    raise TimeoutError(
TimeoutError: jina.peapods.peas.BasePea:transformer can not be initialized after 600000.0ms

@numb3r3

Hippopotamus0308 commented 2 years ago

@TITC Have you installed all the dependencies for FaissIndexer? You can install the requirements via .add(uses='jinahub://FaissIndexer/v0.2', install_requirements=True), or you can try the latest version, jinahub://FaissIndexer/latest. Also, the Jina version in the examples (2.0.18) is a little bit outdated, so you can update Jina to a newer version. Try using Jina 2.1.7 instead.
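Before upgrading, it can help to confirm which Jina version is actually installed. Below is a minimal sketch using only the standard library; `parse_version` and `is_at_least` are hypothetical helpers, and the parsing assumes plain numeric versions such as 2.0.18, with no pre-release suffixes.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '2.0.18' into (2, 0, 18); assumes purely numeric parts."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(version: str, minimum: str) -> bool:
    """Compare two dotted version strings numerically, not as text."""
    return parse_version(version) >= parse_version(minimum)


try:
    installed = metadata.version("jina")
except metadata.PackageNotFoundError:
    installed = None  # Jina is not installed in this environment

if installed is None:
    print("jina is not installed")
else:
    try:
        ok = is_at_least(installed, "2.1.7")
        print(f"jina {installed}: {'OK' if ok else 'older than 2.1.7'}")
    except ValueError:
        print(f"jina {installed}: version string is not purely numeric")
```

Comparing tuples of integers avoids the pitfall of string comparison, where "2.10.0" would sort before "2.9.9".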

TITC commented 2 years ago

Have you installed all the dependencies for FaissIndexer?

All of the dependencies listed below are installed; I just verified this again. [image]

You can install the requirements via .add(uses='jinahub://FaissIndexer/v0.2', install_requirements=True)

Here is what I did before, reproduced one more time. First, comment out the indexer in the yml file:

jtype: Flow                                           # This file defines the flow (both index and query) for the wikipedia sentences example
version: '1'                                          # This is the yml file version
with:                                                 # Additional arguments for the flow
  workspace: $JINA_WORKSPACE                          # Workspace folder path
  port_expose: $JINA_PORT                             # Network Port for the flow
executors:                                            # Now, define the executors that are run on this flow
  - name: transformer                                 # This executor computes an embedding based on the input text documents
    uses: 'jinahub+docker://TransformerTorchEncoder/v0.1'  # We use a Transformer Torch Encoder from the hub as a docker container
  # - name: indexer                                     # Now, index the text documents with the embeddings
  #   uses: 'jinahub://FaissIndexer/v0.2'                   # We use the FaissIndexer for this purpose

Then add the executor to the flow:

def index(num_docs):
    flow = Flow().load_config('flows/flow.yml').add(uses='jinahub://FaissIndexer/v0.2', install_requirements=True)
    data_path = os.path.join(os.path.dirname(
        __file__), os.environ.get('JINA_DATA_FILE', None))
    with flow:
        flow.post(on='/index', inputs=input_generator(num_docs, data_path),
                  show_progress=True)

def query(top_k):
    flow = Flow().load_config('flows/flow.yml').add(uses='jinahub://FaissIndexer/v0.2', install_requirements=True)

It still does not work:

⏳   |█                   | ⏱️ 0.0s 🐎 0.0 RPS           pod1@58632[E]:AttributeError("'DocumentArray' object has no attribute 'embeddings'")
 add "--quiet-error" to suppress the exception details

the Jina version in the examples (2.0.18) is a little bit outdated, so you can update Jina to a newer version. Try using Jina 2.1.7

Then I ran pip install -U jina, and now it works. Thanks for your help X) @Hippopotamus0308