Closed: Jackal1586 closed this issue 2 years ago
My guess is that the paths are not set correctly, so nothing is being sent through the client.
Can you please specify which paths you mean here?
jina is probably too old?
I tried this one:
Environment:
Error: ModuleNotFoundError: Dependencies listed in requirements.txt are not all installed locally, this Executor may not run as expect. To install dependencies, add --install-requirements or set install_requirements = True
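For reference, a hedged sketch (not from the example) of the install_requirements option the message refers to; the exact keyword placement may vary across Jina versions:

from jina import Flow

# Hedged sketch: ask Jina to install a Hub Executor's listed requirements automatically
# when it is added to a Flow (support and placement may differ between Jina versions).
f = Flow().add(uses='jinahub://TransformerTorchEncoder', install_requirements=True)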
My modified requirements.txt:
click==8.0.1
jina[standard]~=2.0
kaggle==1.5.12
docker
git+https://github.com/jina-ai/jina-commons@v0.0.6
Couldn't try jina~=3.0 as it didn't satisfy jina-commons's requirement. Am I missing something here?
Hey @Jackal1586,
As you can see in the requirements, multires-lyrics-search requires jina[standard]==2.0.18. Have you tried with this version?
But most importantly, have you run the preliminary steps to make sure that the dataset
is downloaded correctly and placed in the expected path?
Yes, I've followed the instructions for multires-lyrics-search and failed at the indexing part. I didn't receive any error from get_data.sh, so I assumed everything was OK with the dataset. Initially I tried with jina[standard]==2.0.18. Then, when @hanxiao mentioned that jina might be too old, I tried other versions to check whether it works, but it still failed.
Hey @Jackal1586, could you try putting a print statement at this line https://github.com/jina-ai/examples/blob/d8f903278597254bd96b1ed64fe8c9feefaa265c/multires-lyrics-search/helper.py#L11 and check which path it is searching for? And validate that there is indeed data in that folder?
I wrote several debug statements; my input_generator function now looks as follows:
import csv
import itertools as it
import os

from jina import Document


def input_generator(num_docs: int):
    # Resolve the data file from the environment, falling back to the toy dataset
    lyrics_file = os.environ.setdefault(
        "JINA_DATA_FILE", "lyrics-data/lyrics-toy-data1000.csv"
    )
    print("\n" * 20, "debug", "\n" * 5)
    print("lyrics_file: ", lyrics_file)
    print("os.path.exists: ", os.path.exists(lyrics_file))
    print("from os.system(f\"tail -5 \{lyrics_file\}\")")
    print(os.system(f"tail -5 {lyrics_file}"))
    print("\n" * 5, "debug end", "\n" * 20)
    with open(lyrics_file, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        for row in it.islice(reader, num_docs):
            if row[-1] == "ENGLISH":
                d = Document(text=row[3])
                d.tags["ALink"] = row[0]
                d.tags["SName"] = row[1]
                d.tags["SLink"] = row[2]
                yield d
And the corresponding output:
debug
lyrics_file: lyrics-data/lyrics-data.csv
os.path.exists: True
from os.system(f"tail -5 \{lyrics_file\}")
Yash' imizi yobada
(The homes of my fathers are burning)
Chorus
",en
0
debug end
Please let me know if I need to provide any other information.
Can you make sure that there is at least one document being generated? Does this generator even yield once?
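For instance, a quick check (a sketch, assuming the generator shown above is importable from the example's helper.py) could be:

from helper import input_generator  # assumption: the generator shown above lives in helper.py

# Count how many Documents the generator actually yields
count = sum(1 for _ in input_generator(num_docs=1000))
print('documents generated:', count)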
I put a counter there; it yielded nothing. The counter value is 0.
I found that row[-1] uses shorthands such as en, es, etc. I replaced en with ENGLISH so the check matches, and the indexing now worked.
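One way to make the generator tolerant of both spellings is a small helper like this (a sketch; is_english is a hypothetical name, not part of the example):

# Accept both the ISO-style code stored in the CSV ("en") and the full word, case-insensitively
def is_english(row) -> bool:
    return str(row[-1]).strip().lower() in ('en', 'english')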
But I received these errors:
segmenter@5765[I]: segmenter@ 1[E]:KeyError('@r')
root_indexer@5960[I]: root_indexer@ 1[E]:IndexError("do not support this index type builtins.list: ['r']")
Should I do something more to fix this one?
Query is failing with the following error:
segmenter@6491[I]: segmenter@ 1[E]:KeyError('@r')
segmenter@6491[I]:add "--quiet-error" to suppress the exception details
segmenter@6491[I]:Traceback (most recent call last):
segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 248, in _msg_callback
segmenter@6491[I]:processed_msg = self._callback(msg)
segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 234, in _callback
segmenter@6491[I]:msg = self._post_hook(self._handle(self._pre_hook(msg)))
segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/zmq/zed.py", line 201, in _handle
segmenter@6491[I]:peapod_name=self.name,
segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/peapods/runtimes/request_handlers/data_request_handler.py", line 161, in handle
segmenter@6491[I]:field='groundtruths',
segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/executors/__init__.py", line 190, in __call__
segmenter@6491[I]:self, **kwargs
segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/executors/decorators.py", line 103, in arg_wrapper
segmenter@6491[I]:return fn(*args, **kwargs)
segmenter@6491[I]:File "/workspace/sentencizer.py", line 97, in segment
segmenter@6491[I]:flat_docs = docs[traversal_path]
segmenter@6491[I]:File "/usr/local/lib/python3.7/site-packages/jina/types/arrays/document.py", line 213, in __getitem__
segmenter@6491[I]:return self[self._id_to_index[item]]
segmenter@6491[I]:KeyError: '@r'
indexer@6640[W]:no documents are indexed. searching empty docs. returning.
ranker@6676[I]: ranker@ 1[E]:KeyError('@r')
ranker@6676[I]:add "--quiet-error" to suppress the exception details
ranker@6676[I]:Traceback (most recent call last):
ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 248, in _msg_callback
ranker@6676[I]:processed_msg = self._callback(msg)
ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 234, in _callback
ranker@6676[I]:msg = self._post_hook(self._handle(self._pre_hook(msg)))
ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 196, in _handle
ranker@6676[I]:self._data_request_handler.handle(
ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/request_handlers/data_request_handler.py", line 140, in handle
ranker@6676[I]:r_docs = self._executor(
ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/executors/__init__.py", line 185, in __call__
ranker@6676[I]:return self.requests[req_endpoint](
ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/executors/decorators.py", line 103, in arg_wrapper
ranker@6676[I]:return fn(*args, **kwargs)
ranker@6676[I]:File "/workspace/simpleranker.py", line 51, in rank
ranker@6676[I]:for doc in docs[traversal_paths]:
ranker@6676[I]:File "/usr/local/lib/python3.8/site-packages/jina/types/arrays/document.py", line 213, in __getitem__
ranker@6676[I]:return self[self._id_to_index[item]]
ranker@6676[I]:KeyError: '@r'
root_indexer@6775[I]: root_indexer@ 1[E]:IndexError("do not support this index type builtins.list: ['m']")
root_indexer@6775[I]:add "--quiet-error" to suppress the exception details
root_indexer@6775[I]:Traceback (most recent call last):
root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 248, in _msg_callback
root_indexer@6775[I]:processed_msg = self._callback(msg)
root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 234, in _callback
root_indexer@6775[I]:msg = self._post_hook(self._handle(self._pre_hook(msg)))
root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 196, in _handle
root_indexer@6775[I]:self._data_request_handler.handle(
root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/peapods/runtimes/request_handlers/data_request_handler.py", line 140, in handle
root_indexer@6775[I]:r_docs = self._executor(
root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/executors/__init__.py", line 185, in __call__
root_indexer@6775[I]:return self.requests[req_endpoint](
root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/executors/decorators.py", line 103, in arg_wrapper
root_indexer@6775[I]:return fn(*args, **kwargs)
root_indexer@6775[I]:File "/workspace/executor/lmdb_storage.py", line 168, in search
root_indexer@6775[I]:docs_to_get = docs[traversal_paths]
root_indexer@6775[I]:File "/usr/local/lib/python3.8/site-packages/jina/types/arrays/document.py", line 217, in __getitem__
root_indexer@6775[I]:raise IndexError(f'do not support this index type {typename(item)}: {item}')
root_indexer@6775[I]:IndexError: do not support this index type builtins.list: ['m']
I didn't change anything other than what is mentioned here.
This is now related to the version problem.
You are working with a version of Jina that does not match the one expected by the Executor.
I see, thank you for the guidance. Which version should I go for now? Any version above 2.6.4 gave me an error about jina-commons's dependency not being resolved.
I believe what you need is to pin every Hub Executor to an older version that is not on 3.0.
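For illustration, pinning a tag in the uses URI could look roughly like this (a sketch; the tag names are hypothetical placeholders, not verified releases):

from jina import Flow

# Hedged illustration: pin each Hub Executor to an explicit pre-3.0 tag instead of the
# floating default. Check Jina Hub for the real tag names of each Executor.
f = (
    Flow()
    .add(name='segmenter', uses='jinahub+docker://Sentencizer/v0.1')
    .add(name='encoder', uses='jinahub+docker://TransformerTorchEncoder/v0.1')
    .add(name='indexer', uses='jinahub+docker://SimpleIndexer/v0.1')
)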
Ok, let me try this. I will post an update on this issue after making this change.
After a lot of trial and error, I did the following.
I tried to change the Hub Executor versions in flows/index.yml according to these instructions. I cloned Sentencizer, TransformerTorchEncoder, and SimpleIndexer, and reverted each to the commit before the "migrating to 3.0" related commits.
The flows/index.yml now looks as follows:
jtype: Flow # We define the flow used for indexing here
version: '1' # yml version
with: # Parameters for the flow
  workspace: $JINA_WORKSPACE # Workspace folder
executors: # Now, define all the executors that are used
  - name: segmenter # The first executor splits the input text into sentences which are stored as chunks in the original documents
    uses: MySentencizer # The type of the executor is Sentencizer, we download it from the hub as a docker container
    py_modules: /home/zarzis/code/python/examples/multires-lyrics-search/deps/executors/jinahub/segmenters/Sentencizer/sentencizer.py
  - name: encoder # Then, compute the embeddings of the sentences in this executor
    uses: TransformerTorchEncoder # '/data/sbmaruf/zarzis/examples/multires-lyrics-search/executor-text-transformers-torch-encoder' # We use a TransformerTorchEncoder from the hub
    py_modules: /home/zarzis/code/python/examples/multires-lyrics-search/deps/executor-text-transformers-torch-encoder/transform_encoder.py
    volumes: '~/.cache/huggingface:/root/.cache/huggingface' # Mount the huggingface cache into the docker container
    uses_with: # Override some parameters for the executor
      pooling_strategy: 'cls' # This is the pooling strategy that is used by the encoder
      pretrained_model_name_or_path: distilbert-base-cased # The ML model that is used
      max_length: 96 # Max length argument for the tokenizer
      device: 'cpu' # Run the executor on CPU - For GPU, we would have to use another container!
      default_traversal_paths: ['c'] # Compute the embeddings on the chunk level - the sentences created before
  - name: indexer # Now, index the sentences and store them to disk.
    uses: SimpleIndexer # We use a simple indexer for that purpose (not in docker, but using source code - there are some bugs with docker for this executor)
    py_modules: /home/zarzis/code/python/examples/multires-lyrics-search/deps/executor-simpleindexer/executor.py
    uses_metas: # Set some meta arguments for this executor
      workspace: $JINA_WORKSPACE # Define the workspace folder for the executor
    uses_with: # Override parameters for the executor
      default_traversal_paths: ['c'] # Store the sentences on disk - this means on chunk level
  - name: root_indexer # Additionally to the sentences, we also need to store the original songs which are not split into sentences
    uses: 'jinahub+docker://LMDBStorage' # Therefore, we use a LMDBStorage indexer
    volumes: $JINA_WORKSPACE_MOUNT # Again, mount the workspace
    uses_with: # Override some parameters for the LMDBStorage
      default_traversal_paths: ['r'] # Now, we store the root documents, not the sentence chunks
    needs: [gateway] # We can start this at the beginning - in parallel to the sentence flow
  - name: wait_both # Now, we wait for both the root indexing and the sentence path to finish
    needs: [indexer, root_indexer] # Continue once these two executors are finished
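For context, a minimal sketch of how a flow definition like this is loaded and run (the example's app.py does essentially this; the toy inputs below are placeholders):

from jina import Document, Flow

# Load the YAML above and index a couple of placeholder Documents; the real app feeds
# the input_generator shown earlier instead of this toy iterator.
f = Flow.load_config('flows/index.yml')
with f:
    f.post(on='/index', inputs=(Document(text=t) for t in ('hello world', 'goodbye world')))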
I then had to install transformers and torch, as well as jina==2.6.4 to mitigate the error ImportError: 'DocumentArrayMemmap' not found in 'jina'. Then the indexing started without any errors.
During the indexing, I got the following:
⠋ Working... ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 estimating... Task exception was never retrieved
future: <Task finished name='Task-9' coro=<BaseStreamer._stream_requests.<locals>.iterate_requests() done, defined at /home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/peapods/stream/base.py:132> exception=BadRequestType('fail to construct a <class \'jina.types.routing.table.RoutingTable\'> object from {\n "active_pod": "start-gateway",\n "pods": {\n "encoder": {\n "expected_parts": 1,\n "host": "0.0.0.0",\n "out_edges": [\n {\n "pod": "indexer"\n }\n ],\n "port": 48585,\n "port_out": 35639,\n "target_identity": "15ee9637e0294f819c46bf4a32e34108"\n },\n "end-gateway": {\n "expected_parts": 1,\n "host": "0.0.0.0",\n "port": 35313,\n "port_out": 47053,\n "target_identity": "384e6622f9c94f89ba4e33aee1c6eda0"\n },\n "indexer": {\n "expected_parts": 1,\n "host": "0.0.0.0",\n "out_edges": [\n {\n "pod": "wait_both"\n }\n ],\n "port": 54523,\n "port_out": 44047,\n "target_identity": "790bbe84f4bd416398e98869ba2d4dbe"\n },\n "root_indexer": {\n "expected_parts": 1,\n "host": "0.0.0.0",\n "out_edges": [\n {\n "pod": "wait_both"\n }\n ],\n "port": 42339,\n "port_out": 35519,\n "target_identity": "2429dd24ed504e7d9dd6e07e4c4289bb"\n },\n "segmenter": {\n "expected_parts": 1,\n "host": "0.0.0.0",\n "out_edges": [\n {\n "pod": "encoder"\n }\n ],\n "port": 47817,\n "port_out": 33495,\n "target_identity": "7dc61fa137dd487eac722986a4027235"\n },\n "start-gateway": {\n "host": "0.0.0.0",\n "out_edges": [\n {\n "pod": "segmenter"\n },\n {\n "pod": "root_indexer"\n }\n ],\n "port": 35313,\n "port_out": 47053,\n "target_identity": "384e6622f9c94f89ba4e33aee1c6eda0"\n },\n "wait_both": {\n "expected_parts": 2,\n "host": "0.0.0.0",\n "out_edges": [\n {\n "pod": "end-gateway"\n }\n ],\n "port": 48669,\n "port_out": 59563,\n "target_identity": "750ea7525fff4c19a7d982e70969473c"\n }\n }\n}')>
Traceback (most recent call last):
File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/types/routing/table.py", line 134, in __init__
json_format.Parse(graph, self._pb_body)
File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/google/protobuf/json_format.py", line 436, in Parse
return ParseDict(js, message, ignore_unknown_fields, descriptor_pool,
File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/google/protobuf/json_format.py", line 461, in ParseDict
parser.ConvertMessage(js_dict, message, '')
TypeError: ConvertMessage() takes 3 positional arguments but 4 were given
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/peapods/stream/base.py", line 142, in iterate_requests
future: 'asyncio.Future' = self._handle_request(request=request)
File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/peapods/stream/base.py", line 75, in _handle_request
asyncio.create_task(self.iolet.send_message(self._convert_to_message(request)))
File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/peapods/stream/gateway.py", line 24, in _convert_to_message
return Message(None, request, 'gateway', **vars(self.args))
File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/types/message/__init__.py", line 68, in __init__
self.envelope = self._add_envelope(*args, **kwargs)
File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/types/message/__init__.py", line 235, in _add_envelope
envelope.routing_table.CopyFrom(RoutingTable(routing_table).proto)
File "/home/zarzis/anaconda3/envs/jina/lib/python3.8/site-packages/jina/types/routing/table.py", line 144, in __init__
raise BadRequestType(
jina.excepts.BadRequestType: fail to construct a <class 'jina.types.routing.table.RoutingTable'> object from {
"active_pod": "start-gateway",
"pods": {
"encoder": {
"expected_parts": 1,
"host": "0.0.0.0",
"out_edges": [
{
"pod": "indexer"
}
],
"port": 48585,
"port_out": 35639,
"target_identity": "15ee9637e0294f819c46bf4a32e34108"
},
"end-gateway": {
"expected_parts": 1,
"host": "0.0.0.0",
"port": 35313,
"port_out": 47053,
"target_identity": "384e6622f9c94f89ba4e33aee1c6eda0"
},
"indexer": {
"expected_parts": 1,
"host": "0.0.0.0",
"out_edges": [
{
"pod": "wait_both"
}
],
"port": 54523,
"port_out": 44047,
"target_identity": "790bbe84f4bd416398e98869ba2d4dbe"
},
"root_indexer": {
"expected_parts": 1,
"host": "0.0.0.0",
"out_edges": [
{
"pod": "wait_both"
}
],
"port": 42339,
"port_out": 35519,
"target_identity": "2429dd24ed504e7d9dd6e07e4c4289bb"
},
"segmenter": {
"expected_parts": 1,
"host": "0.0.0.0",
"out_edges": [
{
"pod": "encoder"
}
],
"port": 47817,
"port_out": 33495,
"target_identity": "7dc61fa137dd487eac722986a4027235"
},
"start-gateway": {
"host": "0.0.0.0",
"out_edges": [
{
"pod": "segmenter"
},
{
"pod": "root_indexer"
}
],
"port": 35313,
"port_out": 47053,
"target_identity": "384e6622f9c94f89ba4e33aee1c6eda0"
},
"wait_both": {
"expected_parts": 2,
"host": "0.0.0.0",
"out_edges": [
{
"pod": "end-gateway"
}
],
"port": 48669,
"port_out": 59563,
"target_identity": "750ea7525fff4c19a7d982e70969473c"
}
}
}
It was stuck here for 2 hours, and I am at a loss about how to fix it. Can you please help me with my next course of action, @JoanFM?
Hey @Jackal1586,
What is the version of protobuf that you are using? Can you try installing protobuf==3.13.0?
The protobuf version was 3.20.1. After changing it to protobuf==3.13.0, it worked and the indexing completed. Then I tried to run the simple frontend provided in the static directory and also ran python app.py -t query. But I am unable to enable cors from anywhere: I tried in flows/query.yml, in app.py, and through the Flow API too. How should I enable cors in this case? Also, same-origin requests do get responses.
For reference, I am providing my current environment here.
Have you checked this part of the documentation?
I only read the parts related to cors and experimented accordingly, like passing it to Flow's constructor and setting cors: True in the YAML. It didn't work for me.
Ah well, this is because cors was only properly enabled after 3.x, not in 2.6.4.
Then what should be my next course of action if I want to run the full cycle as shown in multires-lyrics-search? I mean, I need to enable cors at some point, right?
You should migrate your Flow to 3.3.x, use Executor versions compatible with 3.x, and adapt the client code as well.
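As a hedged sketch of what enabling CORS looks like in Jina 3.x:

from jina import Flow

# Minimal sketch, assuming Jina 3.x: serve the Flow over HTTP with CORS enabled so a
# static frontend served from a different origin can reach the gateway.
f = Flow(protocol='http', cors=True)
with f:
    f.block()

In flow YAML this should correspond to setting protocol: 'http' and cors: true under the Flow's with: section.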
Ok, let me study the documentation related to migration and experiment. I will post my results.
Hey @Jackal1586,
What is the version of protobuf that you are using? Can you try installing protobuf==3.13.0?
I solved the same problem your way, thanks!
Description
I encountered an error in the flow.post method while trying to run python app.py -t index in multires-lyrics-search after cloning this repository: examples. The error message was: gateway@61291[E]:receive an empty stream from the client! please check your client's inputs, you can use "Client.check_input(inputs)".
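A hedged sketch of the check the error message suggests (assuming the inputs come from the generator in the example's helper.py):

from jina import Client
from helper import input_generator  # assumption: the generator used by app.py lives here

# Validate that the inputs iterable produces well-formed Documents before posting to the Flow
Client.check_input(input_generator(num_docs=10))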
Environment