Closed themantalope closed 1 year ago
Hello @themantalope,
can you please provide more context about the problem you are facing?
@JoanFM
Thanks for following up. Initially there were no additional messages other than that index_sentencizer failed to start. I eventually saw some other warnings stating that spaCy was not installed. Running pip install -q spacy at the beginning of the notebook solved the issue.
@JoanFM
This actually is still an issue. Even when explicitly running pip install spacy, I still get a warning when the flow is set up stating: ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
Additionally, I am also (sometimes) getting an error regarding protobuf:
WARNI… JINA@108 Error getting the directory name from jinahub://PDFTableExtractor/latest. [12/22/22 11:28:48]
`--install-requirements` option is only valid when `uses` is a configuration file.
🔐 You are not logged in to Jina AI. To log in, use jina auth login or set env variable JINA_AUTH_TOKEN.
WARNI… JINA@108 Error getting the directory name from jinahub://PDFSegmenter. [12/22/22 11:28:53]
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@108 Error getting the directory name from jinahub://SpacySentencizer. [12/22/22 11:29:20]
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@108 Error getting the directory name from [12/22/22 11:29:24]
jinahub://ImagePreprocessor-skip-non-images. `--install-requirements` option is only
valid when `uses` is a configuration file.
WARNI… JINA@108 Error getting the directory name from [12/22/22 11:29:25]
jinahub://ImagePreprocessor-skip-non-images. `--install-requirements` option is only
valid when `uses` is a configuration file.
WARNI… JINA@108 Error getting the directory name from jinahub://CLIPEncoder/latest-gpu.
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@108 Error getting the directory name from jinahub://CLIPEncoder/latest-gpu. [12/22/22 11:30:48]
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@108 Error getting the directory name from jinahub://AnnLiteIndexer. [12/22/22 11:30:50]
`--install-requirements` option is only valid when `uses` is a configuration file.
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
Downloading: 100%
577M/577M [00:26<00:00, 24.2MB/s]
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
[/usr/local/lib/python3.8/dist-packages/pkg_resources/__init__.py](https://localhost:8080/#) in _dep_map(self)
3015 try:
-> 3016 return self.__dep_map
3017 except AttributeError:
28 frames
AttributeError: _DistInfoDistribution__dep_map
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
AttributeError: _pkg_info
During handling of the above exception, another exception occurred:
FileNotFoundError Traceback (most recent call last)
[/usr/local/lib/python3.8/dist-packages/pkg_resources/__init__.py](https://localhost:8080/#) in _get(self, path)
1609
1610 def _get(self, path):
-> 1611 with open(path, 'rb') as stream:
1612 return stream.read()
1613
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.8/dist-packages/protobuf-3.19.6.dist-info/METADATA'
Can you set the environment variable JINA_LOG_LEVEL to DEBUG
and share the exact cell that causes the error with the exact traceback?
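One way to set that variable from inside a notebook is sketched below. It assumes the variable has to be set in the same Python process before the Flow is created; setting it at the very top of the notebook is the cautious choice, not a documented requirement quoted from this thread.

```python
import os

# Assumption: set this in the first cell, before jina is imported
# or the Flow is built, so the loggers can pick it up.
os.environ["JINA_LOG_LEVEL"] = "DEBUG"

# Confirm the value that child processes will inherit.
print(os.environ["JINA_LOG_LEVEL"])
```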
The flow takes a very long time to start up after turning the log level to debug. The process has been running for 10+ minutes, this is what I have so far:
WARNI… JINA@110763 Error getting the directory name from jinahub://PDFTableExtractor/latest. [12/22/22 14:03:18]
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@110763 Error getting the directory name from jinahub://PDFSegmenter.
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@110763 Error getting the directory name from jinahub://SpacySentencizer. [12/22/22 14:03:19]
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@110763 Error getting the directory name from [12/22/22 14:03:20]
jinahub://ImagePreprocessor-skip-non-images. `--install-requirements` option is only
valid when `uses` is a configuration file.
WARNI… JINA@110763 Error getting the directory name from
jinahub://ImagePreprocessor-skip-non-images. `--install-requirements` option is only
valid when `uses` is a configuration file.
WARNI… JINA@110763 Error getting the directory name from jinahub://CLIPEncoder/latest-gpu.
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@110763 Error getting the directory name from jinahub://CLIPEncoder/latest-gpu. [12/22/22 14:03:26]
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@110763 Error getting the directory name from jinahub://AnnLiteIndexer.
`--install-requirements` option is only valid when `uses` is a configuration file.
Waiting all_indexer... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11/1 0:00:00
DEBUG index_table_extractor/rep-0@110763 ready and listening [12/22/22 14:03:29]
DEBUG index_segmenter/rep-0@110763 ready and listening [12/22/22 14:03:29]
DEBUG index_tagger/rep-0@110763 ready and listening [12/22/22 14:03:29]
DEBUG index_sentencizer/rep-0@110763 waiting for ready or shutdown signal from runtime [12/22/22 14:03:29]
DEBUG index_sentencizer/rep-0@110763 shutdown is is already set. Runtime will end gracefully
on its own
DEBUG index_sentencizer/rep-0@110763 terminating the runtime process
DEBUG index_tags_copier/rep-0@110763 ready and listening [12/22/22 14:03:29]
DEBUG index_sentencizer/rep-0@110763 terminated
DEBUG index_sentencizer/rep-0@110763 joining the process
DEBUG index_sentencizer/rep-0@110763 successfully joined the process
DEBUG index_image_processor/rep-0@110763 ready and listening [12/22/22 14:03:29]
DEBUG search_image_processor/rep-0@110763 ready and listening [12/22/22 14:03:29]
DEBUG gateway/rep-0@110763 ready and listening [12/22/22 14:03:30]
ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.
DEBUG index_encoder/rep-0@111643 <executor.CLIPEncoder object at 0x7fc6e1cc7f70> is [12/22/22 14:03:54]
successfully loaded!
DEBUG index_encoder/rep-0@111643 start listening on 0.0.0.0:63551
DEBUG index_encoder/rep-0@111643 run grpc server forever
DEBUG index_encoder/rep-0@110763 ready and listening [12/22/22 14:03:54]
DEBUG search_encoder/rep-0@111648 <executor.CLIPEncoder object at 0x7fc6e1cb4fa0> is [12/22/22 14:04:16]
successfully loaded!
DEBUG search_encoder/rep-0@111648 start listening on 0.0.0.0:55492
DEBUG search_encoder/rep-0@111648 run grpc server forever
DEBUG search_encoder/rep-0@110763 ready and listening [12/22/22 14:04:16]
WARNI… all_indexer/rep-0@110763 <jina.orchestrate.pods.Pod object at 0x7fc6fc399550> timeout [12/22/22 14:13:29]
after waiting for 600000ms, if your executor takes time to load, you may increase
--timeout-ready
DEBUG all_indexer/rep-0@110763 waiting for ready or shutdown signal from runtime
DEBUG all_indexer/rep-0@110763 Runtime was never started. Runtime will end gracefully on its
own
DEBUG all_indexer/rep-0@110763 terminating the runtime process
DEBUG all_indexer/rep-0@110763 runtime process properly terminated
DEBUG all_indexer/rep-0@110763 terminated
DEBUG all_indexer/rep-0@110763 waiting for ready or shutdown signal from runtime
DEBUG all_indexer/rep-0@110763 shutdown is is already set. Runtime will end gracefully on its
own
DEBUG all_indexer/rep-0@110763 terminating the runtime process [12/22/22 14:13:30]
DEBUG all_indexer/rep-0@110763 runtime process properly terminated
DEBUG all_indexer/rep-0@110763 terminated
DEBUG all_indexer/rep-0@110763 joining the process
I think you may need more memory to run this example. Can you check the memory consumption while this is happening?
On Colab it's currently using 5 GB out of the 12 GB available.
The issue with the runtime only occurred after turning on DEBUG logging.
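For anyone who wants to read memory usage from a notebook cell rather than the Colab UI, here is a minimal sketch. It assumes a Linux host that exposes /proc/meminfo (as Colab does); the helper names parse_meminfo and used_memory_gib are made up for this example, and the sample text is illustrative, not taken from the logs above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kiB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def used_memory_gib(info):
    """Used memory as MemTotal minus MemAvailable, converted from kiB to GiB."""
    return (info["MemTotal"] - info["MemAvailable"]) / (1024 ** 2)


# Illustrative sample; on a real Linux host use:
#   info = parse_meminfo(open("/proc/meminfo").read())
sample = "MemTotal:       12582912 kB\nMemAvailable:    7340032 kB\n"
info = parse_meminfo(sample)
print(f"{used_memory_gib(info):.1f} GiB used")
```

Running the real-file variant in a loop while the Flow starts would show whether memory climbs toward the limit before the indexer times out.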
Okay, it seems that the indexer took too long to start. Have you tried this more than once?
It seems like SpacySentencizer may be causing issues too. I created the Executor aaaages ago, and haven't seen this kind of error before. Here's a minimal (not) working example notebook - @JoanFM any ideas why it's failing?
> Okay, it seems that the indexer took too long to start. Have you tried this more than once?
Yes. Also, to get all the dependencies to install properly on Colab, I have had to restart the runtime.
I believe the spaCy version it depends on is no longer compatible with Jina because of protobuf versions.
OK, that would make a lot of sense. I was getting protobuf errors; at one point an error stated that some component of the flow was looking for protobuf 3.18.something metadata but that protobuf > 4 was installed.
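To see which versions are actually installed before digging further into a conflict like this, a small check along these lines may help. It is only a sketch using the standard library's importlib.metadata; installed_version is a hypothetical helper name, and the package list is just the set of packages mentioned in this thread.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages that come up in this thread; adjust as needed.
for name in ("protobuf", "tensorflow", "spacy", "jina"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

On Colab it is worth running this again after restarting the runtime, since packages imported before a reinstall can stay loaded in the old version.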
@JoanFM is there any sentencizer available through Jina that could be swapped in?
You can look in the Executor Hub in Jina AI Cloud; you may find some there.
OK, I switched the sentencizer for the Torch sentencizer.
Now I am still having issues with the search code. Any help with this?
with flow:
    client = Client(port=flow.port)
    results = client.post(
        "/search",
        query_doc,
        request_size=1,
        parameters={
            "filter": filter
        },
        show_progress=True,
        target_executor="(search_*|all_*)"
    )
WARNI… JINA@6849 Error getting the directory name from jinahub://PDFSegmenter. [12/22/22 19:45:54]
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@6849 Error getting the directory name from jinaai://jina-ai/Sentencizer:latest.
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@6849 Error getting the directory name from jinahub://TransformerTorchEncoder.
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@6849 Error getting the directory name from jinaai://jina-ai/AnnLiteIndexer:latest.
`--install-requirements` option is only valid when `uses` is a configuration file.
DeprecationWarning: 'index_traversal_paths' will be deprecated in the future, please use 'index_access_paths'. (raised from /usr/local/lib/python3.8/dist-packages/jina/serve/helper.py:73)
DeprecationWarning: 'search_traversal_paths' will be deprecated in the future, please use 'search_access_paths'. (raised from /usr/local/lib/python3.8/dist-packages/jina/serve/helper.py:73)
2022-12-22 19:45:55.767 | INFO | annlite.index:restore:664 - restore Annlite from local
2022-12-22 19:45:55.771 | INFO | annlite.index:_rebuild_index_from_local:771 - Rebuild the indexer from scratch
2022-12-22 19:45:59.060 | INFO | annlite.index:_rebuild_index_from_local:788 - Load the model from /root/.cache/jina/AnnLiteIndexer/0/parameters-d67b9abb496ca1fd466b6d5378c78128
─────────────────────────────────────────── 🎉 Flow is ready to serve! ────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local [0.0.0.0](grpc://0.0.0.0:61118)[:](grpc://0.0.0.0:61118)[61118](grpc://0.0.0.0:61118) │
│ 🔒 Private [172.28.0.12](grpc://172.28.0.12:61118)[:](grpc://172.28.0.12:61118)[61118](grpc://172.28.0.12:61118) │
│ 🌍 Public [35.236.250.206](grpc://35.236.250.206:61118)[:](grpc://35.236.250.206:61118)[61118](grpc://35.236.250.206:61118) │
╰──────────────────────────────────────────╯
⠋ Working... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 0% ETA: -:--:--
ERROR all_indexer/rep-0@26547 ValueError('Empty ndarray. Did you forget to set [12/22/22 19:46:02]
.embedding/.tensor value and now you are operating on it?')
add "--quiet-error" to suppress the exception details
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/__init__.py",
line 264, in process_data
result = await self._request_handler.handle(
File
"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/request_handling.py",
line 425, in handle
return_data = await self._executor.__acall__(
File "/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py", line
366, in __acall__
return await self.__acall_endpoint__(req_endpoint, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py", line
425, in __acall_endpoint__
return await exec_func(
File "/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py", line
383, in exec_func
return await get_or_reuse_loop().run_in_executor(
File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/dist-packages/jina/serve/executors/decorators.py", line
187, in arg_wrapper
return fn(executor_instance, *args, **kwargs)
File "/root/.cache/jina/hub-package/7yypg8qk/executor.py", line 113, in search
docs.match(self._index, filter=parameters.get('filter', None), limit=limit)
File "/usr/local/lib/python3.8/dist-packages/docarray/array/mixins/match.py", line 77,
in match
match_docs = darray.find(
File "/usr/local/lib/python3.8/dist-packages/docarray/array/mixins/find.py", line 200,
in find
n_rows, n_dim = ndarray.get_array_rows(_query)
File "/usr/local/lib/python3.8/dist-packages/docarray/math/ndarray.py", line 197, in
get_array_rows
array_type, _ = get_array_type(array)
File "/usr/local/lib/python3.8/dist-packages/docarray/math/ndarray.py", line 138, in
get_array_type
raise ValueError(
ValueError: Empty ndarray. Did you forget to set .embedding/.tensor value and now you
are operating on it?
Exception in thread Thread-134:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.8/dist-packages/jina/helper.py", line 1315, in run
self.result = asyncio.run(func(*args, **kwargs))
File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/usr/local/lib/python3.8/dist-packages/jina/clients/mixin.py", line 266, in _get_results
async for resp in c._get_results(*args, **kwargs):
File "/usr/local/lib/python3.8/dist-packages/jina/clients/base/grpc.py", line 220, in _get_results
async for resp in self._stream_rpc(
File "/usr/local/lib/python3.8/dist-packages/jina/clients/base/grpc.py", line 85, in _stream_rpc
callback_exec(
File "/usr/local/lib/python3.8/dist-packages/jina/clients/helper.py", line 81, in callback_exec
raise BadServer(response.header)
jina.excepts.BadServer: request_id: "19b8d3339121444eb214e5af03d51f5f"
status {
code: ERROR
description: "ValueError(\'Empty ndarray. Did you forget to set .embedding/.tensor value and now you are operating on it?\')"
exception {
name: "ValueError"
args: "Empty ndarray. Did you forget to set .embedding/.tensor value and now you are operating on it?"
stacks: "Traceback (most recent call last):\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/__init__.py\", line 264, in process_data\n result = await self._request_handler.handle(\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/request_handling.py\", line 425, in handle\n return_data = await self._executor.__acall__(\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 366, in __acall__\n return await self.__acall_endpoint__(req_endpoint, **kwargs)\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 425, in __acall_endpoint__\n return await exec_func(\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 383, in exec_func\n return await get_or_reuse_loop().run_in_executor(\n"
stacks: " File \"/usr/lib/python3.8/concurrent/futures/thread.py\", line 57, in run\n result = self.fn(*self.args, **self.kwargs)\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/decorators.py\", line 187, in arg_wrapper\n return fn(executor_instance, *args, **kwargs)\n"
stacks: " File \"/root/.cache/jina/hub-package/7yypg8qk/executor.py\", line 113, in search\n docs.match(self._index, filter=parameters.get(\'filter\', None), limit=limit)\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/docarray/array/mixins/match.py\", line 77, in match\n match_docs = darray.find(\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/docarray/array/mixins/find.py\", line 200, in find\n n_rows, n_dim = ndarray.get_array_rows(_query)\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/docarray/math/ndarray.py\", line 197, in get_array_rows\n array_type, _ = get_array_type(array)\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/docarray/math/ndarray.py\", line 138, in get_array_type\n raise ValueError(\n"
stacks: "ValueError: Empty ndarray. Did you forget to set .embedding/.tensor value and now you are operating on it?\n"
executor: "AnnLiteIndexer"
}
}
exec_endpoint: "/search"
target_executor: ""
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
[/usr/local/lib/python3.8/dist-packages/jina/helper.py](https://localhost:8080/#) in run_async(func, *args, **kwargs)
1329 try:
-> 1330 return thread.result
1331 except AttributeError:
AttributeError: '_RunThread' object has no attribute 'result'
During handling of the above exception, another exception occurred:
BadClient Traceback (most recent call last)
2 frames
[/usr/local/lib/python3.8/dist-packages/jina/helper.py](https://localhost:8080/#) in run_async(func, *args, **kwargs)
1332 from jina.excepts import BadClient
1333
-> 1334 raise BadClient(
1335 'something wrong when running the eventloop, result can not be retrieved'
1336 )
BadClient: something wrong when running the eventloop, result can not be retrieved
EDIT: Query parameters:
search_term = "trilobite diagram"
query_doc = Document(text=search_term)
element_type = [
    "text",
    "image",
    "table",
]
filter = {
"element_type": {
"$in": element_type,
}
}
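A side note on list literals such as element_type: Python concatenates adjacent string literals, so leaving out the comma between two entries silently merges them instead of raising an error. The sketch below only illustrates that language behaviour; it is not the cause of the ValueError in the traceback above.

```python
# Missing comma: "image" "table" is parsed as the single string "imagetable".
buggy = ["text", "image" "table"]

# With the comma, the list has the three intended entries.
fixed = ["text", "image", "table"]

print(buggy)  # two entries
print(fixed)  # three entries

# The filter would then be built from the fixed list.
element_filter = {"element_type": {"$in": fixed}}
```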
I want to try to find a basic working example of text extraction from PDF (text within the PDF, not text which needs to be OCR'd), index it with a neural encoder (something like the TransformerTorchEncoder) and search for it. I cannot get that working by modifying the colab notebook or otherwise. Does anyone have something like that which works?
What exactly does the Flow that you are using right now look like?
@themantalope I think the problem is that Colab has tensorflow pre-installed and its version is kinda old (2.9.2). This tensorflow doesn't support the newer version of protobuf. You can check this as a reference. I uninstall tensorflow before creating the Flow.
@JoanFM @AnneYang720
Thanks for your help, I really appreciate it.
Please take a look at this colab notebook which is a derivative of the link that @AnneYang720 sent.
Here is the flow:
flow = Flow().add(
uses="jinahub://PDFSegmenter", # Extract images/text
install_requirements=True,
name="index_segmenter"
).add(
uses="jinahub://SpacySentencizer", # Sentencize long text into sentences
uses_with={"traversal_paths": "@c"},
install_requirements=True,
name="index_sentencizer"
).add(
uses='jinahub://TransformerTorchEncoder',
uses_with={"traversal_paths":"@cc"},
install_requirements=True,
name="encoder"
).add(
uses="jinaai://jina-ai/AnnLiteIndexer", # Store vectors and metadata on disk
uses_with={
"index_traversal_paths": "@cc",
"search_traversal_paths": "@cc",
"columns": [("element_type", "str")],
"n_dim": 512
},
install_requirements=True,
name="all_indexer"
)
The indexing works fine. However I get an error regarding empty embeddings during query:
search_term = "trilobite diagram"
query_doc = Document(text=search_term)
element_type = [
    "text",
    "image",
    "table",
]
filter = {
"element_type": {
"$in": element_type,
}
}
with flow:
    client = Client(port=flow.port)
    results = client.post(
        "/search",
        query_doc,
        request_size=1,
        parameters={
            "filter": filter
        },
        show_progress=True,
        target_executor="(search_*|all_*)"
    )
WARNI… JINA@192 Error getting the directory name from jinahub://PDFSegmenter. [12/23/22 14:28:47]
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@192 Error getting the directory name from jinahub://SpacySentencizer.
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@192 Error getting the directory name from jinahub://TransformerTorchEncoder.
`--install-requirements` option is only valid when `uses` is a configuration file.
WARNI… JINA@192 Error getting the directory name from jinaai://jina-ai/AnnLiteIndexer.
`--install-requirements` option is only valid when `uses` is a configuration file.
DeprecationWarning: 'index_traversal_paths' will be deprecated in the future, please use 'index_access_paths'. (raised from /usr/local/lib/python3.8/dist-packages/jina/serve/helper.py:73)
RuntimeWarning: coroutine 'Flow._wait_until_all_ready.<locals>._f' was never awaited (raised from
/usr/local/lib/python3.8/dist-packages/jina/orchestrate/flow/base.py:1890)
DeprecationWarning: 'search_traversal_paths' will be deprecated in the future, please use 'search_access_paths'. (raised from /usr/local/lib/python3.8/dist-packages/jina/serve/helper.py:73)
UserWarning: Using "columns" as a List of Tuples will be deprecated soon. Please provide a Dictionary. (raised from /usr/local/lib/python3.8/dist-packages/docarray/array/storage/base/backend.py:98)
2022-12-23 14:28:48.178 | INFO | annlite.index:restore:664 - restore Annlite from local
2022-12-23 14:28:48.201 | INFO | annlite.index:_rebuild_index_from_local:771 - Rebuild the indexer from scratch
2022-12-23 14:28:48.796 | INFO | annlite.index:_rebuild_index_from_local:788 - Load the model from /root/.cache/jina/AnnLiteIndexer/0/parameters-d67b9abb496ca1fd466b6d5378c78128
─────────────────────────────────────────── 🎉 Flow is ready to serve! ────────────────────────────────────────────
╭────────────── 🔗 Endpoint ───────────────╮
│ ⛓ Protocol GRPC │
│ 🏠 Local [0.0.0.0](grpc://0.0.0.0:50955)[:](grpc://0.0.0.0:50955)[50955](grpc://0.0.0.0:50955) │
│ 🔒 Private [172.28.0.12](grpc://172.28.0.12:50955)[:](grpc://172.28.0.12:50955)[50955](grpc://172.28.0.12:50955) │
╰──────────────────────────────────────────╯
⠋ Working... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0:00:00 0% ETA: -:--:--
ERROR all_indexer/rep-0@2573 ValueError('Empty ndarray. Did you forget to set [12/23/22 14:28:53]
.embedding/.tensor value and now you are operating on it?')
add "--quiet-error" to suppress the exception details
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/__init__.py",
line 264, in process_data
result = await self._request_handler.handle(
File
"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/request_handling.py",
line 425, in handle
return_data = await self._executor.__acall__(
File "/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py", line
366, in __acall__
return await self.__acall_endpoint__(req_endpoint, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py", line
425, in __acall_endpoint__
return await exec_func(
File "/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py", line
383, in exec_func
return await get_or_reuse_loop().run_in_executor(
File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.8/dist-packages/jina/serve/executors/decorators.py", line
187, in arg_wrapper
return fn(executor_instance, *args, **kwargs)
File "/root/.cache/jina/hub-package/7yypg8qk/executor.py", line 113, in search
docs.match(self._index, filter=parameters.get('filter', None), limit=limit)
File "/usr/local/lib/python3.8/dist-packages/docarray/array/mixins/match.py", line 77,
in match
match_docs = darray.find(
File "/usr/local/lib/python3.8/dist-packages/docarray/array/mixins/find.py", line 200,
in find
n_rows, n_dim = ndarray.get_array_rows(_query)
File "/usr/local/lib/python3.8/dist-packages/docarray/math/ndarray.py", line 197, in
get_array_rows
array_type, _ = get_array_type(array)
File "/usr/local/lib/python3.8/dist-packages/docarray/math/ndarray.py", line 138, in
get_array_type
raise ValueError(
ValueError: Empty ndarray. Did you forget to set .embedding/.tensor value and now you
are operating on it?
RuntimeWarning: coroutine 'Flow._wait_until_all_ready.<locals>._async_wait_ready' was never awaited (raised from /usr/local/lib/python3.8/dist-packages/jina/orchestrate/flow/base.py:1782)
Exception in thread Thread-56:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.8/dist-packages/jina/helper.py", line 1315, in run
self.result = asyncio.run(func(*args, **kwargs))
File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/usr/local/lib/python3.8/dist-packages/jina/clients/mixin.py", line 266, in _get_results
async for resp in c._get_results(*args, **kwargs):
File "/usr/local/lib/python3.8/dist-packages/jina/clients/base/grpc.py", line 220, in _get_results
async for resp in self._stream_rpc(
File "/usr/local/lib/python3.8/dist-packages/jina/clients/base/grpc.py", line 85, in _stream_rpc
callback_exec(
File "/usr/local/lib/python3.8/dist-packages/jina/clients/helper.py", line 81, in callback_exec
raise BadServer(response.header)
jina.excepts.BadServer: request_id: "5ed3dcd36a8c4ff094d2766e9cb44857"
status {
code: ERROR
description: "ValueError(\'Empty ndarray. Did you forget to set .embedding/.tensor value and now you are operating on it?\')"
exception {
name: "ValueError"
args: "Empty ndarray. Did you forget to set .embedding/.tensor value and now you are operating on it?"
stacks: "Traceback (most recent call last):\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/__init__.py\", line 264, in process_data\n result = await self._request_handler.handle(\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/runtimes/worker/request_handling.py\", line 425, in handle\n return_data = await self._executor.__acall__(\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 366, in __acall__\n return await self.__acall_endpoint__(req_endpoint, **kwargs)\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 425, in __acall_endpoint__\n return await exec_func(\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/__init__.py\", line 383, in exec_func\n return await get_or_reuse_loop().run_in_executor(\n"
stacks: " File \"/usr/lib/python3.8/concurrent/futures/thread.py\", line 57, in run\n result = self.fn(*self.args, **self.kwargs)\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/jina/serve/executors/decorators.py\", line 187, in arg_wrapper\n return fn(executor_instance, *args, **kwargs)\n"
stacks: " File \"/root/.cache/jina/hub-package/7yypg8qk/executor.py\", line 113, in search\n docs.match(self._index, filter=parameters.get(\'filter\', None), limit=limit)\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/docarray/array/mixins/match.py\", line 77, in match\n match_docs = darray.find(\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/docarray/array/mixins/find.py\", line 200, in find\n n_rows, n_dim = ndarray.get_array_rows(_query)\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/docarray/math/ndarray.py\", line 197, in get_array_rows\n array_type, _ = get_array_type(array)\n"
stacks: " File \"/usr/local/lib/python3.8/dist-packages/docarray/math/ndarray.py\", line 138, in get_array_type\n raise ValueError(\n"
stacks: "ValueError: Empty ndarray. Did you forget to set .embedding/.tensor value and now you are operating on it?\n"
executor: "AnnLiteIndexer"
}
}
exec_endpoint: "/search"
target_executor: ""
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
[/usr/local/lib/python3.8/dist-packages/jina/helper.py](https://localhost:8080/#) in run_async(func, *args, **kwargs)
1329 try:
-> 1330 return thread.result
1331 except AttributeError:
AttributeError: '_RunThread' object has no attribute 'result'
During handling of the above exception, another exception occurred:
BadClient Traceback (most recent call last)
2 frames
[/usr/local/lib/python3.8/dist-packages/jina/helper.py](https://localhost:8080/#) in run_async(func, *args, **kwargs)
1332 from jina.excepts import BadClient
1333
-> 1334 raise BadClient(
1335 'something wrong when running the eventloop, result can not be retrieved'
1336 )
BadClient: something wrong when running the eventloop, result can not be retrieved
Based on other examples, it seems like this should work. When I google the error, it isn't clear to me how to fix it, or why during query the embedding tensor is not getting set.
The problem is that your Encoder does not match the target_executor parameter that you pass. Also, you may need to adapt the access_paths parameter.
Maybe you would prefer to have a separate Flow for your search.
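The name mismatch can be checked without starting a Flow. The sketch below assumes that target_executor is applied to executor names as a regular expression with re.match semantics; that is an assumption about Jina's internals, not something confirmed in this thread. The names are the ones from the Flow posted above, and selected_executors is a hypothetical helper.

```python
import re

# Pattern passed to client.post(...) in the failing search call.
target_executor = "(search_*|all_*)"

# Executor names from the Flow posted above.
executor_names = ["index_segmenter", "index_sentencizer", "encoder", "all_indexer"]


def selected_executors(pattern, names):
    """Return the names the pattern matches from the start (re.match semantics)."""
    return [name for name in names if re.match(pattern, name)]


# Only the indexer is selected, so nothing would embed the query first.
print(selected_executors(target_executor, executor_names))

# With an encoder named search_encoder, both executors are selected.
print(selected_executors(target_executor, ["search_encoder", "all_indexer"]))
```

Under that assumption the query Document reaches AnnLiteIndexer without an embedding, which would be consistent with the "Empty ndarray" error in the traceback.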
Yeah, I'd suggest separate Flows for index and search. This example was mostly an experimental approach by me, using an older version of Jina. There are so many ways this could go wrong.
Probably easier to tear it apart and use the bits to start from scratch tbh. Consider it a fun Christmas experience :) (and you thought untangling Christmas tree lights was frustrating)
plus I'm no longer on this project and maintaining it several months later (after so long away from it) is tough.
Once only God and myself knew how my code worked. Now only God knows
For anyone concerned, here is a demo usage of the sentencizer.
If you are using Colab, you need to uninstall the pre-installed tensorflow and install jina:
!pip uninstall -y tensorflow
!pip install jina
from jina import Client, Flow

flow = (
Flow()
.add(
uses="jinahub://PDFSegmenter", # Extract images/text
install_requirements=True,
name="index_segmenter",
)
.add(
uses="jinahub://SpacySentencizer", # Sentencize long text into sentences
uses_with={"traversal_paths": "@c"},
install_requirements=True,
name="index_sentencizer",
)
.add(
uses='jinahub://TransformerTorchEncoder',
uses_with={"traversal_paths": "@cc"},
install_requirements=True,
name="index_encoder",
)
.add(
uses="jinaai://jina-ai/AnnLiteIndexer",
uses_with={
"index_traversal_paths": "@cc",
"search_traversal_paths": "@cc",
"columns": {"element_type": "str"},
"n_dim": 768,
},
install_requirements=True,
name="all_indexer",
)
)
import os

if not os.path.isdir("data"):
    !wget -q -N --output-document data.zip https://github.com/jina-ai/workshops/blob/main/notebooks/pdf_search/part_2_images_and_text/data.zip?raw=true
    !unzip -n data.zip
    !rm -f data.zip

from docarray import DocumentArray, Document

docs = DocumentArray.from_files("data/*.pdf")
for doc in docs:
    doc.load_uri_to_blob()

with flow:
    client = Client(port=flow.port)
    docs = client.post(
        "/index",
        docs,
        request_size=1,
        show_progress=True,
        target_executor="(index_*|all_*)",
    )
search_term = "trilobite diagram"
query_doc = Document(text=search_term)
element_type = ["text", "image", "table"]
filter = {
"element_type": {
"$in": element_type,
}
}
search_flow = (
Flow()
.add(
uses='jinahub://TransformerTorchEncoder',
install_requirements=True,
name="index_encoder",
)
.add(
uses="jinaai://jina-ai/AnnLiteIndexer",
uses_with={
"index_traversal_paths": "@cc",
"search_traversal_paths": "@cc",
"columns": {"element_type": "str"},
"n_dim": 768,
},
install_requirements=True,
name="all_indexer",
)
)
with search_flow:
    client = Client(port=search_flow.port)
    results = client.post(
        "/search",
        query_doc,
        request_size=1,
        show_progress=True,
    )
Describe the bug: During execution of the colab example notebook I'm getting a cryptic error when trying to run the indexing flow.
Describe how you solve it: I am unable to solve it.