jina-ai / jina

☁️ Build multimodal AI applications with cloud-native stack
https://docs.jina.ai

Error during indexing. ValueError: need at least one array to stack #382

Closed (vitojph closed this issue 4 years ago)

vitojph commented 4 years ago

Describe your problem

I'm trying to reproduce the BERT-based Semantic Search Engine with a different collection. Unlike the SouthPark example, my corpus is made of short text documents, each a couple of paragraphs long. I preprocessed the collection to segment the sentences with spaCy and generated a single two-column CSV file with the following structure:

doc_id, text
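
A minimal sketch of this preprocessing step (the file names, the `en_core_web_sm` spaCy model, and the one-row-per-sentence layout are illustrative assumptions, not my exact script):

```python
# Sketch: segment each document into sentences with spaCy and write a
# two-column CSV (doc_id, text). File names and model are assumptions.
import csv
import spacy

nlp = spacy.load("en_core_web_sm")

with open("corpus.txt") as src, open("corpus.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow(["doc_id", "text"])
    for doc_id, line in enumerate(src):
        # assume each input line holds one short document
        for sent in nlp(line.strip()).sents:
            writer.writerow([doc_id, sent.text.strip()])
```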

What is your guess?

Whenever I try to index this collection, the process gets stuck after a `ValueError: need at least one array to stack` error:

Click here to see the complete logs ``` $ python app.py -t index -n 100 Flow@8953[S]:successfully built Flow from a yaml config Sentencizer@8961[I]:post initiating, this may take some time... Sentencizer@8961[I]:post initiating, this may take some time takes 0.001 secs Sentencizer@8961[S]:successfully built Sentencizer from a yaml config splitter@8961[I]:setting up sockets... splitter@8961[I]:input tcp://0.0.0.0:56579 (SUB_CONNECT) output tcp://0.0.0.0:39789 (PUSH_CONNECT) control over tcp://0.0.0.0:42053 (PAIR_BIND) splitter@8961[S]:ready and listening TransformerTorc@8964[I]:post initiating, this may take some time... TransformerTorc@8964[I]:post initiating, this may take some time takes 2.714 secs TransformerTorc@8964[S]:successfully built TransformerTorchEncoder from a yaml config encoder@8964[I]:setting up sockets... encoder@8964[I]:input tcp://0.0.0.0:39789 (PULL_BIND) output tcp://0.0.0.0:51541 (PUSH_CONNECT) co ntrol over tcp://0.0.0.0:35377 (PAIR_BIND) encoder@8964[S]:ready and listening NumpyIndexer@8969[I]:post initiating, this may take some time... NumpyIndexer@8969[I]:post initiating, this may take some time takes 0.001 secs NumpyIndexer@8969[S]:successfully built NumpyIndexer from a yaml config BasePbIndexer@8969[I]:post initiating, this may take some time... BasePbIndexer@8969[I]:post initiating, this may take some time takes 0.000 secs BasePbIndexer@8969[S]:successfully built BasePbIndexer from a yaml config ChunkIndexer@8969[I]:post initiating, this may take some time... ChunkIndexer@8969[I]:post initiating, this may take some time takes 0.000 secs ChunkIndexer@8969[S]:successfully built ChunkIndexer from a yaml config NumpyIndexer@8969[W]:you can not query from as its "query_handler" is not set. If you are indexing data then that is fine, just means you can not do q uerying-while-indexing.If you are querying data then the index file must be broken. BasePbIndexer@8969[W]:you can not query from as its "query_handler" is not set. If you are indexing data then that is fine, just means you can not d o querying-while-indexing.If you are querying data then the index file must be broken. chunk_indexer@8969[I]:setting up sockets... chunk_indexer@8969[I]:input tcp://0.0.0.0:51541 (PULL_BIND) output tcp://0.0.0.0:40623 (PUSH_CONNECT) control over tcp://0.0.0.0:55601 (PAIR_BIND) chunk_indexer@8969[S]:ready and listening DocPbIndexer@8972[I]:post initiating, this may take some time... DocPbIndexer@8972[I]:post initiating, this may take some time takes 0.001 secs DocPbIndexer@8972[S]:successfully built DocPbIndexer from a yaml config doc_indexer@8972[I]:setting up sockets... doc_indexer@8972[I]:input tcp://0.0.0.0:56579 (SUB_CONNECT) output tcp://0.0.0.0:40623 (PUSH_CONNECT) control over tcp://0.0.0.0:59287 (PAIR_BIND) doc_indexer@8972[S]:ready and listening BaseExecutor@8975[I]:post initiating, this may take some time... DocPbIndexer@8972[I]:post initiating, this may take some time takes 0.001 secs DocPbIndexer@8972[S]:successfully built DocPbIndexer from a yaml config doc_indexer@8972[I]:setting up sockets... doc_indexer@8972[I]:input tcp://0.0.0.0:56579 (SUB_CONNECT) out put tcp://0.0.0.0:40623 (PUSH_CONNECT) control over tcp://0.0.0.0:59287 (PAI R_BIND) doc_indexer@8972[S]:ready and listening BaseExecutor@8975[I]:post initiating, this may take some time... BaseExecutor@8975[I]:post initiating, this may take some time takes 0.000 secs BaseExecutor@8975[S]:successfully built BaseExecutor from a yaml config join_all@8975[I]:setting up sockets... 
join_all@8975[I]:input tcp://0.0.0.0:40623 (PULL_BIND) output tcp://0.0.0.0:57961 (PUSH_BIND) control over tcp://0.0.0.0:38593 (PAIR_BIND) join_all@8975[S]:ready and listening BaseExecutor@8953[I]:post initiating, this may take some time... BaseExecutor@8953[I]:post initiating, this may take some time takes 0.001 secs GatewayPea@8953[S]:gateway is listening at: 0.0.0.0:54505 Flow@8953[I]:6 Pods (i.e. 6 Peas) are running in this Flow Flow@8953[S]:flow is now ready for use, current build_level is GRAPH PyClient@8953[S]:connected to the gateway at 0.0.0.0:54505! index [= ] 📃 0 ⏱️ 0.0s 🐎 0.0/s 0 batchindex ... gateway@8953[I]:setting up sockets... gateway@8953[I]:input tcp://0.0.0.0:57961 (PULL_CONNECT) output tcp://0.0.0.0:56579 (PUB_BIND) control over ipc:///tmp/tmptwzoi4dz (PAIR_BIND) gateway@8953[I]:prefetching 50 requests... gateway@8953[W]:if this takes too long, you may want to take smaller "--prefetch" or ask client to reduce "--batch-size" gateway@8953[I]:prefetching 50 requests takes 0.022 secs gateway@8953[I]:send: 0 recv: 0 pending: 0 splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ encoder@8964[I]:received "index" from gateway▸splitter▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 3 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 5 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 6 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 8 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 10 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 12 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 14 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 15 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 16 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 17 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 21 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 22 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 24 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 25 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 28 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 29 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 30 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 33 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received 
"index" from gateway▸⚐ splitter@8961[W]:doc 35 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 36 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 37 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 38 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 40 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 42 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 43 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 46 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 47 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ splitter@8961[W]:doc 48 gives no chunk splitter@8961[I]:received "index" from gateway▸⚐ encoder@8964[I]:received "index" from gateway▸splitter▸⚐ chunk_indexer@8969[I]:received "index" from gateway▸splitter▸encoder▸⚐ join_all@8975[I]:received "index" from gateway▸splitter▸encoder▸chunk_indexer▸⚐ encoder@8964[I]:received "index" from gateway▸splitter▸⚐ chunk_indexer@8969[I]:received "index" from gateway▸splitter▸encoder▸⚐ join_all@8975[I]:received "index" from gateway▸splitter▸encoder▸chunk_indexer▸⚐ encoder@8964[I]:received "index" from gateway▸splitter▸⚐ chunk_indexer@8969[I]:received "index" from gateway▸splitter▸encoder▸⚐ encoder@8964[E]:unknown exception: need at least one array to stack Traceback (most recent call last): File "/media/data/projects/jina-examples/southpark-search/venv/lib/python3.7/site-packages/jina/peapods/pea.py", line 345, in run self.loop_body() File "/media/data/projects/jina-examples/southpark-search/venv/lib/python3.7/site-packages/jina/peapods/pea.py", line 310, in loop_body msg = self.zmqlet.recv_message(callback=self.msg_callback) File "/media/data/projects/jina-examples/southpark-search/venv/lib/python3.7/site-packages/jina/peapods/zmq.py", line 222, in recv_message return callback(msg) File "/media/data/projects/jina-examples/southpark-search/venv/lib/python3.7/site-packages/jina/peapods/pea.py", line 294, in msg_callback return self._callback(msg) File "/media/data/projects/jina-examples/southpark-search/venv/lib/python3.7/site-packages/jina/peapods/pea.py", line 284, in _callback self.pre_hook(msg).handle(msg).post_hook(msg) File "/media/data/projects/jina-examples/southpark-search/venv/lib/python3.7/site-packages/jina/peapods/pea.py", line 170, in handle self.executor(self.request_type) File "/media/data/projects/jina-examples/southpark-search/venv/lib/python3.7/site-packages/jina/executors/__init__.py", line 534, in __call__ d() File "/media/data/projects/jina-examples/southpark-search/venv/lib/python3.7/site-packages/jina/drivers/encode.py", line 20, in __call__ contents, chunk_pts, no_chunk_docs, bad_chunk_ids = extract_chunks(self.req.docs, embedding=False) File "/media/data/projects/jina-examples/southpark-search/venv/lib/python3.7/site-packages/jina/drivers/helper.py", line 107, in extract_chunks return np.stack(contents), chunk_pts, no_chunk_docs, bad_chunk_ids File "<__array_function__ internals>", line 6, in stack File "/media/data/projects/jina-examples/southpark-search/venv/lib/python3.7/site-packages/numpy/core/shape_base.py", line 422, in stack raise 
ValueError('need at least one array to stack') ValueError: need at least one array to stack encoder@8964[I]:#sent: 3 #recv: 4 sent_size: 10.9 KB recv_size: 2.1 KB join_all@8975[I]:received "index" from gateway▸splitter▸encoder▸chunk_indexer▸⚐ encoder@8964[I]:#sent: 3 #recv: 4 sent_size: 10.9 KB recv_size: 2.1 KB encoder@8964[S]:terminated ```

When I press Ctrl-C, the logs continue:

Click here ``` ^C [1486.595 secs] ✅ done in ⏱ 1486.6s 🐎 0.0/s chunk_indexer@8969[W]:user cancel the process doc_indexer@8972[W]:user cancel the process join_all@8975[W]:user cancel the process DocPbIndexer@8972[I]:no update since 2020-05-06 13:41:36, will not save. If you really want to save it, call "touch()" before "save()" to force savin g doc_indexer@8972[I]:executor says there is nothing to save PyClient@8953[W]:user cancel the process doc_indexer@8972[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 By tes join_all@8975[I]:#sent: 0 #recv: 3 sent_size: 0 Bytes recv_size: 1.7 KB splitter@8961[W]:user cancel the process splitter@8961[I]:#sent: 50 #recv: 50 sent_size: 26.9 KB recv_size: 20.9 KB NumpyIndexer@8969[S]:artifacts of this executor (vecidx) is persisted to /home/aiteam/projects/jina-examples/test/sbnpsago/chunk_indexer-0/vecidx.bin BasePbIndexer@8969[S]:artifacts of this executor (chunkidx) is persisted to /home/aiteam/projects/jina-examples/test/sbnpsago/chunk_indexer-0/chunkidx.bin PyClient@8953[S]:terminated join_all@8975[I]:#sent: 0 #recv: 3 sent_size: 0 Bytes recv_size: 1.7 KB ChunkIndexer@8969[I]:no update since 2020-05-06 13:41:36, will not save. If you really want to save it, call "touch()" before "save()" to force saving chunk_indexer@8969[I]:dumped changes to the executor, 1487s since last the save join_all@8975[S]:terminated chunk_indexer@8969[I]:#sent: 3 #recv: 3 sent_size: 1.7 KB recv_size: 10.9 KB splitter@8961[I]:#sent: 50 #recv: 50 sent_size: 26.9 KB recv_size: 20.9 KB splitter@8961[S]:terminated GatewayPea@8953[S]:terminated doc_indexer@8972[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 Bytes doc_indexer@8972[S]:terminated ```

Environment

jina                          0.1.7
jina-proto                    0.0.20
jina-vcs-tag                  (unset)
libzmq                        4.3.2
pyzmq                         1.18.4
protobuf                      3.11.3
proto-backend                 cpp
grpcio                        1.28.1
ruamel.yaml                   0.16.10
python                        3.7.3
platform                      Linux
platform-release              5.0.0-1020-gcp
platform-version              #20-Ubuntu SMP Tue Oct 1 00:10:19 UTC 2019
architecture                  x86_64
processor                     x86_64
jina-resources                /media/data/projects/jina-examples/southpark-search/venv/lib/python3.7/site-packages/jina/resources
JINA_ARRAY_QUANT              (unset)
JINA_CONTRIB_MODULE           (unset)
JINA_CONTRIB_MODULE_IS_LOADING(unset)
JINA_CONTROL_PORT             (unset)
JINA_DEFAULT_HOST             (unset)
JINA_EXECUTOR_WORKDIR         (unset)
JINA_FULL_CLI                 (unset)
JINA_IPC_SOCK_TMP             (unset)
JINA_LOG_FILE                 (unset)
JINA_LOG_LONG                 (unset)
JINA_LOG_NO_COLOR             (unset)
JINA_LOG_PROFILING            (unset)
JINA_LOG_SSE                  (unset)
JINA_LOG_VERBOSITY            (unset)
JINA_PROFILING                (unset)
JINA_SOCKET_HWM               (unset)
JINA_STACK_CONFIG             (unset)
JINA_TEST_CONTAINER           (unset)
JINA_TEST_PRETRAINED          (unset)
JINA_VCS_VERSION              (unset)
JINA_VERSION                  (unset)
JINA_WARN_UNNAMED             (unset)

The example with the SouthPark documents works. Any idea what's going on? Thanks in advance.

hanxiao commented 4 years ago

I notice that the following warning appears quite often in your log.

splitter@8961[W]:doc 42 gives no chunk

As it suggests, this doc contains no chunks. What does that mean for the whole pipeline? Let's say your batch_size=2, so each of your requests carries two documents. If you happen to have two empty-chunk docs in a row, then the encoder has nothing to encode: it receives empty chunks and returns None. The error `ValueError('need at least one array to stack')` is then no surprise, as no embedding is generated for this batch.
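
A tiny standalone reproduction of that failure mode, with plain numpy outside Jina: when a batch yields no chunks at all, the encode driver ends up stacking an empty list.

```python
import numpy as np

chunk_embeddings = []  # an all-empty batch: no chunks survived splitting
try:
    np.stack(chunk_embeddings)
except ValueError as e:
    print(e)  # need at least one array to stack
```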

Quick fix suggested for you for now: make sure every document actually yields at least one chunk (check the sentence-length limits of your Sentencizer), and keep the batch_size large enough that a request made up entirely of empty-chunk docs becomes unlikely.

And yes, a real fix is required on our side to make the workflow robust regardless of whether a batch contains empty chunks.
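
For illustration only, a guard of roughly this shape on the framework side would avoid the crash; this is a sketch of the idea, not the actual change that later landed:

```python
import numpy as np

def stack_chunk_contents(contents):
    """Stack chunk embeddings, tolerating an all-empty batch."""
    if not contents:
        # nothing to encode in this request: skip it instead of raising
        return None
    return np.stack(contents)
```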

vitojph commented 4 years ago

Thanks @hanxiao for the tips.

I tried different parameters. My crafter has min_sent_len: 0 and max_sent_len: 256, and I increased the batch_size during indexing up to 64, but the process still seems to get stuck.

Here are some additional logs, just in case they're useful:

$ python app.py -t index -n 500
           Flow@32263[S]:successfully built Flow from a yaml config
    Sentencizer@32271[I]:post initiating, this may take some time...
    Sentencizer@32271[I]:post initiating, this may take some time takes 0.001 secs
    Sentencizer@32271[S]:successfully built Sentencizer from a yaml config
       splitter@32271[I]:setting up sockets...
       splitter@32271[I]:input tcp://0.0.0.0:40937 (SUB_CONNECT)     output tcp://0.0.0.0:37997 (PUSH_CONNECT) control over tcp://0.0.0.0:41927 (PAIR_BIND)
       splitter@32271[S]:ready and listening
TransformerTorc@32274[I]:post initiating, this may take some time...
TransformerTorc@32274[I]:post initiating, this may take some time takes 2.578 secs
TransformerTorc@32274[S]:successfully built TransformerTorchEncoder from a yaml config
        encoder@32274[I]:setting up sockets...
        encoder@32274[I]:input tcp://0.0.0.0:37997 (PULL_BIND)   output tcp://0.0.0.0:53499 (PUSH_CONNECT)   control over tcp://0.0.0.0:37609 (PAIR_BIND)
        encoder@32274[S]:ready and listening
   NumpyIndexer@32279[I]:post initiating, this may take some time...
   NumpyIndexer@32279[I]:post initiating, this may take some time takes 0.001 secs
   NumpyIndexer@32279[S]:successfully built NumpyIndexer from a yaml config
  BasePbIndexer@32279[I]:post initiating, this may take some time...
  BasePbIndexer@32279[I]:post initiating, this may take some time takes 0.000 secs
  BasePbIndexer@32279[S]:successfully built BasePbIndexer from a yaml config
   ChunkIndexer@32279[I]:post initiating, this may take some time...
   ChunkIndexer@32279[I]:post initiating, this may take some time takes 0.000 secs
   ChunkIndexer@32279[S]:successfully built ChunkIndexer from a yaml config
   NumpyIndexer@32279[W]:you can not query from <jina.executors.indexers.vector.numpy.NumpyIndexer object at 0x7f8142dd7dd8> as its "query_handler" is not set. If you are indexing data then that is fine, just means you can not do querying-while-indexing.If you are querying data then the index file must be broken.
  chunk_indexer@32279[I]:setting up sockets...
  chunk_indexer@32279[I]:input tcp://0.0.0.0:53499 (PULL_BIND)   output tcp://0.0.0.0:45679 (PUSH_CONNECT)   control over tcp://0.0.0.0:58605 (PAIR_BIND)
  chunk_indexer@32279[S]:ready and listening
   DocPbIndexer@32282[I]:post initiating, this may take some time...
   DocPbIndexer@32282[I]:post initiating, this may take some time takes 0.000 secs
   DocPbIndexer@32282[S]:successfully built DocPbIndexer from a yaml config
    doc_indexer@32282[I]:setting up sockets...
    doc_indexer@32282[I]:input tcp://0.0.0.0:40937 (SUB_CONNECT)     output tcp://0.0.0.0:45679 (PUSH_CONNECT) control over tcp://0.0.0.0:56505 (PAIR_BIND)
    doc_indexer@32282[S]:ready and listening
   BaseExecutor@32285[I]:post initiating, this may take some time...
   BaseExecutor@32285[I]:post initiating, this may take some time takes 0.000 secs
   BaseExecutor@32285[S]:successfully built BaseExecutor from a yaml config
       join_all@32285[I]:setting up sockets...
       join_all@32285[I]:input tcp://0.0.0.0:45679 (PULL_BIND)   output tcp://0.0.0.0:54945 (PUSH_BIND)  control over tcp://0.0.0.0:46895 (PAIR_BIND)
       join_all@32285[S]:ready and listening
   BaseExecutor@32263[I]:post initiating, this may take some time...
   BaseExecutor@32263[I]:post initiating, this may take some time takes 0.001 secs
     GatewayPea@32263[S]:gateway is listening at: 0.0.0.0:47853
           Flow@32263[I]:6 Pods (i.e. 6 Peas) are running in this Flow
           Flow@32263[S]:flow is now ready for use, current build_level is GRAPH
       PyClient@32263[S]:connected to the gateway at 0.0.0.0:47853!
index [=                   ] 📃      0 ⏱️ 0.0s 🐎 0.0/s      0 batchindex ...         gateway@32263[I]:setting up sockets...
        gateway@32263[I]:input tcp://0.0.0.0:54945 (PULL_CONNECT)    output tcp://0.0.0.0:40937 (PUB_BIND)   control over ipc:///tmp/tmp16sxf8kk (PAIR_BIND)
        gateway@32263[I]:prefetching 50 requests...
        gateway@32263[W]:if this takes too long, you may want to take smaller "--prefetch" or ask client to reduce "--batch-size"
        gateway@32263[I]:prefetching 50 requests takes 0.007 secs
        gateway@32263[I]:send: 0 recv: 0 pending: 0

And after cancelling the script:

^C    [183.194 secs]
    ✅ done in ⏱ 183.2s 🐎 0.0/s
  chunk_indexer@32279[W]:user cancel the process
    doc_indexer@32282[W]:user cancel the process
       splitter@32271[W]:user cancel the process
       join_all@32285[W]:user cancel the process
   NumpyIndexer@32279[I]:no update since 2020-05-07 06:47:01, will not save. If you really want to save it, call "touch()" before "save()" to force saving
   DocPbIndexer@32282[I]:no update since 2020-05-07 06:47:01, will not save. If you really want to save it, call "touch()" before "save()" to force saving
       PyClient@32263[W]:user cancel the process
  BasePbIndexer@32279[I]:no update since 2020-05-07 06:47:01, will not save. If you really want to save it, call "touch()" before "save()" to force saving
       splitter@32271[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 Bytes
    doc_indexer@32282[I]:executor says there is nothing to save
   ChunkIndexer@32279[I]:no update since 2020-05-07 06:47:01, will not save. If you really want to save it, call "touch()" before "save()" to force saving
  chunk_indexer@32279[I]:dumped changes to the executor, 183s since last the save
    doc_indexer@32282[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 Bytes
  chunk_indexer@32279[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 Bytes
        encoder@32274[W]:user cancel the process
        encoder@32274[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 Bytes
   NumpyIndexer@32279[W]:you can not query from <jina.executors.indexers.vector.numpy.NumpyIndexer object at 0x7f8142dd7dd8> as its "query_handler" is not set. If you are indexing data then that is fine, just means you can not do querying-while-indexing.If you are querying data then the index file must be broken.
       join_all@32285[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 Bytes
       PyClient@32263[S]:terminated
       splitter@32271[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 Bytes
       splitter@32271[S]:terminated
     GatewayPea@32263[S]:terminated
  chunk_indexer@32279[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 Bytes
       join_all@32285[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 Bytes
       join_all@32285[S]:terminated
  chunk_indexer@32279[S]:terminated
        encoder@32274[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 Bytes
    doc_indexer@32282[I]:#sent: 0 #recv: 0 sent_size: 0 Bytes recv_size: 0 Bytes
        encoder@32274[S]:terminated
    doc_indexer@32282[S]:terminated
           Flow@32263[S]:flow is closed and all resources should be released already, current build level is EMPTY
done
nan-wang commented 4 years ago

Hi, pal. We've made a fix in PR-413. Would you please try installing the devel version and giving it another shot? Thanks for posting!