Closed. ArturTan closed this issue 4 years ago.
After a small refactor to bring it in line with jina 0.5.0, it works like a charm (almost).
@ArturTan Thanks for trying out jina! What's the status of this issue? With the recursive Document structure, the concept of Chunk is deprecated in v0.5.0. Here is a migration guide: https://github.com/jina-ai/jina/issues/702.
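To illustrate what replacing Chunk with a recursive Document structure means, here is a minimal plain-Python sketch. The `Doc` class, its `level` field, and the `add_chunk` and `leaves` helpers are hypothetical and are not jina's API; they only show the idea that a chunk is itself a document nested one level deeper, so each sentence becomes a child document of the text it came from.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Doc:
    # Hypothetical recursive document, NOT jina's Document class.
    text: str
    level: int = 0  # nesting depth: 0 for the original text
    chunks: List["Doc"] = field(default_factory=list)

    def add_chunk(self, text: str) -> "Doc":
        # A chunk is just another Doc, one level deeper than its parent.
        child = Doc(text=text, level=self.level + 1)
        self.chunks.append(child)
        return child


def leaves(doc: Doc) -> List[Doc]:
    # Depth-first walk returning the docs that have no chunks of their own.
    if not doc.chunks:
        return [doc]
    result: List[Doc] = []
    for chunk in doc.chunks:
        result.extend(leaves(chunk))
    return result


root = Doc("First sentence. Second sentence.")
root.add_chunk("First sentence.")
root.add_chunk("Second sentence.")
print([(d.text, d.level) for d in leaves(root)])
# [('First sentence.', 1), ('Second sentence.', 1)]
```

Because chunks and documents share one type, the same code can walk any depth of nesting instead of special-casing a separate Chunk type.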
Yep, it works fine. P.S. v0.5.0 is a great release! I really appreciate it.
Describe your problem
I am playing with jina 0.4.1, partially basing my setup on the urbandict-search configuration. My example is already in the repo: https://github.com/ArturTan/invalid_flow_of_jina
I included my own data (5 long texts), and it seems that the encoder receives the entire text rather than chunks.
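One quick way to tell whether an encoder sees chunks or whole texts is to compare the first dimension of its output with the number of documents versus the number of sentences. The sketch below is a plain-Python illustration, not jina code: `split_sentences` is a hypothetical naive regex splitter and `toy_encode` a hypothetical deterministic encoder that needs no model weights.

```python
import re

import numpy as np


def split_sentences(text):
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def toy_encode(texts, dim=8):
    # Hypothetical encoder: one fixed-size vector per input string,
    # built from character codes so no model weights are needed.
    out = np.zeros((len(texts), dim), dtype=np.float32)
    for i, t in enumerate(texts):
        for j, ch in enumerate(t):
            out[i, (ord(ch) + j) % dim] += 1.0
    return out


docs = ["One. Two.", "Three! Four? Five."]
chunks = [s for d in docs for s in split_sentences(d)]

print(toy_encode(docs).shape)    # (2, 8): one row per document
print(toy_encode(chunks).shape)  # (5, 8): one row per sentence chunk
```

With 5 input texts, an output whose first dimension is 5 points to document-level encoding; chunk-level encoding would produce one row per sentence.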
I used two custom classes to debug this problem: CustomSenticizer and CustomEncoder, which inherit from Sentencizer and TransformerTorchEncoder, respectively. They save the input and output data of the craft and encode methods, respectively. This shows:
[input for CustomSenticizer] - one string per invocation of the craft function.
[output for CustomSenticizer] - a dict of sentences and meta info.
[input for CustomEncoder] - an np.array of 5 long strings.
[output for CustomEncoder] - an array of shape (5, 28996).
That means the arrays represent the entire texts, not the chunks. This contradicts the following information from the jina docs:
What is your guess?
The Crafter does not send the chunks to the Encoder; it sends the entire text as input instead.
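The debugging approach from the issue (subclassing a component and saving what goes in and out) can be sketched generically. `record_io`, `ToyEncoder`, and `DebugEncoder` are hypothetical names; `ToyEncoder` only stands in for an encoder such as TransformerTorchEncoder and does not reproduce its behavior or signature.

```python
import functools

import numpy as np


def record_io(method):
    # Wrap a method whose first argument is the data, so every call
    # stores (input, output) in a `records` list on the instance.
    @functools.wraps(method)
    def wrapper(self, data, *args, **kwargs):
        result = method(self, data, *args, **kwargs)
        self.__dict__.setdefault("records", []).append((data, result))
        return result
    return wrapper


class ToyEncoder:
    # Hypothetical stand-in for an encoder; NOT TransformerTorchEncoder.
    def encode(self, data):
        # One row per input string: [length, number of spaces].
        return np.array([[len(s), s.count(" ")] for s in data], dtype=np.float32)


class DebugEncoder(ToyEncoder):
    # Subclass that records what actually reaches encode().
    @record_io
    def encode(self, data):
        return super().encode(data)


enc = DebugEncoder()
enc.encode(np.array(["a b. c d.", "e f."]))
seen, emitted = enc.records[0]
print(seen.shape, emitted.shape)  # (2,) (2, 2)
```

If `seen` holds one entry per original text rather than one per sentence, the encoder is receiving whole documents, which is the symptom described in this issue. A decorator keeps the recording logic in one place, so the same wrapper could be applied to a crafter's method as well.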
Environment