Arputikos opened 1 day ago
@Arputikos Yes, it's caused by the document splitter (see https://github.com/milvus-io/milvus-haystack/issues/32). Could you give us some feedback, for example whether there are so many warning logs that they make other logs hard to read? If so, we will try to discard it. Any suggestions are appreciated.
Honestly, I think this field should be converted to a compatible data type, not discarded at all. If I create documents from my data, apply some settings, and serialize them to the Milvus database, I expect the data to be in exactly the same state when I load it back. It turns out it isn't, because this crucial field, which I might use in my pipeline, is missing. If Milvus cannot store a list type, I think the field should be serialized to a string and stored as a string, e.g. It looks like a list, since it contains the ID of the overlapping document and the overlap range.
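A minimal sketch of the "serialize it to a string" idea, done on the user side. It assumes the document metadata is a plain dict and that `_split_overlap` holds JSON-compatible values (a list of entries with a document ID and a range). The helper names `encode_meta` and `decode_meta` are hypothetical, not part of milvus-haystack or Haystack. The sketch also assumes the store keeps string-valued metadata, which the warning suggests because only `_split_overlap` is flagged.

```python
import json
from typing import Any, Dict, Iterable

# Hypothetical helpers: the default key comes from the warning in this issue.
DEFAULT_KEYS = ("_split_overlap",)


def encode_meta(meta: Dict[str, Any], keys: Iterable[str] = DEFAULT_KEYS) -> Dict[str, Any]:
    """Return a copy of meta with list-like fields turned into JSON strings."""
    out = dict(meta)  # shallow copy so the caller's dict is left untouched
    for key in keys:
        if key in out:
            out[key] = json.dumps(out[key])
    return out


def decode_meta(meta: Dict[str, Any], keys: Iterable[str] = DEFAULT_KEYS) -> Dict[str, Any]:
    """Reverse encode_meta: parse the JSON strings back into Python objects."""
    out = dict(meta)
    for key in keys:
        if isinstance(out.get(key), str):
            out[key] = json.loads(out[key])
    return out


if __name__ == "__main__":
    meta = {"page": 1, "_split_overlap": [{"doc_id": "abc123", "range": [0, 12]}]}
    stored = encode_meta(meta)
    print(stored["_split_overlap"])  # [{"doc_id": "abc123", "range": [0, 12]}]
    print(decode_meta(stored) == meta)  # True
```

One caveat of going through JSON: if the original range is a tuple, it comes back as a list, so an exact round trip would need a small extra conversion step.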
I created some documents and added them to Milvus on Zilliz Cloud, and got warnings like this one:
Document f8a733afe670032c5f438ba53b7e0d64d4a405c8ad1751674eb72bbd7c6c12c5 has metadata fields with unsupported types: ['_split_overlap']. Supported types refer to Pymilvus DataType. The values of these fields will be discarded.
This happens even though I didn't add _split_overlap myself; I guess it comes from the document splitter. My pipeline is as follows: DocumentCleaner -> DocumentSplitter -> DocumentEmbedder -> DocumentWriter