To provide some more details and proof of what I'm saying:

1) I initially added the "keywords" as one of the "metadata" fields in the file I'm `/upsert`-ing, but when retrieving the most relevant documents, the search had no idea what the "metadata" contained and returned a bunch of irrelevant results as the first few:
However, as soon as I added the "keywords" to the "text" section instead, it got picked up:
This proves that the embeddings are generated using only the "text" field, and that's what is used to retrieve the most relevant documents.
2) However, when I search/ask for something that's not present in the "text" field at all, but only in the "metadata", the LLM can still give contextual answers from the "metadata":
This makes me wonder: how is the LLM searching through the "metadata" sections?
The JSON file that I'm `/upsert`-ing contains data like the following (I'm using the Pinecone vector DB):
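For concreteness, here's a minimal sketch of the kind of upsert request I mean. The document shape ("id"/"text"/"metadata") follows the plugin's API as I understand it; the server URL, the bearer token, and all field values are placeholders, and "keywords" is the custom metadata field I described above:

```python
import requests

# Hypothetical upsert payload -- every value below is a placeholder,
# but the overall shape (documents with "id", "text", "metadata")
# matches what I'm sending to /upsert.
payload = {
    "documents": [
        {
            "id": "complex-analysis-001",
            "text": (
                "The residue of f at an isolated singularity can be read "
                "off from the principal part of its Laurent expansion."
            ),
            "metadata": {
                "source": "file",
                "author": "example-author",
                "keywords": "Laurent expansion, residue, singularity",
            },
        }
    ]
}

resp = requests.post(
    "http://localhost:8000/upsert",  # local plugin server; adjust as needed
    json=payload,
    headers={"Authorization": "Bearer <BEARER_TOKEN>"},
)
print(resp.json())  # expected shape: {"ids": [...]}
```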
However, when I `/query` something like "Laurent expansion", it returns the following:

Note: here, part of the "text" field is missing from the retrieved result (though it intelligently retrieves the most relevant part, i.e., the last sentence of the "text"). The entire "metadata", however, is returned without any chunking.
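To make the shape of that result concrete, here's a hedged sketch of the query round-trip. The nested results structure is my understanding of the plugin's response format; the URL, token, and everything printed are illustrative, not my actual output:

```python
import requests

# Hypothetical query mirroring the request described above.
resp = requests.post(
    "http://localhost:8000/query",  # local plugin server; adjust as needed
    json={"queries": [{"query": "Laurent expansion", "top_k": 3}]},
    headers={"Authorization": "Bearer <BEARER_TOKEN>"},
)

for result in resp.json()["results"]:
    for match in result["results"]:
        # "text" is a single chunk of the original document, which is
        # why only part of the original "text" comes back here...
        print(match["score"], match["text"])
        # ...while "metadata" is the full, unchunked metadata object
        # attached to that chunk.
        print(match["metadata"])
```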
I want to know how the "text" and "metadata" fields are chunked when using the Pinecone vector DB. If the "metadata" fields are not chunked at all, then how is information retrieved from them? Is there a separate string-search mechanism for the "metadata" field? As far as I know, the embeddings are generated using only the "text" field and not the "metadata", yet contextual information that's present only in the "metadata" and not in the "text" can still be retrieved via `/query`.
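To make clear what I mean by "chunked", here's a rough sketch of how I currently understand the chunking: the "text" is split into fixed-size token chunks (each of which gets embedded), while the document-level "metadata" is copied verbatim onto every chunk. The function name, the 200-token chunk size, and the id scheme are all my assumptions, not the plugin's actual code:

```python
import tiktoken

CHUNK_SIZE_TOKENS = 200  # assumed chunk size; the real value may differ


def chunk_document(doc: dict) -> list[dict]:
    """Split a document's "text" into fixed-size token chunks.

    Only "text" is chunked (and later embedded); the document-level
    "metadata" is duplicated verbatim onto every chunk, which would
    explain why /query returns a partial "text" but complete "metadata".
    """
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(doc["text"])
    chunks = []
    for i in range(0, len(tokens), CHUNK_SIZE_TOKENS):
        chunks.append({
            "id": f"{doc['id']}_{i // CHUNK_SIZE_TOKENS}",  # assumed id scheme
            "text": enc.decode(tokens[i : i + CHUNK_SIZE_TOKENS]),  # gets embedded
            "metadata": dict(doc["metadata"]),  # unchunked, copied per chunk
        })
    return chunks
```

If that mental model is right, then there's no separate string search over the "metadata" at all: it's simply returned alongside whichever chunk the embedding search matches, and the LLM reads it from the retrieved context. I'd appreciate confirmation (or correction) either way.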