Open hemidactylus opened 1 week ago
Thanks, Stefano.
For the error part, we certainly need a better error, and not a 500.
Server failed: root cause: (java.lang.IllegalArgumentException) Must be single
embedding provider name, got [openai, jinaAI]. Server error '500 Internal Server
Error' for url 'https://[...]-us-west-2.apps.astra-dev.datastax.com/[...]/<TABLE>
Another problem is that the Data API only supports vectorizing multiple fields when they share the same provider and dimension, and it fails when the user does not do so. Meanwhile, we allow users to create tables whose columns have different vectorize settings. This one needs discussion.
December hot fix: better errors, including for the unsupported case of different dimensions. Split this ticket when we get to work on it. January fix: handle different providers, models, and dimensions in the same table.
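To make the "better errors" idea concrete, here is a minimal sketch of a pre-flight check that rejects mismatched vectorize columns with a descriptive message instead of letting the request fail with a 500. Everything in it is an assumption for illustration: `VectorizeColumn`, `UnsupportedVectorizeConfig`, `check_vectorize_columns` and the model names are hypothetical and are not part of the Data API, and the check covers only provider and dimension, the two constraints named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorizeColumn:
    """Hypothetical description of one vectorize-enabled column."""
    name: str
    provider: str
    model: str
    dimension: int


class UnsupportedVectorizeConfig(ValueError):
    """Hypothetical client-facing error (meant to map to a 4xx, not a 500)."""


def check_vectorize_columns(columns):
    """Raise a descriptive error unless all columns share provider and dimension."""
    problems = []
    for attr in ("provider", "dimension"):
        # Collect the distinct values of this attribute across all columns.
        values = sorted({str(getattr(col, attr)) for col in columns})
        if len(values) > 1:
            problems.append(f"{attr}: {', '.join(values)}")
    if problems:
        names = ", ".join(col.name for col in columns)
        raise UnsupportedVectorizeConfig(
            f"Vectorize columns [{names}] must share the same provider and dimension; "
            "found differing " + "; ".join(problems)
        )


if __name__ == "__main__":
    # Hypothetical two-provider table, mirroring the [openai, jinaAI] case.
    table_columns = [
        VectorizeColumn("v1", "openai", "model-a", 1024),
        VectorizeColumn("v2", "jinaAI", "model-b", 1024),
    ]
    try:
        check_vectorize_columns(table_columns)
    except UnsupportedVectorizeConfig as err:
        print(err)
```

Whether such a check belongs at table creation or at write time is exactly the open question here; the sketch only shows what the error could say.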
This is mostly a survey and a note to start future work, I guess. I tried several wicked things with vectorize using 1.0.20 on a dev DB with more than a single vectorize column. (Note: this cannot be tested completely on a local Data API, because the two-provider case requires KMS: there is no way to send multiple embedding API keys via header.)
Two columns with same provider, model and dimension
Inserting a row with
{v1: "blabla", v2: [...]}
(i.e. one string and one vector): works
Both vectors: works
Both strings: works (two different embedding vectors got stored for two different texts, as expected)
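For anyone wanting to repeat the survey, the string/vector combinations tried above can be enumerated mechanically. This is only a sketch: `payload_matrix`, the placeholder text and the three-element vector are hypothetical, and a real vector would need to match the column's declared dimension.

```python
from itertools import product

TEXT = "blabla"            # a string to be embedded server-side
VECTOR = [0.1, 0.2, 0.3]   # hypothetical precomputed vector (placeholder dimension)


def payload_matrix(columns=("v1", "v2")):
    """Return every string/vector combination of row payloads for the given columns."""
    kinds = {"string": TEXT, "vector": VECTOR}
    cases = []
    for combo in product(kinds, repeat=len(columns)):
        # combo is e.g. ("string", "vector"): one input kind per column.
        row = {col: kinds[kind] for col, kind in zip(columns, combo)}
        cases.append((combo, row))
    return cases


if __name__ == "__main__":
    for combo, row in payload_matrix():
        print(combo, row)
```

Each row can then be inserted with whatever client is in use, recording which combinations succeed.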
Two columns with same provider and model, but different dimension
Passing both as strings does not work: the API mistakenly assumes both vectors have the same dimension:
Two columns with same provider, different models, different dimensions
Same error as above (and I suspect at this point that, even if the dimensions did match, it would either error or work in the wrong way)
Two columns with different providers
This time, inserting both as strings leads to a 500 Internal Server Error:
If instead one of the two is passed as a vector and the other as a string, one of two things happens, depending on which is which: