@kbeaugrand I was just about to develop the same connector. Thank you for your work. I took a look at the code, but I didn't see the SQL schema. Can you help me find it?
Did you follow the same approach suggested here: https://devblogs.microsoft.com/azure-sql/vector-similarity-search-with-azure-sql-database-and-openai/ ?
@DrEight, thank you for your interest.
Yes, I did it using this approach. However, since the connector is a little bit different, I had to adapt it, but the similarity search is exactly the same. All the tables, indexes and foreign keys are created directly by the connector, so you don't have to deal with them.
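For context, the similarity search in the linked blog post boils down to a single dot-product aggregation over an "exploded" vector table (one row per vector component). A minimal sketch of that pattern, with simplified, illustrative identifiers:

```sql
-- Cosine similarity over an exploded vector table (one row per component).
-- @query holds the query embedding; table/column names are illustrative.
DECLARE @query TABLE (vector_value_id INT, vector_value FLOAT);

SELECT TOP (10)
       e.item_id,
       SUM(q.vector_value * e.vector_value)
         / (SQRT(SUM(q.vector_value * q.vector_value))
            * SQRT(SUM(e.vector_value * e.vector_value))) AS cosine_similarity
FROM @query AS q
JOIN dbo.embeddings AS e
  ON e.vector_value_id = q.vector_value_id
GROUP BY e.item_id
ORDER BY cosine_similarity DESC;
```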
Do you have the schema to share, or can you add this information to the documentation please?
@DrEight please find below the diagram of the Kernel Memory tables:
Note: this was extracted while creating a collection called "test_collection". You might have multiple collections; each collection will create the following tables:
- SKEmbeddings
- SKMemoriesTags
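For anyone reading along, the blog above stores embeddings in the exploded layout with a clustered columnstore index on top. A rough sketch of what such a per-collection table can look like; the column names here are illustrative assumptions, not necessarily what the connector generates (the tags table is omitted):

```sql
-- Illustrative sketch of an exploded embedding table per collection;
-- the connector's actual generated schema may differ.
CREATE TABLE dbo.SKEmbeddings
(
    memory_id       NVARCHAR(256) NOT NULL,  -- key of the memory record
    vector_value_id INT           NOT NULL,  -- component index in the embedding
    vector_value    FLOAT         NOT NULL   -- component value
);

-- The columnstore index is what makes the SUM()-based similarity query fast.
CREATE CLUSTERED COLUMNSTORE INDEX cci_SKEmbeddings
    ON dbo.SKEmbeddings;
```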
Is this what you are looking for?
Yes, kind of. Since I'm not happy with the performance, I tried many approaches, and I was curious to see if you found any way to improve it. So far the best result I had was to implement a kNN algorithm to partition the vector table. Another improvement I'm working on is to add a hybrid search: since we also have the text, we can full-text search it and then merge the two result sets.
Thoughts?
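To make the hybrid idea concrete, here is a hedged sketch of one way to merge full-text and vector results in T-SQL. Everything here is an assumption for illustration: the dbo.Memories table, its content column, the full-text index on it, and the 50/50 blend.

```sql
-- Hypothetical hybrid search: combine FREETEXTTABLE rank with cosine
-- similarity, then re-rank. All table/column names are assumptions.
DECLARE @queryText NVARCHAR(400) = N'example question';
DECLARE @queryVector TABLE (vector_value_id INT, vector_value FLOAT);

WITH cosine AS
(
    SELECT e.memory_id,
           SUM(e.vector_value * q.vector_value)
             / (SQRT(SUM(e.vector_value * e.vector_value))
                * SQRT(SUM(q.vector_value * q.vector_value))) AS similarity
    FROM dbo.SKEmbeddings AS e
    JOIN @queryVector AS q ON q.vector_value_id = e.vector_value_id
    GROUP BY e.memory_id
),
ft AS
(
    -- Requires a full-text index on dbo.Memories(content); RANK is 0-1000.
    SELECT [KEY] AS memory_id, RANK AS ft_rank
    FROM FREETEXTTABLE(dbo.Memories, content, @queryText)
)
SELECT TOP (10)
       c.memory_id,
       0.5 * c.similarity + 0.5 * (ISNULL(ft.ft_rank, 0) / 1000.0) AS score
FROM cosine AS c
LEFT JOIN ft ON ft.memory_id = c.memory_id
ORDER BY score DESC;
```

A smarter merge strategy (e.g. reciprocal rank fusion) may well beat a fixed weighted average; the blend above is only the simplest possible starting point.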
I'm curious about your performance issues: are they about technical performance (retrieval time, CPU, storage, ...) or functional performance (context precision, context recall, ...)?
Solutions might differ depending on what you mean.
However, regarding technical performance, David talked about some techniques that could be tested (partitioning the embeddings, the vectors, ...). Since that requires more SQL work, I haven't had time to deal with it. When working on Retrieval Augmentation, I honestly haven't had to dig deeply into that question yet.
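For reference, the partitioning idea mentioned in this thread can be sketched as an IVF-style two-step lookup: cluster the vectors offline (e.g. with k-means), store a cluster id per row, and only scan the cluster nearest to the query. Everything below (dbo.Centroids, the cluster_id column) is a hypothetical sketch, not existing connector code:

```sql
-- Hypothetical IVF-style partitioned search; assumes the vectors were
-- clustered offline and each embedding row carries a cluster_id.
DECLARE @queryVector TABLE (vector_value_id INT, vector_value FLOAT);

-- 1. Find the nearest centroid (same exploded layout as the embeddings).
--    A plain dot product is enough if the vectors are normalized.
DECLARE @cluster INT =
(
    SELECT TOP (1) c.cluster_id
    FROM dbo.Centroids AS c
    JOIN @queryVector AS q ON q.vector_value_id = c.vector_value_id
    GROUP BY c.cluster_id
    ORDER BY SUM(c.vector_value * q.vector_value) DESC
);

-- 2. Run the similarity query only against that cluster's rows.
SELECT TOP (10) e.memory_id,
       SUM(e.vector_value * q.vector_value) AS score
FROM dbo.SKEmbeddings AS e
JOIN @queryVector AS q ON q.vector_value_id = e.vector_value_id
WHERE e.cluster_id = @cluster
GROUP BY e.memory_id
ORDER BY score DESC;
```

Scanning only one cluster trades recall for speed; probing the top few clusters instead of one is a common middle ground, which may explain the conflicting results mentioned above.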
I'm talking about the performance of computing the cosine similarities. In the original example the vector is stored as float, and on an Azure SQL database with 200 DTUs assigned I get: CPU time = 5406 ms, elapsed time = 2829 ms. I created a parallel table using real instead of float, and with it I get: CPU time = 1641 ms, elapsed time = 883 ms. A huge difference. I also tried to quantize the floats to int16, but unexpectedly the performance is CPU time = 5765 ms, elapsed time = 2972 ms, even though that is the smallest table of my tests. I also tried to partition the vector table with a kind of kNN scheme, but I got conflicting results, so I'm not sure about it.
From the diagram you shared in this chat I can't tell whether the vectors are stored as float, like in the original blog, or as real. If they are float, I strongly suggest you use real.
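A hedged sketch of the real-typed parallel table experiment described above, so the comparison can be reproduced (names illustrative):

```sql
-- Build a REAL-typed twin of the float table, rebuild the columnstore
-- index, then time the same similarity query against both tables.
SELECT memory_id,
       vector_value_id,
       CAST(vector_value AS REAL) AS vector_value
INTO dbo.SKEmbeddings_real
FROM dbo.SKEmbeddings;

CREATE CLUSTERED COLUMNSTORE INDEX cci_SKEmbeddings_real
    ON dbo.SKEmbeddings_real;

-- SET STATISTICS TIME ON reports the "CPU time / elapsed time" figures
-- quoted above for whatever query runs next in the session.
SET STATISTICS TIME ON;
```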
Hi,
I'm no particular expert on SQL Server, but as far as I know there shouldn't be any difference between float and real (since real is a float(24) stored in 4 bytes too). Yes, my implementation also uses float to store the vector values, so maybe that's an improvement to make on my side, but I need more details about your benchmark: is it reproducible?
How many embeddings do you have (are you using the sample from the blog above)?
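One detail worth double-checking here: an unqualified float column in SQL Server actually defaults to float(53), which takes 8 bytes, while real is float(24) and takes 4, so the two types do differ in storage and scan cost. Easy to verify:

```sql
-- Plain FLOAT defaults to FLOAT(53) = 8 bytes; REAL = FLOAT(24) = 4 bytes.
DECLARE @f FLOAT = 0.12345, @r REAL = 0.12345;
SELECT DATALENGTH(@f) AS float_bytes,  -- returns 8
       DATALENGTH(@r) AS real_bytes;   -- returns 4
```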
Columnstore performance depends strongly on the size of the table or partition, so if the size of the table decreases, you can gain performance. In my case the table with float is 482 MB and the same table with real is 184 MB. The table with smallint is 136 MB, but performance is bad with it. I suspect the CPU matters as well: modern CPUs are optimized for 32/64-bit data types.
So I suggest changing the data type of the vector from float to real and checking whether performance improves.
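A hedged sketch of that change, assuming the exploded layout and index names used earlier in this thread; one safe way is to drop and recreate the columnstore index around the type change:

```sql
-- Hypothetical in-place conversion of the vector column to REAL.
DROP INDEX cci_SKEmbeddings ON dbo.SKEmbeddings;

ALTER TABLE dbo.SKEmbeddings
    ALTER COLUMN vector_value REAL NOT NULL;

-- Recreate the columnstore index so the smaller 4-byte segments apply.
CREATE CLUSTERED COLUMNSTORE INDEX cci_SKEmbeddings
    ON dbo.SKEmbeddings;
```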