Closed a-h closed 1 month ago
Not saying rqlite is doing the right thing, but the first thing I did was try it out directly in SQLite. This what I see:
$ sqlite3
SQLite version 3.39.4 2022-09-29 15:55:41
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load 'vec0.so'
sqlite> create virtual table embeddings using vec0(id integer primary key, embedding float[8]);
sqlite> insert into embeddings (embedding) values ('[-0.200, 0.250, 0.341, -0.211, 0.645, 0.935, -0.316, -0.924]');
sqlite> insert into embeddings (embedding) values ('[-0.200, 0.250, 0.341, -0.211, 0.645, 0.935, -0.316, -0.924]');
sqlite> insert into embeddings (embedding) values ('[-0.200, 0.250, 0.341, -0.211, 0.645, 0.935, -0.316, -0.924]');
sqlite> insert into embeddings (embedding) values ('[-0.200, 0.250, 0.341, -0.211, 0.645, 0.935, -0.316, -0.924]');
sqlite> select * FROM embeddings e where e.id = 4;
4|��L�
Can you show me SQLite itself doing what you expect? If SQLite itself isn't operating the way you expect, is it a more fundamental issue with sqlite-vec and its interaction with SQLite?
There may be multiple issues going on here, but it would be helpful to show me the behaviour you expect to see when using SQLite directly. That would help me very much figure out what is going. I do notice that rqlite is reporting that the type of the returned data to be 'text', could be something wrong there.
But again, showing me sqlite-vec working that way you expect when using SQLite directly would be very helpful.
Ah, for some reason, I didn't think to test SQLite directly. Sorry to waste your time on the investigation.
My expectation was that I'd get the vector back, but I guess I don't really need it back, and if I did, I could store it twice, perhaps in a blob column.
I'll ask the sql-vec folks. It might be expected not to be able to return the embedding, or it might be a bug.
I'm keen to make sure sqlite-vec (and similar) are well supported in rqlite. rqlite's implementation may need some work to support whatever sqlite-vec returns (assuming there are no issues in sqlite-vec -- that's what you can confirm for me).
@asg017 -- any comment? What do you think rqlite should return in this scenario? See https://github.com/rqlite/rqlite/issues/1858#issue-2472060099
No activity, closing.
Hey sorry missed this — this is expected behavior. Vectors inserted into vec0
virtual tables will automatically be converted to a compact BLOB format to save space. Storing vectors as JSON always bloats index sizes by a lot.
If you want the original JSON back you can use vec_to_json()
on the BLOB to get the JSON text representation of the vector.
Again this only happens when you store vectors in vec0
virtual tables. If you store your vectors inside a regular table as JSON, then you it won't do any implicit conversions. Though I'd say to always store them as the compact BLOB format (use vec_f32()
for JSON->BLOB) as it is much faster and smaller.
Thanks @asg017
@a-h -- if you're finding the rqlite API is still preventing you from getting what you need, let me know. Sometimes the interactions between the HTTP API and the underlying SQLite API can be awkward to handle. You may wish to study this: https://rqlite.io/docs/api/api/#blob-data
(As in I'll consider non-breaking changes to the API to make working with extensions like sqlite-vec easier to do).
Thanks, my main concern was my ability to test whether the data had landed in the database correctly, i.e. that there weren't any problems with encoding etc.
I was able to test it successfully thanks to the pointer of vec_to_json
with the following code:
log.Info("Connecting to database")
conn, err := gorqlite.Open(databaseURL.DataSourceName())
if err != nil {
log.Error("failed to open connection", slog.Any("error", err))
os.Exit(1)
}
defer conn.Close()
queries := db.New(conn)
if err = queries.VecTestInsert(ctx, 1, []float64{1.1, 2.2, 3.3, 4.4}); err != nil {
log.Error("failed to insert vec_test", slog.Any("error", err))
return err
}
floats, err := queries.VecTestSelect(ctx, 1)
if err != nil {
log.Error("failed to select vec_test", slog.Any("error", err))
return err
}
log.Info("vec_test results", slog.Any("floats", floats))
return nil
create virtual table vec_test using vec0(embedding float[4]);
func (q *Queries) VecTestInsert(ctx context.Context, rowid int64, values []float64) (err error) {
if len(values) != 4 {
return fmt.Errorf("expected 4 values, got %d", len(values))
}
jsonValues, err := json.Marshal(values)
if err != nil {
return fmt.Errorf("failed to marshal values: %w", err)
}
stmt := gorqlite.ParameterizedStatement{
Query: `insert into vec_test (rowid, embedding) values (?, ?)`,
Arguments: []any{rowid, string(jsonValues)},
}
_, err = q.conn.WriteOneParameterizedContext(ctx, stmt)
if err != nil {
return fmt.Errorf("failed to insert values: %w", err)
}
return nil
}
func (q *Queries) VecTestSelect(ctx context.Context, rowid int64) (values []float64, err error) {
stmt := gorqlite.ParameterizedStatement{
Query: `select vec_to_json(embedding) from vec_test where rowid = ?`,
Arguments: []any{rowid},
}
results, err := q.conn.QueryOneParameterizedContext(ctx, stmt)
if err != nil {
return values, fmt.Errorf("failed to select: %w", err)
}
for results.Next() {
var jsonValues string
err := results.Scan(&jsonValues)
if err != nil {
return values, fmt.Errorf("failed to scan: %w", err)
}
err = json.Unmarshal([]byte(jsonValues), &values)
if err != nil {
return values, fmt.Errorf("failed to unmarshal: %w", err)
}
}
return values, nil
}
The end result was as expected:
{"time":"2024-09-16T07:52:09.943417+01:00","level":"INFO","msg":"vec_test results","floats":[1.1,2.2,3.3,4.4]}
Just leaving this in the comments for anyone that is searching for help...
If you try and pass []float64
to rqlite directly, you'll get 400 Bad Request, message: unsupported type
because []float64
is not a directly supported type.
// Don't do this, use the working code above...
func (q *Queries) VecTestInsert(ctx context.Context, rowid int64, values []float64) (err error) {
if len(values) != 4 {
return fmt.Errorf("expected 4 values, got %d", len(values))
}
stmt := gorqlite.ParameterizedStatement{
Query: `insert into vec_test (rowid, embedding) values (?, ?)`,
Arguments: []any{rowid, values},
}
_, err = q.conn.WriteOneParameterizedContext(ctx, stmt)
if err != nil {
return fmt.Errorf("failed to insert values: %w", err)
}
return nil
}
{"time":"2024-09-16T07:55:43.143509+01:00","level":"ERROR","msg":"failed to insert vec_test","error":"failed to insert values: tried all peers unsuccessfully. here are the results:\n peer #0: http://admin:xxxxx@localhost:4001/db/execute?timings&level=weak&transaction failed, got: 400 Bad Request, message: unsupported type\n\n"}
The fix is, as I've done, to JSON marshal the inputs and pass them that way.
Is your feature request related to a problem? Please describe.
I've tried out https://github.com/asg017/sqlite-vec with rqlite. I'm able to create a vector table, and add embeddings using the
rqlite
CLI tool:However, the inserted values don't seem to be returned as expected. I expected a string containing JSON, or a JSON object, but got this:
In the terminal it looks like this:
I took a look at what I'm getting back over HTTP, and it looks incorrect there too.
Just to check, I tried to get float64 values from the string, but the data isn't correct: https://go.dev/play/p/MPuCSlfK9Yn since the first number isn't -0.200 it's coming out as 1.1862007982529639e+137
Describe the solution you'd like
I'd like the embedding column to be a string containing JSON, or a byte array containing the raw data that I can decode myself.
Describe alternatives you've considered
I haven't considered any alternatives.
Additional context
I've got a repo where I'm testing making a vector document search with rqlite here: https://github.com/a-h/rqlite-test