lanterndata / lantern

PostgreSQL vector database extension for building AI applications
https://lantern.dev
GNU Affero General Public License v3.0

Difficulties with new user experience on docker #140

Closed: jjtolton closed this issue 10 months ago

jjtolton commented 1 year ago
$ docker run -dit --rm  -p 5432:5432 -e 'POSTGRES_PASSWORD=postgres' lanterndata/lantern:latest-pg15
cc6e330dca087bef8370d21ee4925f79b76d05ebc28d7a47840360d7f63a1cf3
$ psql -h localhost -p 5432 -U postgres -d postgres
Password for user postgres: 
psql (10.23 (Ubuntu 10.23-0ubuntu0.18.04.2), server 15.4 (Debian 15.4-1.pgdg120+1))
WARNING: psql major version 10, server major version 15.
         Some psql features might not work.
Type "help" for help.

postgres=# CREATE EXTENSION lantern;
CREATE EXTENSION
postgres=# CREATE TABLE small_world (id integer, vector real[3]);
CREATE TABLE
postgres=# INSERT INTO small_world (id, vector) VALUES (0, '{0,0,0}'), (1, '{0,0,1}');
INSERT 0 2
postgres=# CREATE INDEX ON small_world USING hnsw (vector);
INFO:  done init usearch index
INFO:  inserted 2 elements
INFO:  done saving 2 vectors
CREATE INDEX
postgres=# CREATE INDEX ON small_world USING hnsw (vector dist_l2sq_ops)
postgres-# WITH (M=2, ef_construction=10, ef=4, dims=3);
ERROR:  unrecognized parameter "dims"
postgres=# CREATE INDEX ON small_world USING hnsw (vector dist_l2sq_ops)
WITH (M=2, ef_construction=10, ef=4, dim=3);
INFO:  done init usearch index
INFO:  inserted 2 elements
INFO:  done saving 2 vectors
CREATE INDEX
postgres=# SELECT id, l2sq_dist(vector, ARRAY[0,0,0]) AS dist
postgres-# FROM small_world ORDER BY vector <-> ARRAY[0,0,0] LIMIT 1;
ERROR:  Operator <-> has no standalone meaning and is reserved for use in vector index lookups only
postgres=# \q

I was able to get it working by changing dims to dim and omitting the <-> operator and the ORDER BY clause that uses it, but I'm not sure how important these are for the overall workflow. Thanks for this; I was struggling to get the SQLite and pgvector extensions running within the time I had budgeted for a spike.

dqii commented 1 year ago

Thanks for the report! I'll fix these issues in the README.

Re: the second issue: the index doesn't always get used, in particular for small tables, where the planner may prefer a sequential scan. If you run SET enable_seqscan = off; that will ensure that the index is used in this case. (You can run SET enable_seqscan = on; to undo this.) Alternatively, if you do not expect the index to be used, you can sort by l2sq_dist(vector, array[0,0,0]) instead of using the <-> operator.

We have a note on the operator use in the README for more details.
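
The workaround described above could look roughly like the following in the same psql session. This is a sketch that reuses the small_world table and the query from the original report. The EXPLAIN step is an addition not mentioned in the thread; it is the standard PostgreSQL way to check which plan is chosen.

```sql
-- Discourage sequential scans so the planner picks the hnsw index,
-- even though small_world only has two rows.
SET enable_seqscan = off;

-- Optional check (not from the thread): inspect the plan and look for
-- an index scan on small_world rather than a sequential scan.
EXPLAIN
SELECT id, l2sq_dist(vector, ARRAY[0,0,0]) AS dist
FROM small_world ORDER BY vector <-> ARRAY[0,0,0] LIMIT 1;

-- The query from the report, which needs the index to be used.
SELECT id, l2sq_dist(vector, ARRAY[0,0,0]) AS dist
FROM small_world ORDER BY vector <-> ARRAY[0,0,0] LIMIT 1;

-- Restore the default planner behavior for the rest of the session.
SET enable_seqscan = on;

-- Alternative when no index lookup is expected: sort by the distance
-- function directly instead of using the <-> operator.
SELECT id, l2sq_dist(vector, ARRAY[0,0,0]) AS dist
FROM small_world ORDER BY l2sq_dist(vector, ARRAY[0,0,0]) LIMIT 1;
```

SET only affects the current session, so the setting does not need to be undone if the connection is closed afterwards.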

raoufchebri commented 1 year ago

Facing the same issue: Operator <-> has no standalone meaning and is reserved for use in vector index lookups only

dqii commented 1 year ago

@raoufchebri I just edited the README to clarify: for small tables, you will need to set enable_seqscan to off, since the index doesn't always get used. Please let me know if that helps!

dqii commented 10 months ago

We changed this so that the operator is always used regardless of whether or not the index is created, similar to pgvector, so this should no longer be an issue!