knex / knex

A query builder for PostgreSQL, MySQL, CockroachDB, SQL Server, SQLite3 and Oracle, designed to be flexible, portable, and fun to use.
https://knexjs.org/
MIT License

Support for pgvector? #5730

Open uninstallit opened 8 months ago

uninstallit commented 8 months ago

Environment

Node.js

Knex version: 3.0.1
Database + version: PostgreSQL 11+
OS: N/A

Feature discussion / request

pgvector is a PostgreSQL extension that allows you to store, query and index vectors. PostgreSQL does not have vector capabilities natively (yet, as of PostgreSQL 16) and pgvector is designed to close this gap.

  1. Explain what is your use case

Our use case involves working with high-dimensional data, particularly in the field of machine learning and data analytics. We store feature vectors extracted from various data sources like images, text, or audio in our PostgreSQL database. These vectors are crucial for performing similarity searches, recommendations, and clustering operations. Currently, PostgreSQL does not natively support efficient vector operations, and thus we are utilizing the pgvector extension to fill this gap.

The ability to perform vector operations directly within the database is vital for the speed and efficiency of our data processing workflows. However, our application is built on Node.js and uses Knex.js as an SQL query builder. At present, Knex does not natively support the syntax or methods provided by pgvector, which leads to a disjointed and inefficient database interaction layer. We are manually writing raw queries for vector operations, which is error-prone and deviates from the clean abstraction provided by Knex for other types of queries.

  2. Explain what kind of feature would support this

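The feature we are requesting is for Knex to understand and integrate pgvector capabilities. This would include a vector column type in the schema builder, array-to-vector conversion for inserts and updates, and distance-based ordering for nearest-neighbor queries.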

  3. Give some API proposal, how the feature should work

Creating a table with a vector column:

knex.schema.createTable('items', table => {
  table.increments();
  table.vector('embedding', { dimensions: 3 }); // New method for vector type
});
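
Until something like the proposed `table.vector` exists, one possible workaround is Knex's existing `specificType` schema method, which passes a column type string through as written. The sketch below only builds that type string. `vectorColumnType` is a hypothetical helper name, not part of Knex or pgvector, and the commented usage assumes the pgvector extension is available on the server.

```javascript
// Hypothetical helper (not part of Knex or pgvector): builds the column
// type string that pgvector expects, e.g. "vector(3)".
function vectorColumnType(dimensions) {
  if (!Number.isInteger(dimensions) || dimensions < 1) {
    throw new TypeError('dimensions must be a positive integer');
  }
  return `vector(${dimensions})`;
}

// Assumed usage with Knex's existing schema API (requires a live database):
//   await knex.raw('CREATE EXTENSION IF NOT EXISTS vector');
//   await knex.schema.createTable('items', (table) => {
//     table.increments();
//     table.specificType('embedding', vectorColumnType(3));
//   });
```

Validating the dimension count up front keeps a malformed type string from reaching the database.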

Inserting or updating a vector:

knex('items').insert({ id: 1, embedding: knex.pgvector([1, 2, 3]) }) // New method pgvector to handle array-to-vector conversion
  .onConflict('id')
  .merge({ embedding: knex.pgvector([1, 2, 3]) }); // Using merge as an alias for upsert

knex('items').where({ id: 1 }).update({ embedding: knex.pgvector([1, 2, 3]) });
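
For comparison, the array-to-vector conversion that the proposed `knex.pgvector` would perform can be approximated today by serializing the array into pgvector's text format (`[1,2,3]`) and passing it as an ordinary string binding. This is a minimal sketch; `toVectorLiteral` is a hypothetical name, and the commented usage assumes the `embedding` column is of type `vector`.

```javascript
// Hypothetical helper (not part of Knex): serializes a numeric array into
// pgvector's text representation, e.g. [1, 2, 3] -> "[1,2,3]".
function toVectorLiteral(values) {
  if (!Array.isArray(values) || values.length === 0) {
    throw new TypeError('expected a non-empty array of numbers');
  }
  for (const v of values) {
    if (typeof v !== 'number' || !Number.isFinite(v)) {
      throw new TypeError('vector elements must be finite numbers');
    }
  }
  return `[${values.join(',')}]`;
}

// Assumed usage (requires a live database with pgvector installed):
//   await knex('items')
//     .insert({ id: 1, embedding: toVectorLiteral([1, 2, 3]) })
//     .onConflict('id')
//     .merge();
```

Because the value travels as a bound parameter rather than interpolated SQL, this keeps the usual injection protection that Knex bindings provide.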

Querying for nearest neighbors:

knex('items')
  .select('*')
  .orderByRaw('embedding <-> ?', [knex.pgvector([1, 2, 3])]) // Using orderByRaw until a better abstraction (e.g. a 'vector_distance' orderBy option) is available
  .limit(10);
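
As a point of reference for the current state, a nearest-neighbor query in pgvector is expressed by ordering on a distance operator (`<->` for L2 distance, `<#>` for negative inner product, `<=>` for cosine distance). The sketch below builds a raw fragment and bindings that could be handed to Knex's existing `orderByRaw`. `nearestNeighborOrder` and its `metric` option are hypothetical names introduced for illustration only.

```javascript
// pgvector distance operators, keyed by illustrative metric names.
const DISTANCE_OPERATORS = {
  l2: '<->',
  innerProduct: '<#>',
  cosine: '<=>',
};

// Hypothetical helper (not part of Knex): returns a raw SQL fragment plus
// bindings. "??" is Knex's identifier placeholder, "?" its value placeholder.
function nearestNeighborOrder(column, values, metric = 'l2') {
  const operator = DISTANCE_OPERATORS[metric];
  if (!operator) {
    throw new RangeError(`unknown metric: ${metric}`);
  }
  if (!Array.isArray(values) || !values.every(Number.isFinite)) {
    throw new TypeError('values must be an array of finite numbers');
  }
  return {
    sql: `?? ${operator} ?::vector`,
    bindings: [column, `[${values.join(',')}]`],
  };
}

// Assumed usage (requires a live database with pgvector installed):
//   const order = nearestNeighborOrder('embedding', [1, 2, 3]);
//   const rows = await knex('items')
//     .select('*')
//     .orderByRaw(order.sql, order.bindings)
//     .limit(10);
```

Placing the distance expression in `ORDER BY` rather than `WHERE` matters here, since the operators return a number, not a boolean.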

This API design would allow Knex users to easily manipulate and query vector data with the pgvector extension while maintaining the clean and familiar Knex interface.

ankane commented 7 months ago

Hey @uninstallit, added instructions to pgvector-node for the current state (just fyi).