Environment

Node.js

Knex version: 3.0.1
Database + version: PostgreSQL 11+
OS: N/A

Feature discussion / request

pgvector is a PostgreSQL extension that allows you to store, query and index vectors. PostgreSQL does not have vector capabilities natively (yet, as of PostgreSQL 16) and pgvector is designed to close this gap.

Explain what is your use case

Our use case involves working with high-dimensional data, particularly in the field of machine learning and data analytics. We store feature vectors extracted from various data sources like images, text, or audio in our PostgreSQL database. These vectors are crucial for performing similarity searches, recommendations, and clustering operations. Currently, PostgreSQL does not natively support efficient vector operations, and thus we are utilizing the pgvector extension to fill this gap.

The ability to perform vector operations directly within the database is vital for the speed and efficiency of our data processing workflows. However, our application is built on Node.js and uses Knex.js as an SQL query builder. At present, Knex does not natively support the syntax or methods provided by pgvector, which leads to a disjointed and inefficient database interaction layer. We are manually writing raw queries for vector operations, which is error-prone and deviates from the clean abstraction provided by Knex for other types of queries.
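For illustration, the hand-written approach looks roughly like the sketch below. `toVectorLiteral` and `nearestItemsRaw` are hypothetical helper names, not part of Knex or pgvector. The sketch assumes an `items` table with an `embedding` column and relies on pgvector accepting the text form `[1,2,3]` cast to `vector`.

```javascript
// Hypothetical helper (not part of Knex or pgvector): serialize a JS array
// into pgvector's text input format, e.g. [1, 2, 3] -> '[1,2,3]'.
function toVectorLiteral(values) {
  if (!Array.isArray(values) || values.length === 0) {
    throw new TypeError('expected a non-empty array of numbers');
  }
  for (const v of values) {
    if (typeof v !== 'number' || !Number.isFinite(v)) {
      throw new TypeError(`invalid vector component: ${v}`);
    }
  }
  return `[${values.join(',')}]`;
}

// Builds the nearest-neighbor query entirely by hand with knex.raw().
// `knex` is an initialized Knex instance supplied by the caller; the table
// and column names are assumptions for this example.
function nearestItemsRaw(knex, queryVector, limit = 10) {
  return knex.raw(
    'SELECT * FROM items ORDER BY embedding <-> ?::vector LIMIT ?',
    [toVectorLiteral(queryVector), limit]
  );
}
```

Every vector query needs this kind of manual serialization and casting, which is the duplication this request aims to remove.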

Explain what kind of feature would support this

We are requesting that Knex understand and integrate pgvector capabilities. This would include:

Methods to create and manage columns that store vectors.
Query building methods to insert and update vectors in a manner consistent with Knex's syntax.
Support for vector-specific operations like nearest-neighbor searches using pgvector indexes.

This integration should abstract away the raw SQL required to interact with vectors, allowing developers to continue using Knex's chainable methods and familiar interface.

Give some API proposal, how the feature should work

Creating a table with a vector column:

knex.schema.createTable('items', table => {
  table.increments();
  table.vector('embedding', { dimensions: 3 }); // New method for vector type
});
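For comparison, a rough sketch of how this can be approximated today: Knex's existing `specificType` passes the column type through verbatim, and `knex.raw` can enable the extension and create an index. The function name `createItemsTable` and the index name are illustrative, and the HNSW index assumes a pgvector release that supports it (0.5.0 or later).

```javascript
// Sketch of a present-day workaround; `knex` is an initialized Knex instance
// passed in by the caller.
async function createItemsTable(knex) {
  // pgvector registers itself under the extension name "vector".
  await knex.raw('CREATE EXTENSION IF NOT EXISTS vector');

  await knex.schema.createTable('items', (table) => {
    table.increments();
    // specificType emits the given type string as-is, so the dimension
    // count has to be embedded in the string by hand.
    table.specificType('embedding', 'vector(3)');
  });

  // Assumption: pgvector >= 0.5.0, which provides the hnsw access method.
  // vector_l2_ops matches the <-> (L2 distance) operator.
  await knex.raw(
    'CREATE INDEX items_embedding_idx ON items USING hnsw (embedding vector_l2_ops)'
  );
}
```

A dedicated `table.vector()` method would replace the hand-built type string with a validated `dimensions` option.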

Inserting or updating a vector:

knex('items')
  .insert({ id: 1, embedding: knex.pgvector([1, 2, 3]) }) // Proposed knex.pgvector() handles array-to-vector conversion
  .onConflict('id')
  .merge({ embedding: knex.pgvector([1, 2, 3]) }); // onConflict().merge() performs the upsert

knex('items').where({ id: 1 }).update({ embedding: knex.pgvector([1, 2, 3]) });
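Until something like `knex.pgvector()` exists, a small wrapper around `knex.raw` can play a similar role. The sketch below is one possible shape, not an existing API: `vectorValue`, `upsertEmbedding`, and `updateEmbedding` are hypothetical names, and it assumes that `JSON.stringify` on a plain array of finite numbers yields text that pgvector accepts as vector input.

```javascript
// Hypothetical stand-in for the proposed knex.pgvector(): wraps the array in
// a raw fragment with a bound parameter and an explicit cast.
// JSON.stringify([1, 2, 3]) yields '[1,2,3]'.
function vectorValue(knex, values) {
  return knex.raw('?::vector', [JSON.stringify(values)]);
}

// Insert a row, or overwrite only the embedding if the id already exists.
function upsertEmbedding(knex, id, values) {
  return knex('items')
    .insert({ id, embedding: vectorValue(knex, values) })
    .onConflict('id')
    .merge(['embedding']); // update just this column from the new row
}

// Plain update of an existing row.
function updateEmbedding(knex, id, values) {
  return knex('items')
    .where({ id })
    .update({ embedding: vectorValue(knex, values) });
}
```

Passing the column list to `merge` avoids building the vector value twice, as the proposed example above does.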

Querying for nearest neighbors:

knex('items')
  .select('*')
  // orderByRaw on the distance operator until a better abstraction,
  // such as a 'vector_distance' orderBy option, is available
  .orderByRaw('embedding <-> ?', [knex.pgvector([1, 2, 3])])
  .limit(10);
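With the current Knex API, a nearest-neighbor search can be sketched by ordering on the distance expression with `orderByRaw`. The helper name `nearestNeighbors` and the metric names are illustrative assumptions. The operators are pgvector's own: `<->` for L2 distance, `<=>` for cosine distance, and `<#>` for negative inner product.

```javascript
// Whitelist of pgvector distance operators, keyed by illustrative metric
// names chosen for this example.
const DISTANCE_OPERATORS = new Map([
  ['l2', '<->'],
  ['cosine', '<=>'],
  ['innerProduct', '<#>'],
]);

// Returns the `limit` rows closest to `queryVector`. `knex` is an
// initialized Knex instance; table and column names are assumptions.
function nearestNeighbors(knex, queryVector, { metric = 'l2', limit = 10 } = {}) {
  const op = DISTANCE_OPERATORS.get(metric);
  if (!op) {
    throw new Error(`unknown metric: ${metric}`);
  }
  return knex('items')
    .select('*')
    // The operator comes from the whitelist above, so interpolating it is
    // safe; the vector itself stays a bound parameter.
    .orderByRaw(`embedding ${op} ?::vector`, [JSON.stringify(queryVector)])
    .limit(limit);
}
```

The distance belongs in `ORDER BY` rather than `WHERE`, because the operators return a number, not a boolean, and pgvector's indexes are used for `ORDER BY ... LIMIT` queries.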

This API design would allow Knex users to easily manipulate and query vector data with the pgvector extension while maintaining the clean and familiar Knex interface.

knex / knex

Support for pgvector? #5730
