sqlc-dev / sqlc

Generate type-safe code from SQL
https://sqlc.dev
MIT License
13.58k stars 812 forks source link

pgvector & :copyfrom command not working #3655

Open embiem opened 1 month ago

embiem commented 1 month ago

Version

1.27.0

What happened?

The pgvector support introduced in v1.23.0 doesn't seem to play nicely with the :copyfrom command. I get the following error with the setup described in this issue: ERROR: vector cannot have more than 16000 dimensions (SQLSTATE 54000). Even though the inserted pgvector has the dimension of 1024, same as the column.

When I use the following query using :exec with that same vector, then it works without problems:

-- name: CreateEmbedding :exec
INSERT INTO "embeddings"."embeddings_1024" (node_id, embedding)
VALUES (
  $1, $2
);

Relevant log output

ERROR: vector cannot have more than 16000 dimensions (SQLSTATE 54000)

Database schema

CREATE EXTENSION IF NOT EXISTS vector;

CREATE SCHEMA IF NOT EXISTS "embeddings";

CREATE TABLE "embeddings"."embeddings_1024" (
    node_id BIGINT PRIMARY KEY,
    embedding VECTOR(1024)
);

SQL queries

-- name: CreateEmbeddings :copyfrom
INSERT INTO "embeddings"."embeddings_1024" (node_id, embedding)
VALUES (
  $1, $2
);

Configuration

version: "2"
sql:
  - engine: "postgresql"
    queries: "db/query.sql"
    schema: "db/migrations"
    gen:
      go:
        package: "data"
        out: "data"
        sql_package: "pgx/v5"

Playground URL

can't use pgvector

Alternatively, here is the docker-compose.yml I use for local dev to have pgvector installed:

version: "3.9"

services:
  db:
    image: pgvector/pgvector:pg16
    restart: on-failure
    volumes:
      - db_data:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
    ports:
      - "5432:5432"

  adminer:
    image: ghcr.io/shyim/adminerevo:latest
    restart: on-failure
    ports:
      - 8080:8080
volumes:
  db_data:

What operating system are you using?

Linux

What database engines are you using?

PostgreSQL

What type of code are you generating?

Go

Workaround

The workaround I'm using to still insert many vectors at once is the :batchexec command, which works fine with pgvector.