manifold-systems / manifold

Manifold is a Java compiler plugin, its features include Metaprogramming, Properties, Extension Methods, Operator Overloading, Templates, a Preprocessor, and more.
http://manifold.systems/
Apache License 2.0
2.42k stars 125 forks source link

Unrecognized type should be Object not String #595

Open oldshensheep opened 6 months ago

oldshensheep commented 6 months ago

Describe the bug

To Reproduce install pgvector for postgres https://github.com/pgvector/pgvector create table

create table if not exists document_embeddings
(
    id         serial8 primary key,
    embedding  vector(2) not null,
    content    text         not null,
    metadata   jsonb        not null,
    created_at timestamp    not null default now()
);

run code

My.DocumentEmbeddingsTest.builder("[0.2,0.3]", "abc", "{}").build();

or

    My.addSqlChange(ctx -> {
      "[.sql/] insert into document_embeddings_test(embedding, content, metadata) values (?,?,?::jsonb)".execute(
          ctx, "[0.2,0.3]", "abc", "{}");
    });
Exception in thread "main" org.postgresql.util.PSQLException: ERROR: column "embedding" is of type vector but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.

Expected behavior There is a workaround like jsonb, We pass String and cast it to vector

    My.addSqlChange(ctx -> {
      "[.sql/] insert into document_embeddings_test(embedding, content, metadata) values (?::vector,?,?::jsonb)".execute(
          ctx, "[0.2,0.3]", "abc", "{}");
    });

or better without casting (not possible for now)

    My.addSqlChange(ctx -> {
      "[.sql/] insert into document_embeddings_test(embedding, content, metadata) values (?,?,?::jsonb)".execute(
          ctx, new float[]{0.1F, 0.2F}, "abc", "{}");
    });

use raw jdbc

      pstmt.setObject(1, new float[]{0.1F, 0.2F});
// https://github.com/pgvector/pgvector-java
      PGvector pGvector = new PGvector(new float[]{0.1F, 0.2F});
      pstmt.setObject(1, pGvector);

The Problem Converting a String to a vector works for small vectors, but often we use vectors of size 1024 or larger. Thus, we need to convert a float [1024] to a String, pass it to the database, and then the database must convert the String back to a vector, which is time-consuming.

Manually maintaining mappings from non-JDBC types to Java types can be an endless task. It's better to allow users to implement these mappings. What I propose is that an unrecognized type should default to Object, not String, as converting a String to another type can be a performance issue. Additionally, there should be a way for users to map these types themselves.

Desktop (please complete the following information):

Additional context I understand that ValueAccessor is for mapping, and I want to implement it myself.

https://github.com/manifold-systems/manifold/blob/e94f5ae8cdb87c9d901da2352a97a13a81c9f9d0/manifold-deps-parent/manifold-sql-rt/src/main/java/manifold/sql/rt/impl/accessors/OtherValueAccessor.java#L37-L40 it invoke getColumnClassName to get java type, postgresql jdbc implement this method, is this the problem of postgresql jdbc?

https://github.com/pgjdbc/pgjdbc/blob/450488c142fdc368cab54e8257407603acc18c4f/pgjdbc/src/main/java/org/postgresql/jdbc/PgResultSetMetaData.java#L440

for some reason I can't step into getColumnClassName while debugging.

Stack trace Please include a stack trace if applicable

rsmckinney commented 5 months ago

IIRC postgresql JDBC driver defaults to String for many special types. Using IntelliJ debugger you can "force" step into the driver code, it will be decompiled.

Anyhow, I'm going to make this work in a couple of ways. First, if the JDBC driver supports it, I'll make Object work as the Java type. Additionally, if pgvector-java is in use, I'll make sure PGVector is the Java type and that it is integrated.