stargate / data-api

JSON document API for Apache Cassandra (formerly known as JSON API)
https://stargate.io
Apache License 2.0

`vector` primary key not returned properly in `insertedIds` #1698

Open toptobes opened 1 week ago

toptobes commented 1 week ago

If I create a table like so:

const table = await db.createTable('t2est_table2', { 
  definition: { columns: { vec: { type: 'vector', dimension: 3 } }, primaryKey: 'vec' }, 
  ifNotExists: true 
});

and insert a row either way:

// serializes as number[]
await table.insertOne({ vec: [0.1, 0.2, 0.3] });

// serializes as $binary
await table.insertOne({ vec: new DataAPIVector([0.1, 0.2, 0.3]) });
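For readers unfamiliar with the two wire forms, here is a minimal sketch of how a `number[]` could be turned into a `$binary` payload. It assumes the `$binary` form is base64 over big-endian float32 values. `toBase64` and `vectorToBinary` are hypothetical helpers written for illustration, not part of the client library.

```typescript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Plain base64 encoder, written out so the sketch has no runtime dependencies.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0; // reads past the end yield undefined, treated as 0
    const b2 = bytes[i + 2] ?? 0;
    out += B64.charAt(b0 >> 2);
    out += B64.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += i + 1 < bytes.length ? B64.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=";
    out += i + 2 < bytes.length ? B64.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

// Assumption: each component is written as a 4-byte big-endian float32.
function vectorToBinary(vector: number[]): { $binary: string } {
  const view = new DataView(new ArrayBuffer(vector.length * 4));
  vector.forEach((x, i) => view.setFloat32(i * 4, x, false)); // false = big-endian
  return { $binary: toBase64(new Uint8Array(view.buffer)) };
}

// 0.25 is exactly representable as float32 (0x3E800000).
const encoded = vectorToBinary([0.25, 0.25, 0.25]); // { $binary: "PoAAAD6AAAA+gAAA" }
```

Either form describes the same three floats, so the server response should not depend on which one the client sends.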

You can see in this resulting event that the vector was returned simply as { exists: true } for some reason

CommandSucceededEvent {
  command: {
    insertOne: { document: { a: [ 0.1, 0.2, 0.3 ] } } // same result if serialized as $binary
  },
  resp: {
    status: {
      insertedIds: [ [ { empty: false } ] ],
      primaryKeySchema: { blob: { type: 'vector', dimension: 3 } }
    },
    ...
  },
  ...
}

amorton commented 6 days ago

You can see in this resulting event that the vector was returned simply as { exists: true } for some reason

Assuming you mean "{ empty: false }".

Repro case:

{
    "createTable": {
        "name": "t2est_table2",
        "definition": {
            "columns": {
                "vec": {
                    "type": "vector",
                    "dimension": 3
                }
            },
            "primaryKey": "vec"
        },
        "options": {
            "ifNotExists": true
        }
    }
}

{
    "insertOne": {
        "document": {
            "vec": [
                0.1,
                0.2,
                0.3
            ]
        }
    }
}

{
    "status": {
        "insertedIds": [
            [
                {
                    "empty": false
                }
            ]
        ],
        "primaryKeySchema": {
            "vec": {
                "type": "vector",
                "dimension": 3
            }
        }
    }
}
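
Until this is fixed, a client could detect the malformed ids with a small check. The sketch below assumes that each entry of `insertedIds` lists its values in the same order as the keys of `primaryKeySchema`, and that a valid vector id is either a `number[]` of the declared dimension or a `{ $binary: string }` object. `isVectorValue` and `findMalformedIds` are hypothetical names, not part of any client library.

```typescript
type ColumnSchema = { type: string; dimension?: number };
type InsertStatus = {
  insertedIds: unknown[][];
  primaryKeySchema: Record<string, ColumnSchema>;
};

// True if the value looks like a vector in either assumed wire form.
function isVectorValue(value: unknown, dimension?: number): boolean {
  if (Array.isArray(value)) {
    return (
      value.every((x) => typeof x === "number") &&
      (dimension === undefined || value.length === dimension)
    );
  }
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { $binary?: unknown }).$binary === "string"
  );
}

// Returns the indexes of inserted rows whose vector key columns are malformed.
function findMalformedIds(status: InsertStatus): number[] {
  const columns = Object.keys(status.primaryKeySchema);
  const bad: number[] = [];
  status.insertedIds.forEach((id, row) => {
    columns.forEach((name, i) => {
      const schema = status.primaryKeySchema[name];
      const isBadVector =
        schema !== undefined &&
        schema.type === "vector" &&
        !isVectorValue(id[i], schema.dimension);
      if (isBadVector && bad.indexOf(row) === -1) {
        bad.push(row);
      }
    });
  });
  return bad;
}

// The response from the repro case above flags row 0.
const malformed = findMalformedIds({
  insertedIds: [[{ empty: false }]],
  primaryKeySchema: { vec: { type: "vector", dimension: 3 } },
}); // [0]
```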

amorton commented 6 days ago

We did some digging. The stack trace below shows the data types involved; we were not expecting a CqlVector.

ERROR [vert.x-eventloop-thread-1] 2024-11-14 08:25:28,728 ThrowableToErrorMapper.java:280 - Unrecognized Exception (java.lang.ClassCastException) caught, mapped to SERVER_UNHANDLED_ERROR: class com.datastax.oss.driver.api.core.data.CqlVector cannot be cast to class [Ljava.lang.Object; (com.datastax.oss.driver.api.core.data.CqlVector is in unnamed module of loader io.quarkus.bootstrap.classloading.QuarkusClassLoader @5a56cdac; [Ljava.lang.Object; is in module java.base of loader 'bootstrap'): java.lang.ClassCastException: class com.datastax.oss.driver.api.core.data.CqlVector cannot be cast to class [Ljava.lang.Object; (com.datastax.oss.driver.api.core.data.CqlVector is in unnamed module of loader io.quarkus.bootstrap.classloading.QuarkusClassLoader @5a56cdac; [Ljava.lang.Object; is in module java.base of loader 'bootstrap')
    at io.stargate.sgv2.jsonapi.service.operation.InsertAttemptPage.lambda$buildNonPerDocumentResult$1(InsertAttemptPage.java:77)
    at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)

Using a vector as a primary key is an edge case. We will fix it in the December release.
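
The server code is Java, but the general shape of such a fix can be sketched in TypeScript: convert a driver-specific vector object to a plain array before it goes into `insertedIds`, instead of assuming every key value is already array-like. This is only an analogy. `VectorLike`, `normalizeKeyValue`, and `buildInsertedId` are hypothetical and are not taken from the data-api codebase.

```typescript
// Hypothetical stand-in for a driver vector type: holds numbers but is not an array.
class VectorLike {
  private readonly values: number[];

  constructor(values: number[]) {
    this.values = values.slice();
  }

  toArray(): number[] {
    return this.values.slice();
  }
}

// Map a primary key column value to a JSON-friendly form.
function normalizeKeyValue(value: unknown): unknown {
  if (value instanceof VectorLike) {
    return value.toArray(); // explicit conversion rather than a blind cast
  }
  return value;
}

// Build one insertedIds entry from a row and its primary key column names.
function buildInsertedId(row: Record<string, unknown>, pkColumns: string[]): unknown[] {
  return pkColumns.map((name) => normalizeKeyValue(row[name]));
}

const id = buildInsertedId({ vec: new VectorLike([0.1, 0.2, 0.3]) }, ["vec"]);
const idJson = JSON.stringify(id); // "[[0.1,0.2,0.3]]"
```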

Yuqi-Du commented 19 hours ago

Is this one needed for the December hot-fix?

It looks like using a vector as the PK is a rare case for now.

tatu-at-datastax commented 19 hours ago

Good question: it is relatively easy to fix but does not seem like high priority to me. So maybe leave it out unless it gets escalated.