stargate / data-api

JSON document API for Apache Cassandra (formerly known as JSON API)
https://stargate.io
Apache License 2.0

Binary Vector Support - Fix `$binary` to Support 4096 Vector Dimensions #1710

Open Hazel-Datastax opened 1 week ago

Hazel-Datastax commented 1 week ago

When users encode a 4096-dimensional vector into Base64, the resulting string is 21848 bytes long. This exceeds the maximum allowed length of 8000 bytes for an indexed String such as the `$binary` field, resulting in the following error:

astrapy.exceptions.DataAPIResponseException: Document size limitation violated: indexed String value (property '$binary') length (21848 bytes) exceeds maximum allowed (8000 bytes)
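As a quick check of the numbers above, here is a minimal Python sketch. It assumes each vector component is packed as a 4-byte float32; the issue does not state the packing, and big-endian order is used here only for illustration. 4096 × 4 = 16384 raw bytes, and Base64 turns every 3-byte group into 4 characters, so the encoded length is 4 × ⌈16384 / 3⌉ = 21848. Under the same assumption, 1500 dimensions is the largest vector whose `$binary` string fits in 8000 bytes. The helper names are hypothetical.

```python
import base64
import struct

FLOAT32_BYTES = 4          # assumption: each component packed as a 32-bit float
INDEX_LIMIT_BYTES = 8000   # indexed String limit quoted in the error message


def base64_length(dimensions: int) -> int:
    """Length of the Base64 text for a float32 vector with this many dimensions."""
    raw_bytes = dimensions * FLOAT32_BYTES
    # Base64 emits 4 characters per 3-byte group; a partial last group is padded.
    return 4 * ((raw_bytes + 2) // 3)


def encode_vector(vector: list[float]) -> str:
    """Pack floats as big-endian float32 and Base64-encode them (illustrative only)."""
    packed = struct.pack(f">{len(vector)}f", *vector)
    return base64.b64encode(packed).decode("ascii")


def max_indexable_dimensions(limit: int = INDEX_LIMIT_BYTES) -> int:
    """Largest dimension count whose Base64 text stays within the limit."""
    d = 0
    while base64_length(d + 1) <= limit:
        d += 1
    return d


if __name__ == "__main__":
    print(base64_length(4096))                # 21848
    print(len(encode_vector([0.1] * 4096)))   # 21848
    print(max_indexable_dimensions())         # 1500
```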

tatu-at-datastax commented 3 days ago

Quick question: does this happen when updating the `$vector` field of a Collection? (I assume so.)

I don't think we can really increase the indexable limit (it's an SAI setting limited by the database). But perhaps we can instead make sure that, by default, the Data API does not index `$vector` as a String: it will be ANN-indexed anyway, so there seems to be little need.
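To illustrate the proposal, here is a hypothetical Python sketch. It is not Data API code: the function names and the `skip` parameter are invented for illustration. It walks a document, lists the String values a String index would see, and flags those over 8000 bytes. Excluding the `$vector` path, as suggested, removes the oversized `$binary` value from String indexing, while the vector itself would still be ANN-indexed.

```python
import base64
import struct
from typing import Any, Iterator

INDEX_LIMIT_BYTES = 8000  # indexed String limit quoted in the error message


def indexed_strings(value: Any, path: str, skip: frozenset) -> Iterator[tuple[str, str]]:
    """Yield (path, string) pairs a String index would see, skipping excluded paths."""
    if path in skip:
        return
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            yield from indexed_strings(child, child_path, skip)
    elif isinstance(value, list):
        for item in value:
            yield from indexed_strings(item, path, skip)


def oversized(doc: dict, skip: frozenset = frozenset()) -> list[tuple[str, int]]:
    """Return (path, UTF-8 byte length) for each indexed String over the limit."""
    return [
        (p, len(s.encode("utf-8")))
        for p, s in indexed_strings(doc, "", skip)
        if len(s.encode("utf-8")) > INDEX_LIMIT_BYTES
    ]


# A 4096-dimensional vector stored as $binary (assumed float32 packing).
packed = struct.pack(">4096f", *([0.1] * 4096))
doc = {"name": "example", "$vector": {"$binary": base64.b64encode(packed).decode("ascii")}}

print(oversized(doc))                               # [('$vector.$binary', 21848)]
print(oversized(doc, skip=frozenset({"$vector"})))  # []
```

Skipping by path keeps the other fields of the document subject to the same limit, so only the vector, which has its own ANN index, is exempted.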