snowflakedb / snowflake-sqlalchemy

Snowflake SQLAlchemy
https://pypi.python.org/pypi/snowflake-sqlalchemy/
Apache License 2.0

SNOW-1374015: 🐛 Snowflake SQLAlchemy Driver fails when reflecting new `VECTOR` data type #499

Open aaronsteers opened 6 months ago

aaronsteers commented 6 months ago

Symptom trying to fix?

Reflecting a SQL table raises an error if the table uses the (very new!) VECTOR data type.

What did you expect to see?

I think this driver needs to be updated to handle the VECTOR type. Internally, this is basically an array of floats, except that the number of items in the array is fixed at creation time.
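To make the "fixed-length array" description concrete, here is a minimal sketch of what a driver would have to understand about such a column. It assumes the column type is reported as text shaped like `VECTOR(FLOAT, 1536)`; the names `VectorSpec`, `parse_vector_spec` and `check_vector_value` are hypothetical and not part of snowflake-sqlalchemy.

```python
import re
from typing import Iterable, List, NamedTuple


class VectorSpec(NamedTuple):
    """Hypothetical description of a VECTOR column: element type plus fixed size."""
    element_type: str
    dimension: int


# Assumption: the type text looks like "VECTOR(<element type>, <dimension>)".
_VECTOR_RE = re.compile(
    r"^\s*VECTOR\s*\(\s*([A-Za-z]+)\s*,\s*(\d+)\s*\)\s*$", re.IGNORECASE
)


def parse_vector_spec(type_text: str) -> VectorSpec:
    """Split a VECTOR type string into its element type and dimension."""
    match = _VECTOR_RE.match(type_text)
    if match is None:
        raise ValueError(f"not a VECTOR type: {type_text!r}")
    dimension = int(match.group(2))
    if dimension < 1:
        raise ValueError("VECTOR dimension must be positive")
    return VectorSpec(match.group(1).upper(), dimension)


def check_vector_value(spec: VectorSpec, values: Iterable[float]) -> List[float]:
    """Return the values as a list, enforcing the length fixed at creation time."""
    items = list(values)
    if len(items) != spec.dimension:
        raise ValueError(
            f"expected {spec.dimension} elements, got {len(items)}"
        )
    return items


spec = parse_vector_spec("VECTOR(FLOAT, 3)")
print(spec)
print(check_vector_value(spec, [0.1, 0.2, 0.3]))
```

The length check is the part that distinguishes this from a plain array type: a value with the wrong number of elements is rejected rather than stored.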

sfc-gh-dszmolka commented 6 months ago

Hello, and thank you for your interest in the public preview of the VECTOR data type! As documented on the feature page, it is currently:

[..]only supported in SQL, the Python connector and the Snowpark Python library. No other languages are supported.

We'll work on adding support in other Snowflake drivers and connectors and thank you for bearing with us while this happens.

aaronsteers commented 6 months ago

@sfc-gh-dszmolka - Yes, that makes sense. Our workaround for now is to pass certain commands through the Snowflake Python client - but it would be beneficial long-term to switch back to the native SQLAlchemy integrations - and also to at least make sure SQLAlchemy does not break when attempting to scan or read from those tables.

Happy to use this issue as a tracking item for that future work. Thanks for your support.
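For anyone taking the same pass-through route, here is a rough sketch of building the SQL text for a vector value so it can be sent as a plain statement. It assumes Snowflake's cast form `[...]::VECTOR(<type>, <dimension>)`; the helper name `vector_literal` is hypothetical, and how the string is then executed (for example through a raw connector cursor) is left to the caller.

```python
import math
from typing import Sequence


def vector_literal(values: Sequence[float], element_type: str = "FLOAT") -> str:
    """Render numbers as a SQL expression cast to a fixed-length VECTOR.

    Hypothetical helper: only numeric values and an alphabetic element type
    are accepted, so nothing user-controlled is pasted into the SQL verbatim.
    """
    if not values:
        raise ValueError("a vector needs at least one element")
    if not element_type.isalpha():
        raise ValueError(f"unexpected element type: {element_type!r}")

    rendered = []
    for value in values:
        number = float(value)  # raises for non-numeric input
        if not math.isfinite(number):
            raise ValueError("vector elements must be finite numbers")
        rendered.append(repr(number))

    body = ", ".join(rendered)
    return f"[{body}]::VECTOR({element_type.upper()}, {len(rendered)})"


print(vector_literal([1, 2.5, 3]))
```

The dimension in the cast is taken from the number of values, so the caller still has to make sure it matches the dimension declared on the target column.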

japborst commented 2 months ago

Hey @sfc-gh-dszmolka. As this feature is now out of public preview, do you know the status of this issue? Thanks!

sfc-gh-dszmolka commented 2 months ago

Hi @japborst, unfortunately I don't have any additional info on the implementation timeline at the moment, but I'm trying to get it from the team and will update this issue when/if I have any news. Thank you all for bearing with us!

japborst commented 2 months ago

Hey @sfc-gh-dszmolka!

On the website I read:

The VECTOR data type is only supported in SQL, the Python connector and the Snowpark Python library. No other languages are supported.

Do I read correctly that whilst there is Python support (the Python snowflake connector), it's primarily SQLAlchemy support that is missing?

tazhigaliyev commented 1 month ago

for those who need vector dt in sqlalchemy, this temp workaround (there's nothing more permanent than temporary) might help:

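from sqlalchemy import Column
from sqlalchemy.types import UserDefinedType

class SFVector(UserDefinedType):
    """Minimal custom type that renders VECTOR(<data_type>, <length>) in DDL."""

    cache_ok = True  # the constructor arguments fully describe the type

    def __init__(self, data_type, length):
        self.data_type = data_type  # element type, e.g. 'FLOAT'
        self.length = length        # fixed number of elements

    def get_col_spec(self, **kw):
        # Column type text emitted in CREATE TABLE statements.
        return f"VECTOR({self.data_type}, {self.length})"

# Example column definition, e.g. as an attribute on a declarative model.
embedding = Column(SFVector('FLOAT', 1536), nullable=True)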

MattLJoslin commented 1 month ago

Is there any ETA on this? It makes it quite hard to connect to SQLAlchemy-based solutions such as Superset.

aaronsteers commented 1 month ago

FWIW: I've used a similar workaround, although it won't help when SQLAlchemy is wrapped by another tool (like Superset in the comment above). Would be great to see native support added.