Open Jacky56 opened 1 month ago
In my case, pgvector needs the vector to be a string such as '[1,2,3]',
but in the Python code it should be accessible as an iterable.
@Jacky56 I don't know if this helps, but you can use the json module to transform '[1,2,3]' into an iterable (list) like this:
import json
embeddings = '[1,2,3]'
results = json.loads(embeddings)
print(type(results)) # <class 'list'>
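Going the other way (list to the '[1,2,3]' text form) can be sketched with json.dumps. This is only an illustration: the helper names to_vector_text and from_vector_text are hypothetical and are not part of Piccolo or pgvector.

```python
import json


def to_vector_text(values):
    """Render an iterable of numbers as compact text such as '[1,2,3]'."""
    # separators=(",", ":") drops the space json.dumps adds after each comma.
    return json.dumps(list(values), separators=(",", ":"))


def from_vector_text(text):
    """Parse text such as '[1,2,3]' back into a Python list."""
    return json.loads(text)


text = to_vector_text([1, 2, 3])
print(text)                    # [1,2,3]
print(from_vector_text(text))  # [1, 2, 3]
```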
You can also look in this discussion. If Piccolo doesn't support a feature, you can always just use raw sql to get the desired result. Hope this helps.
hello @sinisaos
the suggested discussion is what I'm currently doing, but constantly using Table.raw to resolve most uncovered query needs kinda defeats the purpose of the ORM existing.
When I grab an object from the database, the Python data structure is as follows:
obj = Table.objects().first()
obj.embedding # piccolo resolves this as a string when converting the custom column
Since it's a custom column, the dev should be able to control how the column's value is converted to a Python object and back into a SQL value.
I would expect behaviour such as:
obj = Table.objects().first()
obj.embedding # piccolo resolves this as `np.ndarray` as the developer specified how it should be serialised from the defined method on `Column`
@Jacky56 I don't know if this helps, but you can use the methods of the table class to serialize/deserialize data. Try something like this for serialization/deserialization.
Sorry if I missed your point and don't understand your needs.
Thank you for your reply @sinisaos, but it's not quite what's desired.
The thing I am describing actually exists in the code as Column.get_sql_value, which translates the Python type to its SQL equivalent. For example, it translates a Python list to the psql array syntax.
It would be nice to somehow override this method.
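To make the difference concrete, here is a rough sketch of the two text formats involved. It is not Piccolo's get_sql_value implementation; both function names are hypothetical, and only flat lists of numbers are handled.

```python
def to_array_literal(values):
    """Format a flat list of numbers as a Postgres array literal, e.g. '{1,2,3}'."""
    return "{" + ",".join(str(v) for v in values) + "}"


def to_vector_literal(values):
    """Format a flat list of numbers as pgvector text, e.g. '[1,2,3]'."""
    return "[" + ",".join(str(v) for v in values) + "]"


print(to_array_literal([1, 2, 3]))   # {1,2,3}
print(to_vector_literal([1, 2, 3]))  # [1,2,3]
```

An overridable hook would let a custom column pick the second format instead of the first.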
With the rise of storing vectors and very unique data structures, the base columns from piccolo.columns are simply not good enough to express these datatypes for our DBs.
In my example I am creating a custom column type called Vector:
Now every time I try to save the vector, I must serialise it as such:
Now because Table.vector = a_serialiser_method(Table.vector) creates side effects, I must unserialise it back to the Python type.
This also applies to fetching the vector: I must deserialise it to the Python equivalent:
Question
Is there a way to simply add a default serialising/deserialising behaviour for custom columns?
Suggestion
Extend Column so that it has default serialiser/deserialiser methods to translate Python types to DB types and vice versa, such as:
These serializer and deserializer methods would act as pre-hooks when calling methods such as .to_dict() or .save() (writing to the DB). These methods should also not generate side effects.
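A minimal, self-contained sketch of what such hooks could look like. Everything here is hypothetical: Column, Vector and Row are toy stand-ins written for illustration, not Piccolo classes, and the method names serializer/deserializer simply follow the wording of the suggestion.

```python
import json


class Column:
    """Toy stand-in for an ORM column (not piccolo.columns.Column)."""

    def serializer(self, value):
        # Python value -> database value; identity by default.
        return value

    def deserializer(self, value):
        # Database value -> Python value; identity by default.
        return value


class Vector(Column):
    """Custom column storing a list of numbers as text such as '[1,2,3]'."""

    def serializer(self, value):
        return json.dumps(list(value), separators=(",", ":"))

    def deserializer(self, value):
        return json.loads(value)


class Row:
    """Toy row object holding Python values for a fixed set of columns."""

    columns = {"embedding": Vector()}

    def __init__(self, **values):
        self.values = dict(values)

    def to_db(self):
        # Build a new dict for the database; self.values is left untouched.
        return {k: self.columns[k].serializer(v) for k, v in self.values.items()}

    @classmethod
    def from_db(cls, raw):
        # Apply each column's deserializer to the raw database values.
        return cls(**{k: cls.columns[k].deserializer(v) for k, v in raw.items()})


row = Row(embedding=[1, 2, 3])
stored = row.to_db()
print(stored)                      # {'embedding': '[1,2,3]'}
print(row.values["embedding"])     # [1, 2, 3] (still a list, no side effect)
print(Row.from_db(stored).values)  # {'embedding': [1, 2, 3]}
```

Because to_db returns a new dict instead of overwriting the attribute, the object keeps its Python value after a write, which is the "no side effects" property asked for above.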