pgvector / pgvector-python

pgvector support for Python

Loss of Precision in Embeddings Due to np.float32 in pgvector #97

Closed: miskibin closed 1 month ago

miskibin commented 1 month ago

In the pgvector\django\vector.py module, list values are converted to np.float32 arrays:

def to_python(self, value):
    if isinstance(value, list):
        return np.array(value, dtype=np.float32)
    return Vector._from_db(value)

In some cases, this cast to np.float32 loses critical information about the embeddings.

In the image, red represents the original embedding and green represents the embedding after being parsed by this function:

[image: original embedding (red) vs. embedding after to_python (green)]

Could this be made parametrizable or customizable in some way?

ankane commented 1 month ago

Hi @miskibin, the vector type uses single-precision floats. You can use different data types to store values with more precision.
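
To make the difference between the two precisions concrete, here is a minimal sketch that round-trips a few values through 4-byte and 8-byte IEEE 754 floats using only the standard library. The embedding values and the roundtrip helper are hypothetical, chosen for illustration; they are not part of pgvector or pgvector-python. It only shows the rounding a single-precision type introduces (roughly 7 significant decimal digits), not how pgvector stores data internally.

```python
import struct


def roundtrip(value: float, fmt: str) -> float:
    """Pack a Python float into the given IEEE 754 format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# Hypothetical embedding values, chosen only for illustration.
embedding = [0.1, 0.123456789012345, -0.987654321098765]

# "<f" is a 4-byte single-precision float, "<d" an 8-byte double.
as_single = [roundtrip(x, "<f") for x in embedding]
as_double = [roundtrip(x, "<d") for x in embedding]

# Largest absolute difference introduced by the single-precision round trip.
max_error = max(abs(a - b) for a, b in zip(embedding, as_single))

print("double round trip is exact:", as_double == embedding)
print("single round trip is exact:", as_single == embedding)
print("max single-precision error:", max_error)
```

If that rounding matters for an application, one option in the spirit of the reply above would be a PostgreSQL double precision[] column. This is an assumption about what "different data types" could mean here, not something stated in the thread.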