thorwhalen / oa

Python interface to OpenAI
Apache License 2.0

add dimensions arg to oa.embeddings #9

Closed thorwhalen closed 5 months ago

thorwhalen commented 5 months ago

Sometimes having large embedding vectors is a problem.

I initially tried different embedding models with different sizes, but to actually reduce the size, one needs to pass a dimensions parameter to OpenAI's embeddings API. The way it works is that the model is used as is, and then the vectors are reduced.

So how are they reduced? Found the answer here. They use Matryoshka Representation Learning for this. As far as I can tell, they say the reduction to size k is performed by taking the first k numbers of the vector, but I can't confirm this with my tests:

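import oa
import pandas as pd

# Same text and model; only the requested number of dimensions changes.
t = oa.embeddings('hello world', model='text-embedding-3-small')
tt = oa.embeddings('hello world', model='text-embedding-3-small', dimensions=256)
ttt = oa.embeddings('hello world', model='text-embedding-3-small', dimensions=10)

# Compare the first five values of each vector side by side.
# (DataFrame.map needs pandas >= 2.1; use .applymap on older versions.)
pd.DataFrame(
    {'t': t[:5], 'tt': tt[:5], 'ttt': ttt[:5]}
).map(lambda x: round(x, 4))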

[image: screenshot of the resulting table]
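A possible explanation for the mismatch, stated as an assumption and not verified against the API here: as far as I understand OpenAI's embeddings guide, a shortened vector is rescaled to unit length after truncation. If so, the first k values of the full vector would differ from the reduced vector by a constant scale factor, so comparing raw values would not match. A minimal sketch of a check for that, in pure Python with a synthetic vector standing in for real API output (the helper names are hypothetical and not part of `oa`):

```python
import math


def l2_normalize(vec):
    """Scale vec to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def truncate_and_normalize(full, k):
    """Keep the first k components of `full`, then rescale to unit length."""
    return l2_normalize(full[:k])


def looks_like_truncation(full, reduced, tol=1e-3):
    """True if `reduced` matches the renormalized first len(reduced) values of `full`."""
    expected = truncate_and_normalize(full, len(reduced))
    return all(abs(a - b) <= tol for a, b in zip(expected, reduced))


# Synthetic stand-in for an embedding (assumption: NOT real API output).
full = l2_normalize([0.5, -0.25, 0.75, 0.1, -0.6, 0.3])
reduced = truncate_and_normalize(full, 3)

# The raw prefix and the reduced vector differ in scale...
print(full[:3] == reduced)
# ...but the reduced vector is the renormalized prefix.
print(looks_like_truncation(full, reduced))
```

With the vectors from the snippet above, the same check would be `looks_like_truncation(t, tt)` and `looks_like_truncation(t, ttt)`, assuming `oa.embeddings` returns a flat list of floats for a single string (which the slicing above suggests). The tolerance is a guess and may need loosening.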

thorwhalen commented 5 months ago

This commit closes #9 (and also adds an "extra kwargs" argument).