Sometimes having large embedding vectors is a problem.
I was initially trying different embedding models with different sizes, but it turns out that to reduce size you use the `dimensions` parameter of OpenAI's embeddings API. The way it works is that the model runs as is, and the resulting vectors are reduced afterwards.
So how are they reduced? I found the answer here: they use Matryoshka Representation Learning. As far as I can tell, the reduction to size `k` is performed by taking the first `k` numbers of the vector, but I couldn't confirm this with my tests.
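One plausible reason a naive check fails: the API returns unit-normalized embeddings, so after truncating to the first `k` numbers you have to re-normalize before comparing against what the `dimensions` parameter gives you. Here's a minimal sketch of that check, assuming the openai Python SDK (v1 client) and an `OPENAI_API_KEY` in the environment; the model and `k=256` are just example choices:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

text = "Matryoshka dolls nest inside each other."

# Ask the API for a reduced-size vector directly via the dimensions parameter.
small = client.embeddings.create(
    model="text-embedding-3-small",
    input=text,
    dimensions=256,
).data[0].embedding

# Get the full-size vector (1536 dims for text-embedding-3-small)
# and reduce it by hand: keep the first k numbers, then re-normalize.
full = client.embeddings.create(
    model="text-embedding-3-small",
    input=text,
).data[0].embedding

truncated = np.array(full[:256])
truncated /= np.linalg.norm(truncated)  # re-normalize to unit length

# If the truncate-and-normalize story holds, these should match closely.
print(np.allclose(truncated, np.array(small), atol=1e-3))
```

If the truncate-and-normalize interpretation is right, this prints `True` up to small floating-point differences; comparing the raw first `k` numbers without the normalization step would not match.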