theodo-group / LLPhant

LLPhant - A comprehensive PHP Generative AI Framework using OpenAI GPT-4. Inspired by LangChain.
MIT License

Support user-defined embedding dimensions in Generators #111

Closed: bernard-ng closed this issue 4 days ago

bernard-ng commented 2 months ago

https://platform.openai.com/docs/api-reference/embeddings/create#embeddings-create-dimensions

dimensions (integer, optional): The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.

The embedding dimension can be set through the configuration of the underlying service; for OpenAI, this is the "dimensions" option in the model options. Being able to set it is useful when you want to swap between different embedding generators, for example for research or benchmarking.
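For reference, here is a minimal sketch of what the request body to OpenAI's embeddings endpoint looks like with and without the optional `dimensions` parameter. `buildEmbeddingRequest()` is a hypothetical helper written for illustration. It is not part of LLPhant, and it builds only the JSON body, without sending any HTTP request.

```php
<?php
// Hypothetical helper (not part of LLPhant): builds the JSON body for a
// POST to OpenAI's embeddings endpoint, adding "dimensions" only if requested.
function buildEmbeddingRequest(string $model, string $input, ?int $dimensions = null): string
{
    $body = ['model' => $model, 'input' => $input];

    // Omitting "dimensions" returns the model's default length
    // (1536 for text-embedding-3-small).
    if ($dimensions !== null) {
        $body['dimensions'] = $dimensions;
    }

    return json_encode($body, JSON_THROW_ON_ERROR);
}

echo buildEmbeddingRequest('text-embedding-3-small', 'Hello world', 512), PHP_EOL;
// prints {"model":"text-embedding-3-small","input":"Hello world","dimensions":512}
```

A generator that forwards this option would therefore return vectors whose length no longer matches a hard-coded constant.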

Currently, the size is defined statically:

final class OpenAI3SmallEmbeddingGenerator extends AbstractOpenAIEmbeddingGenerator
{
    // Hard-coded to the model's default output length
    public function getEmbeddingLength(): int
    {
        return 1536;
    }

    // ...
}

I think getEmbeddingLength() should take user customizations into account, or the generator should expose a method to set the embedding dimension, for example:

public function getDefaultEmbeddingLength(): int;

// default unless user-defined
public function getEmbeddingLength(): int;

// alter config to add "dimensions" option
// $dimension should be <= default embedding length (OpenAI can only shorten embeddings)
public function setEmbeddingLength(int $dimension): void;
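A minimal sketch of how the three proposed methods could fit together, assuming OpenAI's `dimensions` option can only shorten the model's native output. `ResizableEmbeddingGenerator`, `ExampleOpenAI3SmallGenerator` and `getModelOptions()` are hypothetical names used for illustration; LLPhant's actual base classes and configuration mechanism may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical base class illustrating the proposal (not LLPhant's real API).
abstract class ResizableEmbeddingGenerator
{
    private ?int $embeddingLength = null;

    /** Native output size of the underlying model. */
    abstract public function getDefaultEmbeddingLength(): int;

    /** Returns the user-defined length if set, otherwise the model default. */
    public function getEmbeddingLength(): int
    {
        return $this->embeddingLength ?? $this->getDefaultEmbeddingLength();
    }

    /** Assumes OpenAI can only shorten embeddings: 1 <= $dimension <= default. */
    public function setEmbeddingLength(int $dimension): void
    {
        $max = $this->getDefaultEmbeddingLength();
        if ($dimension < 1 || $dimension > $max) {
            throw new InvalidArgumentException(
                sprintf('Embedding length must be between 1 and %d, got %d.', $max, $dimension)
            );
        }
        $this->embeddingLength = $dimension;
    }

    /** Hypothetical extra request options: "dimensions" only when overridden. */
    public function getModelOptions(): array
    {
        return $this->embeddingLength === null
            ? []
            : ['dimensions' => $this->embeddingLength];
    }
}

// Hypothetical concrete generator mirroring the static 1536 default above.
final class ExampleOpenAI3SmallGenerator extends ResizableEmbeddingGenerator
{
    public function getDefaultEmbeddingLength(): int
    {
        return 1536;
    }
}

$generator = new ExampleOpenAI3SmallGenerator();
echo $generator->getEmbeddingLength(), PHP_EOL; // 1536
$generator->setEmbeddingLength(256);
echo $generator->getEmbeddingLength(), PHP_EOL; // 256
```

Keeping the override in the base class means each concrete generator only declares its default, and vector stores can keep calling getEmbeddingLength() unchanged.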

What do you think?

MaximeThoonsen commented 2 months ago

Hey @bernard-ng ,

Yes I agree. Do you want to contribute on this one?

bernard-ng commented 1 month ago

@MaximeThoonsen, can you have a look?