tryAGI / LangChain

C# implementation of LangChain. We try to be as close to the original as possible in terms of abstractions, but are open to new entities.
https://tryagi.github.io/LangChain/
MIT License

IVectorCollection extensions: AddPdfAsync, AddHtmlAsync, AddWordAsync #265

Closed by HavenDV 4 months ago

HavenDV commented 4 months ago

Each extension should live in its own source-specific package (for example, in LangChain.Sources.Pdf).

This would simplify code like this:

var vectorDatabase = new SqLiteVectorDatabase("vectors.db");
var vectorCollection = await vectorDatabase.GetOrCreateCollectionAsync("harry-potter", dimensions: 1536);
if (await vectorCollection.IsEmptyAsync())
{
    var pdfSource = new PdfPigPdfSource("E:\\AI\\Datasets\\Books\\Harry-Potter-Book-1.pdf");
    var documents = await pdfSource.LoadAsync();

    await vectorCollection.AddSplitDocumentsAsync(
        embeddingModel,
        documents);
}

to

var vectorDatabase = new SqLiteVectorDatabase("vectors.db");
var vectorCollection = await vectorDatabase.GetOrCreateCollectionAsync("harry-potter", dimensions: 1536);
if (await vectorCollection.IsEmptyAsync())
{
    await vectorCollection.AddPdfFromPathAsync(
        embeddingModel,
        "E:\\AI\\Datasets\\Books\\Harry-Potter-Book-1.pdf");
}
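
For illustration, here is a rough sketch of how such an extension could wrap the two steps (load the source, then add the split documents). It is only a sketch under assumptions: Document, IEmbeddingModel, IVectorCollection, StubPdfSource, InMemoryVectorCollection and StubEmbeddingModel below are simplified stand-ins written for this example, not the real LangChain types, and the actual AddSplitDocumentsAsync signature may differ. In a real LangChain.Sources.Pdf package, StubPdfSource would presumably be replaced by PdfPigPdfSource.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Stand-in types so the sketch compiles on its own.
// They are simplified placeholders, NOT the real LangChain definitions.
public sealed class Document
{
    public Document(string pageContent) { PageContent = pageContent; }
    public string PageContent { get; }
}

public interface IEmbeddingModel { }

public interface IVectorCollection
{
    // Mirrors the call shape used in the issue; the real signature may differ.
    Task AddSplitDocumentsAsync(
        IEmbeddingModel embeddingModel,
        IReadOnlyCollection<Document> documents);
}

// Placeholder for PdfPigPdfSource: returns one fake document instead of parsing a PDF.
public sealed class StubPdfSource
{
    private readonly string _path;
    public StubPdfSource(string path) { _path = path; }

    public Task<IReadOnlyCollection<Document>> LoadAsync()
    {
        IReadOnlyCollection<Document> documents =
            new[] { new Document("contents of " + _path) };
        return Task.FromResult(documents);
    }
}

// Placeholder collection that only records what was added (no embedding, no splitting).
public sealed class InMemoryVectorCollection : IVectorCollection
{
    public List<Document> Documents { get; } = new List<Document>();

    public Task AddSplitDocumentsAsync(
        IEmbeddingModel embeddingModel,
        IReadOnlyCollection<Document> documents)
    {
        Documents.AddRange(documents);
        return Task.CompletedTask;
    }
}

public sealed class StubEmbeddingModel : IEmbeddingModel { }

public static class VectorCollectionPdfExtensions
{
    // Hypothetical extension: load the PDF, then hand the documents to the collection.
    public static async Task AddPdfFromPathAsync(
        this IVectorCollection vectorCollection,
        IEmbeddingModel embeddingModel,
        string path)
    {
        if (vectorCollection == null) throw new ArgumentNullException(nameof(vectorCollection));
        if (embeddingModel == null) throw new ArgumentNullException(nameof(embeddingModel));
        if (path == null) throw new ArgumentNullException(nameof(path));

        // In the real package this would be: new PdfPigPdfSource(path)
        var pdfSource = new StubPdfSource(path);
        var documents = await pdfSource.LoadAsync().ConfigureAwait(false);

        await vectorCollection
            .AddSplitDocumentsAsync(embeddingModel, documents)
            .ConfigureAwait(false);
    }
}
```

With these stand-ins, calling AddPdfFromPathAsync(new StubEmbeddingModel(), "book.pdf") on a new InMemoryVectorCollection leaves exactly one document in its Documents list. Keeping the method in a source-specific static class means the core package would not need a PDF dependency, which seems to be the point of putting each extension in its own Sources package.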