Implement semantic chunking within RAG EXP ACC

microsoft / rag-experiment-accelerator

The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate the process of conducting experiments and evaluations using Azure Cognitive Search and the RAG pattern.

https://github.com/microsoft/rag-experiment-accelerator

Other

193 stars, 70 forks

Implement semantic chunking within RAG EXP ACC #816

Open ritesh-modi opened 2 weeks ago

ritesh-modi commented 2 weeks ago

Currently, RAGE provides several chunking methods. A semantic chunking method should be added alongside them, and it should be configurable.
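For readers unfamiliar with the idea, here is a minimal sketch of what semantic chunking generally means: split the text into sentences, compare neighbouring sentences, and start a new chunk where similarity drops below a configurable threshold. This is not the accelerator's implementation. The names semantic_chunks, embed and similarity_threshold are hypothetical, and the bag-of-words embed function is only a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_chunks(text, similarity_threshold=0.2):
    """Group consecutive sentences; break where adjacent similarity falls below the threshold.

    similarity_threshold is the hypothetical configurable knob.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(curr)) < similarity_threshold:
            chunks.append([curr])    # topic shift: start a new chunk
        else:
            chunks[-1].append(curr)  # same topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]


if __name__ == "__main__":
    sample = (
        "Cats purr loudly. Cats sleep a lot. "
        "Azure hosts search indexes. Azure search indexes store vectors."
    )
    for chunk in semantic_chunks(sample):
        print(chunk)
```

Raising similarity_threshold produces more, smaller chunks; lowering it merges more sentences together. A real version would likely swap embed for calls to an embedding model and might also cap chunk length.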

FlorianPydde commented 1 week ago

What's the current recommended way to create a custom chunker? If I understand correctly, one would have to do the following:

In documentLoader.py, add a new custom entry to _FORMAT_PROCESSORS.
Create a customLoader that follows the output format of structuredLoader, like this: docsList.append({str(uuid.uuid4()): {"content": doc.page_content, "metadata": doc.metadata}})

Is my understanding correct? @quovadim, @ritesh-modi
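To make the output format in the second step concrete, here is a small sketch. It only illustrates the shape of docsList given above. The Doc class and the to_docs_list function are hypothetical stand-ins and not part of the repository, and how a loader is registered in _FORMAT_PROCESSORS is not shown because the thread does not specify it.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document exposing page_content and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def to_docs_list(docs):
    """Build a list of single-key dicts: {uuid string: {"content": ..., "metadata": ...}}."""
    docsList = []
    for doc in docs:
        docsList.append(
            {str(uuid.uuid4()): {"content": doc.page_content, "metadata": doc.metadata}}
        )
    return docsList


if __name__ == "__main__":
    chunks = [
        Doc("First chunk of text.", {"source": "a.txt"}),
        Doc("Second chunk of text.", {"source": "a.txt"}),
    ]
    for entry in to_docs_list(chunks):
        print(entry)
```

Each entry gets a fresh random UUID as its key, so a custom chunker would only need to decide how the text is split before this step.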