hegelai / prompttools

Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).
http://prompttools.readthedocs.io
Apache License 2.0
2.56k stars, 216 forks

Add ingestion harness for vectorDB experiments #49

Open steventkrawczyk opened 11 months ago

🚀 The feature

We need a way to experiment with different chunking and ingestion strategies. Suppose we have some "raw" documents we want to ingest into a vector database. There are several ways to transform those raw documents into the documents we actually vectorize: we can ingest them as-is, split ("chunk") them into 10-line chunks, or run other pre-processing to extract keywords and relevant phrases.
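A minimal sketch of what such a harness could look like, assuming each strategy is a function that maps one raw document to the list of documents to vectorize. All names here (`as_is`, `line_chunks`, `keywords`, `embed`, `run_harness`) are hypothetical and not part of the prompttools API. The bag-of-words `embed` and the in-memory list are stand-ins for a real embedding model and a vector database such as Chroma or Weaviate, and top-1 hit rate is only one possible way to compare strategies.

```python
import re
from collections import Counter
from math import sqrt
from typing import Callable, Dict, List, Tuple

# A strategy turns one raw document into the documents that actually get vectorized.
Strategy = Callable[[str], List[str]]

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def as_is(raw: str) -> List[str]:
    """Ingest the raw document unchanged."""
    return [raw]


def line_chunks(raw: str, n: int = 10) -> List[str]:
    """Split the document into consecutive chunks of n lines (the last may be shorter)."""
    lines = raw.splitlines()
    return ["\n".join(lines[i:i + n]) for i in range(0, len(lines), n)]


def keywords(raw: str, k: int = 5) -> List[str]:
    """Naive keyword extraction: keep the k most frequent non-stopword tokens."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", raw.lower()) if t not in STOPWORDS]
    return [" ".join(word for word, _ in Counter(tokens).most_common(k))]


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a placeholder for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def run_harness(
    raw_docs: Dict[str, str],
    strategies: Dict[str, Strategy],
    queries: List[Tuple[str, str]],
) -> Dict[str, float]:
    """For each strategy, ingest all raw docs, then report the fraction of
    (query, expected_doc_id) pairs whose top-1 result comes from the expected doc."""
    results = {}
    for name, strategy in strategies.items():
        # "Ingest": store (source_doc_id, chunk_text, vector) rows in an in-memory list.
        store = [
            (doc_id, chunk, embed(chunk))
            for doc_id, raw in raw_docs.items()
            for chunk in strategy(raw)
        ]
        hits = 0
        for query, expected_id in queries:
            q = embed(query)
            best_id = max(store, key=lambda row: cosine(q, row[2]))[0]
            hits += best_id == expected_id
        results[name] = hits / len(queries)
    return results


if __name__ == "__main__":
    docs = {"fruit": "apple banana\ncherry", "pets": "dog cat\nmouse"}
    strategies = {
        "as_is": as_is,
        "lines": lambda raw: line_chunks(raw, n=1),
        "keywords": keywords,
    }
    print(run_harness(docs, strategies, [("banana", "fruit"), ("mouse", "pets")]))
```

Keeping strategies as plain functions means new pre-processing steps can be added to the `strategies` dict without touching the ingestion or evaluation loop; swapping the toy store for a real vector DB client would only change the "ingest" and lookup lines.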

Motivation, pitch

This came up while talking to customers about their needs for evaluating vector databases at scale.

Alternatives

No response

Additional context

No response