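This repository contains a comprehensive evaluation of the Colivara API for document management, search, and retrieval, using a Retrieval-Augmented Generation (RAG) model. The evaluation assesses Colivara's capabilities in managing document collections, performing efficient search operations, and computing relevance metrics to measure performance.

The table below compares Colivara's NDCG@5 scores on the ViDoRe benchmark with models from the ViDoRe leaderboard. ↑ marks benchmarks where Colivara scores at or above the current leader (vidore_colqwen2-v1.0), and ↓ marks benchmarks where it scores below.
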
Benchmark | Colivara | vidore_colqwen2-v1.0 (Current Leader) | vidore_colpali-v1.3 | vidore_colpali |
---|---|---|---|---|
Average | 87.6 ↓ | 89.3 | 84.8 | 81.3 |
TAT-DQA | 71.7 ↓ | 81.4 | 70.4 | 65.8 |
Shift Project | 91.3 ↑ | 90.7 | 77.4 | 73.2 |
Artificial Intelligence | 99.5 ↑ | 99.4 | 97.4 | 96.2 |
Government Reports | 96.7 ↑ | 96.3 | 96.2 | 92.7 |
ArxivQA | 88.1 ↑ | 88.1 | 83.0 | 79.1 |
DocVQA | 56.1 ↓ | 60.6 | 58.5 | 54.4 |
Healthcare Industry | 98.3 ↑ | 98.1 | 96.9 | 94.4 |
InfoVQA | 91.4 ↓ | 92.6 | 85.7 | 81.8 |
Energy | 96.3 ↑ | 95.9 | 95.4 | 91.0 |
TabFQuad | 86.3 ↓ | 89.5 | 87.4 | 83.9 |

The goal of this project is to evaluate Colivara's document retrieval and management features, particularly for applications that rely on high-performance search and retrieval. This includes testing Colivara's collection and document management, assessing its suitability for different search and retrieval scenarios, and benchmarking the platform with a RAG model to evaluate relevance on real-world queries.

The table below summarizes Colivara's NDCG@5 score, average latency, and number of documents for each benchmark:

Benchmark | Colivara Score | Avg Latency (s) | Num Docs |
---|---|---|---|
Average | 87.6 | ---- | ---- |
ArxivQA | 88.1 | 11.1 | 500 |
DocVQA | 56.1 | 9.3 | 500 |
InfoVQA | 91.4 | 8.6 | 500 |
Shift Project | 91.3 | 16.8 | 1000 |
Artificial Intelligence | 99.5 | 12.8 | 1000 |
Energy | 96.3 | 14.1 | 1000 |
Government Reports | 96.7 | 14.0 | 1000 |
Healthcare Industry | 98.3 | 20.0 | 1000 |
TabFQuad | 86.3 | 8.1 | 280 |
TAT-DQA | 71.7 | 20.0 | 1663 |

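The Average row can be checked directly from the table: it equals the unweighted (macro) mean of the ten per-benchmark scores. The short script below, with values copied from the table, computes that mean and, for comparison, a mean weighted by Num Docs, which comes out higher.

```python
# Colivara NDCG@5 scores and document counts, copied from the table above.
results = {
    "ArxivQA": (88.1, 500),
    "DocVQA": (56.1, 500),
    "InfoVQA": (91.4, 500),
    "Shift Project": (91.3, 1000),
    "Artificial Intelligence": (99.5, 1000),
    "Energy": (96.3, 1000),
    "Government Reports": (96.7, 1000),
    "Healthcare Industry": (98.3, 1000),
    "TabFQuad": (86.3, 280),
    "TAT-DQA": (71.7, 1663),
}

# Unweighted mean: every benchmark counts equally.
scores = [score for score, _ in results.values()]
unweighted = sum(scores) / len(scores)

# Weighted mean: benchmarks with more documents count more.
total_docs = sum(n for _, n in results.values())
weighted = sum(score * n for score, n in results.values()) / total_docs

print(f"unweighted mean: {unweighted:.1f}")   # prints "unweighted mean: 87.6"
print(f"doc-weighted mean: {weighted:.1f}")   # prints "doc-weighted mean: 88.0"
```

The unweighted figure reproduces the reported 87.6, so each benchmark contributes equally regardless of its size.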
The required Python packages are listed in `requirements.txt`, including:

- pandas
- numpy
- tqdm
- dotenv
- colivara_py (Colivara client library)
- pytest (for testing)

Clone the repository:
git clone https://github.com/yourusername/colivara-evaluation.git
cd colivara-evaluation
Install the dependencies:
pip install -r requirements.txt
Configure environment variables by creating a `.env` file in the root directory:
COLIVARA_API_KEY=your_api_key_here
COLIVARA_BASE_URL=https://api.colivara.com
The Colivara Evaluation Project provides a streamlined interface for managing and evaluating document collections within Colivara. The primary entry points are `main.py`, for upserting documents, and `evaluate.py`, for relevance evaluation.
main.py
The `main.py` script upserts documents into Colivara collections. It can process a single dataset or batch-process all available datasets, which makes it adaptable to different scenarios.
- `--n_rows`: Number of rows to load from the dataset. Optional; if omitted, the script loads all rows.
- `--upsert`: Include this flag to upsert documents into Colivara.
- `--all_files`: Process all datasets in the `DOCUMENT_FILES` list.
- `--specific_file`: Name of a single file to process (must match one of the files in `DOCUMENT_FILES`).
- `--collection_name`: Custom collection name to use when processing a specific file. If omitted, the script uses the predefined collection name for that file.

To upsert documents from a specific dataset, run:
python main.py --specific_file arxivqa_test_subsampled.pkl --collection_name arxivqa_collection --upsert
This command will upsert all documents from `arxivqa_test_subsampled.pkl` into `arxivqa_collection` if it doesn't already exist.
To upsert documents for all datasets:
python main.py --all_files --upsert
This command loops through all datasets in `DOCUMENT_FILES`, upserting documents into their corresponding collections.
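The flags above map naturally onto Python's standard `argparse` module. The following is a minimal sketch for illustration, not the project's actual `main.py`: the one-entry `DOCUMENT_FILES` list is a hypothetical placeholder, and making `--all_files` and `--specific_file` mutually exclusive is an assumption.

```python
import argparse

# Hypothetical placeholder; the real DOCUMENT_FILES list lives in the project.
DOCUMENT_FILES = ["arxivqa_test_subsampled.pkl"]


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser for the flags documented for main.py."""
    parser = argparse.ArgumentParser(
        description="Upsert documents into Colivara collections."
    )
    parser.add_argument("--n_rows", type=int, default=None,
                        help="Number of rows to load; all rows when omitted.")
    parser.add_argument("--upsert", action="store_true",
                        help="Upsert documents into Colivara.")
    # Assumption: a run processes either all files or exactly one file.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--all_files", action="store_true",
                        help="Process every dataset in DOCUMENT_FILES.")
    source.add_argument("--specific_file", choices=DOCUMENT_FILES,
                        help="Process one dataset from DOCUMENT_FILES.")
    parser.add_argument("--collection_name", default=None,
                        help="Custom collection name for --specific_file.")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

For example, `build_parser().parse_args(["--all_files", "--upsert"])` sets `all_files` and `upsert` to `True` and leaves `specific_file` and `n_rows` as `None`. Using `choices` rejects file names that are not in `DOCUMENT_FILES` before any upload starts.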
evaluate.py
The `evaluate.py` script evaluates the relevance of search results returned from document collections within Colivara.
- `--api_key`: Your Colivara API key for authentication.
- `--collection_name`: Name of the collection to evaluate.
- `--all_files`: Evaluate all collections listed in `DOCUMENT_FILES`.

To evaluate the relevance of a specific collection, run:
python evaluate.py --api_key "your_api_key_here" --collection_name arxivqa_collection
This command will evaluate the specified collection and output the relevance metrics based on NDCG@5.
To evaluate the relevance of all collections:
python evaluate.py --api_key "your_api_key_here" --all_files
This command performs a relevance evaluation (NDCG@5) on all datasets listed in `DOCUMENT_FILES` and saves the results in the `out/` directory:

- `out/avg_ndcg_scores.pkl`: the average NDCG@5 score for each dataset.
- `out/ndcg_scores.pkl`: detailed NDCG scores for each query.
- `out/<collection_name>_ndcg_scores.pkl`: detailed NDCG scores for each query in the specified collection.

collection_manager.py
The `collection_manager.py` script provides utilities for listing and deleting collections within Colivara.
List All Collections
python collection_manager.py --list
Displays all existing collections within Colivara.
Delete a Collection
python collection_manager.py --delete <collection_name>
Deletes the specified collection. This action is irreversible, so ensure that the correct collection name is provided.
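The `--list` / `--delete` interface above could be wired up roughly as follows. This is a sketch, not the project's code: the `CollectionClient` interface (`list_collections`, `delete_collection`) and the `InMemoryClient` stand-in are hypothetical, used so the example runs without API access, and the existence check before deleting is an added safeguard rather than documented behavior.

```python
import argparse
from typing import Protocol


class CollectionClient(Protocol):
    """Hypothetical client interface; the real client is set up in client.py."""

    def list_collections(self) -> list[str]: ...

    def delete_collection(self, name: str) -> None: ...


class InMemoryClient:
    """Toy stand-in so the sketch runs without network access or an API key."""

    def __init__(self, names: list[str]) -> None:
        self._names = list(names)

    def list_collections(self) -> list[str]:
        return list(self._names)

    def delete_collection(self, name: str) -> None:
        self._names.remove(name)


def run(argv: list[str], client: CollectionClient) -> None:
    parser = argparse.ArgumentParser(description="List or delete Colivara collections.")
    action = parser.add_mutually_exclusive_group(required=True)
    action.add_argument("--list", action="store_true", help="List all collections.")
    action.add_argument("--delete", metavar="COLLECTION_NAME",
                        help="Delete the named collection (irreversible).")
    args = parser.parse_args(argv)

    if args.list:
        for name in client.list_collections():
            print(name)
        return

    # Deletion cannot be undone, so refuse names that do not exist
    # (an added safeguard in this sketch).
    if args.delete not in client.list_collections():
        raise SystemExit(f"Collection {args.delete!r} not found; nothing deleted.")
    client.delete_collection(args.delete)
    print(f"Deleted {args.delete}")


# Example run against the in-memory stand-in; prints "arxivqa_collection".
run(["--list"], InMemoryClient(["arxivqa_collection"]))
```

Requiring exactly one of `--list` or `--delete` through a mutually exclusive group means a bare `python collection_manager.py` fails with a usage message instead of silently doing nothing.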
src/

- `client.py`: Initializes the Colivara client.
- `config.py`: Loads the API key and base URL from environment variables.
- `data_loader.py`: Handles data loading and base64 image encoding.
- `document_manager.py`: Manages document upserting and collection creation.
- `evaluator.py`: Evaluates model performance using NDCG.
- `collection_manager.py`: Provides collection listing and deletion tools.
- `main.py`: Main script for document upsertion and evaluation.
- `tests/`: Contains unit tests for the project.
- `data/`: Stores the datasets for evaluation.
- `.env`: Environment configuration file (not included in version control).
- `requirements.txt`: Lists Python package dependencies.

The project configuration relies on environment variables defined in a `.env` file:

- `COLIVARA_API_KEY`: API key for authenticating with the Colivara service.
- `COLIVARA_BASE_URL`: Base URL of the Colivara API.

These values are loaded automatically with `dotenv`, which keeps credentials out of the source code; the `.env` file itself is excluded from version control.
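A minimal sketch of what the loading step in `config.py` might look like, assuming the python-dotenv package, which provides `load_dotenv`. Falling back to `https://api.colivara.com` when `COLIVARA_BASE_URL` is unset is an assumption based on the example `.env` above, not documented behavior.

```python
import os

try:
    # python-dotenv: load_dotenv() copies KEY=value pairs from a .env file
    # into os.environ, without overriding variables that are already set.
    from dotenv import load_dotenv
except ImportError:  # the sketch still works with plain environment variables
    load_dotenv = None

# Assumed default, taken from the example .env file in this README.
DEFAULT_BASE_URL = "https://api.colivara.com"


def load_config() -> tuple[str, str]:
    """Return (api_key, base_url) from the environment or a .env file."""
    if load_dotenv is not None:
        # Explicit path: the .env file in the current working directory
        # (the project root when scripts are run from there).
        load_dotenv(".env")
    api_key = os.getenv("COLIVARA_API_KEY")
    if not api_key:
        raise RuntimeError("COLIVARA_API_KEY is not set; add it to .env or the environment.")
    base_url = os.getenv("COLIVARA_BASE_URL", DEFAULT_BASE_URL)
    return api_key, base_url
```

Failing fast on a missing API key gives a clear error at startup rather than an authentication failure partway through an upsert or evaluation run.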
DCG (Discounted Cumulative Gain) measures relevance while accounting for the position of relevant results in the returned list: results that appear earlier contribute more to the score.

NDCG (Normalized DCG) divides DCG by the ideal DCG (IDCG), i.e. the DCG of a perfect ranking for the same query, which yields a score between 0 and 1. In this project, NDCG@5 is computed over the top 5 search results for each query; the benchmark tables above report it multiplied by 100.
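A minimal Python sketch of NDCG@5 for a single query, assuming linear gain (each result contributes rel_i / log2(i + 1) at rank i) and relevance labels given as a dict from document id to grade. Whether `evaluator.py` uses this gain or the exponential variant (2^rel - 1) is not stated here; with binary labels the two agree.

```python
import math


def dcg_at_k(relevances, k=5):
    """DCG@k with linear gain: sum of rel_i / log2(i + 1) over ranks i = 1..k."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(retrieved_ids, relevant, k=5):
    """NDCG@k for one query.

    retrieved_ids: document ids in the order returned by search.
    relevant: mapping of document id -> relevance grade (1 for binary labels).
    """
    gains = [relevant.get(doc_id, 0) for doc_id in retrieved_ids]
    # Ideal ranking: all relevant documents sorted by grade, best first.
    ideal = sorted(relevant.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# One relevant page retrieved at rank 2: 1 / log2(3), about 0.631.
print(round(ndcg_at_k(["p7", "p3", "p9"], {"p3": 1}), 3))
```

A relevant page at rank 1 scores 1.0, and one that falls outside the top 5 contributes nothing. The per-dataset figure in `out/avg_ndcg_scores.pkl` is described as an average NDCG@5 score, presumably the mean of these per-query values.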
The evaluation process includes:
This project is licensed under the MIT License - see the LICENSE file for details.