Apache License 2.0

camel-assistant

Trying it using Docker Compose

  1. Start an OpenAI-compatible API on a host accessible from the containers:
OLLAMA_HOST=localhost:8000 ollama serve
  2. Pull the mistral:latest model on the host where Ollama is running:
OLLAMA_HOST=localhost:8000 ollama pull mistral:latest
  3. Start the containers using Docker Compose:
docker-compose up
  4. Wait for everything to be up, and then pull the orca-mini model:
podman exec -it camel-assistant-ollama-1 ollama pull orca-mini

NOTE: this may take a while, as it needs to download about 2 GB of data from HuggingFace.

  5. Proceed to the Loading Data section for details about how to load data.
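
The "wait for everything to be up" step above can be automated with a small polling helper. This is only a sketch: the retry function, the attempt count, and the delay are illustrative assumptions and not part of the project.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Run the remaining arguments as a command; stop on the first success.
    if "$@"; then
      return 0
    fi
    # Give up once the attempt budget is used.
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `retry 30 2 podman exec camel-assistant-ollama-1 ollama list` would poll the Ollama container (every 2 seconds, up to 30 times) until it answers, after which the orca-mini pull can be run.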

Trying it manually

Requirements

NOTE: URLs and hostnames can be configured in the application.properties file or exported as environment variables. For instance, if Qdrant runs on another host, you can set its host using the QDRANT_HOST variable.
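
As a minimal sketch of the environment-variable approach, the snippet below exports QDRANT_HOST before any service is started. The localhost fallback is an assumption made for illustration; check application.properties for the project's actual default.

```shell
# Keep QDRANT_HOST if it is already set; otherwise fall back to localhost.
# NOTE: the "localhost" fallback is an assumption, not a documented default.
QDRANT_HOST="${QDRANT_HOST:-localhost}"
export QDRANT_HOST
echo "Qdrant host: ${QDRANT_HOST}"
```

Services started from the same shell afterwards inherit the exported value.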

Steps

  1. Build the project:
mvn clean package
  2. Launch Qdrant:
podman run -d --rm --name qdrant -p 6334:6334 -p 6333:6333 qdrant/qdrant:v1.9.7-unprivileged
  3. Launch Ollama:
OLLAMA_HOST=localhost:8000 ollama serve

NOTE: make sure you have the mistral:latest model available. If not, download it using OLLAMA_HOST=localhost:8000 ollama pull mistral:latest.

  4. Launch the ingestion sink:
KAFKA_BROKERS=kafka-host:9092 java -jar ./assistant-ingestion-sink/target/quarkus-app/quarkus-run.jar
  5. Launch the ingestion source:
KAFKA_BROKERS=kafka-host:9092 java -jar ./assistant-ingestion-sources/plain-text-source/target/quarkus-app/quarkus-run.jar
  6. Launch the backend:
KAFKA_BROKERS=kafka-host:9092 java -jar ./assistant-backend/target/quarkus-app/quarkus-run.jar
  7. Launch the UI:
java -jar assistant-ui-vaadin/target/quarkus-app/quarkus-run.jar
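
Before launching the services, it may help to confirm that the build produced every jar used in the steps above. The check below is a sketch: the missing_jars function is a hypothetical helper, and it assumes it is run from the project root.

```shell
# Jar locations copied from the launch steps above (relative to the project root).
JARS="
./assistant-ingestion-sink/target/quarkus-app/quarkus-run.jar
./assistant-ingestion-sources/plain-text-source/target/quarkus-app/quarkus-run.jar
./assistant-backend/target/quarkus-app/quarkus-run.jar
./assistant-ui-vaadin/target/quarkus-app/quarkus-run.jar
"

# Hypothetical helper: print each expected jar that does not exist.
# Returns non-zero if at least one jar is missing.
missing_jars() {
  rc=0
  for jar in $JARS; do
    if [ ! -f "$jar" ]; then
      echo "missing: $jar"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `missing_jars` after `mvn clean package` prints nothing and returns 0 when all four jars are present.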

Loading Data

Loading PDFs

To load PDF data (such as documentation, books, etc.) into the Qdrant DB, use the command:

cd assistant-cli && java -jar target/quarkus-app/quarkus-run.jar consume file /path/to/red_hat_build_of_apache_camel-4.0-tooling_guide-en-us.pdf

NOTE: you can download some PDFs from here.

Loading Datasets

You can load data from the Camel Dataset.

To download the dataset for data formats:

huggingface-cli download --repo-type dataset --local-dir camel-dataformats megacamelus/camel-dataformats

To download the dataset for components:

huggingface-cli download --repo-type dataset --local-dir camel-components megacamelus/camel-components
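
Both download commands follow the same pattern, so they can be wrapped in one function. This is a sketch only: download_dataset and the DRY_RUN switch are hypothetical names introduced here, and the function assumes every dataset is published as megacamelus/camel-NAME, which the source confirms only for dataformats and components.

```shell
# Hypothetical helper: download megacamelus/camel-NAME into ./camel-NAME.
# Set DRY_RUN=1 to print the command instead of running it.
download_dataset() {
  name="camel-$1"
  # Rebuild the positional parameters as the full huggingface-cli command.
  set -- huggingface-cli download --repo-type dataset --local-dir "$name" "megacamelus/$name"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `download_dataset dataformats && download_dataset components` fetches both datasets named above.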

Use this command to load the dataset into the DB:

java -jar target/quarkus-app/quarkus-run.jar consume dataset --path ~/code/datasets/dataset/ --source org.apache.camel

Checking if the data was loaded

Wait a few seconds after running the load command, and then check if the data is available in the Qdrant DB:

curl -X POST http://localhost:6333/collections/camel/points/scroll -H "Content-Type: application/json" -d "{\"limit\": 50 }" | jq .
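
To get a quick count instead of reading the full output, the response can be reduced with jq. This sketch assumes the scroll response keeps its points under result.points, which is the shape Qdrant's scroll API returns; count_points is a hypothetical helper name.

```shell
# Hypothetical helper: read a Qdrant scroll response on stdin and
# print how many points it contains.
count_points() {
  jq '.result.points | length'
}
```

Piping the curl command above into `count_points` (in place of `jq .`) prints a number no larger than the requested limit of 50; a result of 0 suggests the data has not been loaded yet.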