raid-guild / gaianet-rag-api-pipeline

Supercharge your Gaianet node by generating a vector knowledge base from any API.
Demo slides: https://hackmd.io/@santteegt/ByoykY4nC#/
Docs: https://raid-guild.github.io/gaianet-rag-api-pipeline/
MIT License

2.1.3 Decide on LLM Embedding Model #17

Closed: earth2travis closed this issue 2 months ago

earth2travis commented 4 months ago

Select an appropriate LLM embedding model for the Gaia node. The chosen model should be compatible with the project's data structure and use case so that retrieval stays fast and accurate. The decision-making process will evaluate candidate models on embedding quality, processing speed, and resource requirements.
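One way to compare candidates on these criteria is a small harness like the sketch below. It is only an illustration, not something taken from the project: `evaluate_embedder` and `toy_embed` are hypothetical names, and `toy_embed` is a letter-count stand-in that you would replace with a call to whichever embedding model is being tested. It reports the vector dimension, the mean time per text, and how much more similar related texts are than unrelated ones.

```python
import math
import time
from typing import Callable, Dict, List, Sequence, Tuple

Embedder = Callable[[str], Sequence[float]]
Pairs = List[Tuple[str, str]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def evaluate_embedder(embed: Embedder, related: Pairs, unrelated: Pairs) -> Dict[str, float]:
    """Embed every distinct text once, then summarise size, speed, and quality."""
    texts = sorted({text for pair in related + unrelated for text in pair})
    start = time.perf_counter()
    vectors = {text: embed(text) for text in texts}
    elapsed = time.perf_counter() - start

    def mean_similarity(pairs: Pairs) -> float:
        return sum(cosine_similarity(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

    return {
        "dimension": len(vectors[texts[0]]),
        "seconds_per_text": elapsed / len(texts),
        # A larger gap means related texts are better separated from unrelated ones.
        "similarity_gap": mean_similarity(related) - mean_similarity(unrelated),
    }


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in embedder: counts of the letters a-z."""
    counts = [0.0] * 26
    for char in text.lower():
        if "a" <= char <= "z":
            counts[ord(char) - ord("a")] += 1.0
    return counts
```

Memory and disk footprint are not measured here; those would need to be checked on the node itself.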

wtfsayo commented 3 months ago

Got KT from @santteegt, started on this as well!!

wtfsayo commented 3 months ago

Waiting for @santteegt to get the output connector / pipeline working! Then I can do local testing to find the most compatible model.

santteegt commented 3 months ago

The output connector is ready for testing. Here are some instructions on how to use it with gaia:

{
  ...
  "embedding_collection_name": "boardroom_test",
  "embedding_ctx_size": "768",
  ...
  "snapshot": "boardroom_test-xxxxxx.....snapshot",
  ....
}

wtfsayo commented 3 months ago

Data doesn't get inserted into the Qdrant vector DB! Can you share the Jupyter notebook with the cleanup steps?
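A quick way to confirm whether anything was written is to ask Qdrant how many points the collection holds. The sketch below assumes a local instance on Qdrant's default REST port 6333 and the collection name boardroom_test from the config above. The helper names are hypothetical. It reads the `points_count` field from the collection-info endpoint (`GET /collections/{name}`).

```python
import json
from urllib.request import urlopen

# Assumption: Qdrant is reachable locally on its default REST port.
QDRANT_URL = "http://localhost:6333"


def points_count_from_response(payload: dict) -> int:
    """Extract points_count from a collection-info response body."""
    result = payload.get("result") or {}
    return int(result.get("points_count") or 0)


def fetch_points_count(collection: str, base_url: str = QDRANT_URL) -> int:
    """Query GET /collections/{collection} and return the number of stored points."""
    with urlopen(f"{base_url}/collections/{collection}", timeout=10) as response:
        return points_count_from_response(json.load(response))
```

Calling `fetch_points_count("boardroom_test")` after the pipeline runs should return a non-zero number if the inserts succeeded. An HTTP error usually means the collection was never created.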

santteegt commented 3 months ago

@wtfsayo https://github.com/raid-guild/gaianet-rag-api-pipeline/blob/main/experiments/data_explore_cftoc.ipynb Just click "Run All Cells".

wtfsayo commented 3 months ago

Still nothing got inserted into the vector store, @santteegt. Is it working on your machine?

santteegt commented 3 months ago

@wtfsayo yes. It's been working for me all the time. What error are you getting?

wtfsayo commented 3 months ago

Started to work now!

wtfsayo commented 3 months ago

The output connector is ready for testing. Here are some instructions on how to use it with gaia:

  • Open the data_explore.ipynb notebook
  • Run a local Qdrant instance using Docker (just for embeddings + snapshot generation):
docker run -p 6333:6333 -p 6334:6334 -v ./qdrant_dev:/qdrant/storage:z qdrant/qdrant:v1.10.
  • For testing purposes, enable the HTTP input connector (omit the Airbyte connector cells) to extract the first page from the proposals endpoint
  • Execute the rest of the pipeline using this input data
  • Download the snapshot file using the URL displayed at the end of the pipeline execution and save it in the gaianet folder (e.g. "boardroom_test-xxxxxx.....snapshot")
  • Shut down the Qdrant Docker instance
  • On the Gaia node side, the config file should have the following updates:
{
  ...
  "embedding_collection_name": "boardroom_test",
  "embedding_ctx_size": "768",
  ...
  "snapshot": "boardroom_test-xxxxxx.....snapshot",
  ....
}
  • Run gaia init and gaia start
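
The config edit in the steps above can also be scripted. This is a minimal sketch, assuming the node's config is a plain JSON file. The function names are hypothetical, and the file path must be supplied by the caller since the thread does not say where it lives. Only the three keys shown above are touched, and `embedding_ctx_size` is written as a string to match the snippet.

```python
import json
from pathlib import Path


def apply_snapshot_settings(config: dict, collection_name: str, ctx_size: int, snapshot: str) -> dict:
    """Return a copy of the config with the three snapshot-related keys updated."""
    updated = dict(config)
    updated["embedding_collection_name"] = collection_name
    updated["embedding_ctx_size"] = str(ctx_size)  # stored as a string in the snippet above
    updated["snapshot"] = snapshot
    return updated


def update_config_file(path: str, collection_name: str, ctx_size: int, snapshot: str) -> None:
    """Read the JSON config at `path`, apply the settings, and write it back."""
    config_path = Path(path)
    config = json.loads(config_path.read_text())
    updated = apply_snapshot_settings(config, collection_name, ctx_size, snapshot)
    config_path.write_text(json.dumps(updated, indent=2) + "\n")
```

For the example in this thread, the call would pass "boardroom_test", 768, and the name of the downloaded snapshot file, followed by gaia init and gaia start as described.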

going to try this now!! (with llama3-8b)