*Tchoung-te*: a Yemba word meaning "association/group".
The objective of the project is to federate the metadata of all Cameroonian associations in France and make it more accessible to the community.
Presentation video (in French)
If you want to do data analysis, the latest raw database of Cameroonian associations is available here.
We also maintain a public dashboard to visualize the associations here.
If you are here, it means that you are interested in an in-house deployment of the solution. Follow the guide :) !
Execute the `init` and `command` scripts from the `.gitpod.yml` file, or use a ready-made development environment on Gitpod.

Then run the `filter-cameroon.ipynb` and `enrich-database.ipynb` notebooks:

```shell
pipenv shell
secretsfoundry run --script 'python filter-cameroon.py'
```
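The core of the filtering step can be sketched as follows. This is a minimal illustration only: the keyword list and the column names (`titre`, `objet`) are assumptions, not the project's actual logic.

```python
import csv
import io

# Hypothetical keywords used to spot Cameroonian associations in the registry.
KEYWORDS = ("cameroun", "camerounais", "camerounaise")

def looks_cameroonian(row: dict) -> bool:
    """Return True if the association title or object mentions Cameroon."""
    text = f"{row.get('titre', '')} {row.get('objet', '')}".lower()
    return any(kw in text for kw in KEYWORDS)

# Tiny inline sample standing in for the raw registry export.
raw = io.StringIO(
    "id,titre,objet\n"
    "1,Association des Camerounais de Lyon,entraide\n"
    "2,Club de bridge de Tours,loisirs\n"
    "3,Solidarite Cameroun,aide au developpement\n"
)

kept = [row for row in csv.DictReader(raw) if looks_cameroonian(row)]
print([row["id"] for row in kept])  # → ['1', '3']
```

The real notebook works on the full RNA export, but the shape of the step is the same: read, match, keep.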
Finally, use the resulting CSV file as a data source in Gogocarto and customize it. You can, for example, define icons per category (social object); ours are in `html/icons`. They were built from these base icons: https://thenounproject.com/behanzin777/kit/favorites/
```shell
csvdiff ref-rna-real-mars-2022.csv rna-real-mars-2022-new.csv -p 1 --columns 1 --format json | jq '.Additions' > experiments/update-database/diff.csv
python3 main.py
```
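The `csvdiff`/`jq` pipeline above extracts the rows whose primary key (column 1) appears in the new export but not in the reference file. A rough stdlib equivalent of that "additions" computation, on made-up sample data:

```python
import csv
import io

def additions(ref_csv: str, new_csv: str, key: str = "id") -> list[dict]:
    """Rows present in new_csv whose key is absent from ref_csv
    (what `csvdiff ... | jq '.Additions'` extracts)."""
    ref_keys = {row[key] for row in csv.DictReader(io.StringIO(ref_csv))}
    return [row for row in csv.DictReader(io.StringIO(new_csv))
            if row[key] not in ref_keys]

ref = "id,titre\nW751000001,Asso A\nW751000002,Asso B\n"
new = "id,titre\nW751000001,Asso A\nW751000003,Asso C\n"
print(additions(ref, new))  # → [{'id': 'W751000003', 'titre': 'Asso C'}]
```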
```shell
cd etl/
secretsfoundry run --script "chainlit run experiments/ui.py"
```
```shell
devspace deploy
```
The list of runs (`runs.csv`) was built by fetching all runs from the beginning with:

```shell
export LANGCHAIN_API_KEY=<key>
cd evals/
python3 rag-evals.py save_runs --days 400
```
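Conceptually, `save_runs --days 400` keeps every run inside the time window and writes its ID to `runs.csv`. A minimal stdlib sketch of that step (the run records and field names here are hypothetical; the real script fetches runs from the LangSmith API):

```python
import csv
from datetime import datetime, timedelta, timezone

def save_runs(runs: list[dict], days: int, path: str) -> int:
    """Write the IDs of runs newer than `days` days to a CSV file."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    recent = [r for r in runs if r["start_time"] >= cutoff]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["run_id"])
        writer.writerows([r["id"]] for r in recent)
    return len(recent)

now = datetime.now(timezone.utc)
runs = [
    {"id": "run-1", "start_time": now - timedelta(days=10)},
    {"id": "run-2", "start_time": now - timedelta(days=500)},  # outside the window
]
print(save_runs(runs, days=400, path="runs.csv"))  # → 1
```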
Then we used Lilac to extract the most interesting questions by clustering them per topic/category. The "Associations in France" cluster was the one chosen, and we also deleted some rows due to irrelevance.
The clustering repartition is available here: Clustering Repartition
Finally, you just need to run:

```shell
export LANGCHAIN_API_KEY=<key>
cd evals/
python3 rag.py ragas_eval tchoung-te --run_ids_file=runs.csv
python3 rag.py deepeval tchoung-te --run_ids_file=runs.csv
```
Whenever you change a parameter that can affect the RAG pipeline, you can execute all inputs present in `evals/base_ragas_evaluation.csv` and track them with LangSmith. Then fetch the runs and execute the commands above. Since there are only 27 elements, you can compare the results manually.
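With only 27 questions, a manual comparison can be as simple as a per-question delta between two eval runs. A hypothetical sketch (question IDs and scores are made up for illustration):

```python
# Hypothetical per-question scores from two eval runs (before/after a parameter change).
before = {"q1": 0.82, "q2": 0.61, "q3": 0.90}
after = {"q1": 0.85, "q2": 0.55, "q3": 0.90}

# Per-question deltas: regressions (negative deltas) stand out immediately.
deltas = {qid: round(after[qid] - before[qid], 2) for qid in before}
regressions = [qid for qid, d in deltas.items() if d < 0]
print(regressions)  # → ['q2']
```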
```shell
cd etl/
python3 backtesting_prompt.py
```
Create the dataset on which you want to test the new prompt in LangSmith. Then run the file above to backtest the new prompt and review its results on the dataset. Specify the dataset name in the file before running.
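In spirit, the backtest runs the new prompt over every example in the named dataset and collects the outputs for review. The sketch below is illustrative only: the dataset name, prompt template, and `fake_llm` stand-in are placeholders, and the real script goes through LangSmith rather than a local list.

```python
DATASET_NAME = "tchoung-te-backtest"  # placeholder: set to your LangSmith dataset name

NEW_PROMPT = "Answer the question about Cameroonian associations in France: {question}"

def fake_llm(prompt: str) -> str:
    # Stand-in for the real model call.
    return f"[answer to: {prompt}]"

# Placeholder dataset; the real one lives in LangSmith.
dataset = [
    {"question": "Which associations operate in Lyon?"},
    {"question": "How do I register a new association?"},
]

results = [fake_llm(NEW_PROMPT.format(**example)) for example in dataset]
print(len(results))  # → 2
```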
Thanks goes to these wonderful people (emoji key):
Ghislain TAKAM ✅ 🔣 |
pdjiela ✅ |
DimitriTchapmi ✅ |
GNOKAM ✅ 🔣 |
fabiolatagne97 ✅ 🔣 |
hsiebenou 🔣 ⚠️ ✅ |
Flomin TCHAWE 💻 ✅ 🔣 |
Bill Metangmo 💻 🔣 🤔 ⚠️ ✅ |
dimitrilexi 🔣 |
ngnnpgn 🔣 |
Tchepga Patrick 🔣 |
This project follows the all-contributors specification. Contributions of any kind are welcome!