Closed: KushalGaywala closed this issue 8 months ago
For question generation, we are using RAGAS, a testing framework for RAG pipelines. Using GPT-3, it creates questions based on the documents. It first creates a simple seed question, then evolves it into three different types of questions:

- **Reasoning:** Rewrite the question in a way that enhances the need for reasoning to answer it effectively.
- **Conditioning:** Modify the question to introduce a conditional element, which adds complexity to the question.
- **Multi-Context:** Rephrase the question in a manner that necessitates information from multiple related sections or chunks to formulate an answer.

For more detailed information, refer to this link: https://docs.ragas.io/en/stable/concepts/testset_generation.html
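As a rough illustration of the evolution step, the three types can be sketched as prompt templates wrapped around a seed question. Note this is our own paraphrase for illustration, not RAGAS's actual internal prompts (see the linked docs for those):

```python
# Sketch of RAGAS-style question evolution. The instruction texts are
# hypothetical paraphrases, not the real RAGAS prompts.
EVOLUTIONS = {
    "reasoning": (
        "Rewrite the question so that answering it effectively "
        "requires multi-step reasoning."
    ),
    "conditioning": (
        "Modify the question to introduce a conditional element "
        "that adds complexity."
    ),
    "multi_context": (
        "Rephrase the question so that answering it requires "
        "information from multiple related sections or chunks."
    ),
}

def build_evolution_prompt(seed_question: str, evolution: str) -> str:
    """Combine a simple seed question with an evolution instruction.

    In the real pipeline this prompt would be sent to the LLM
    (e.g. GPT-3) to produce the evolved question; here we only
    assemble the prompt text.
    """
    instruction = EVOLUTIONS[evolution]
    return f"{instruction}\n\nQuestion: {seed_question}"

prompt = build_evolution_prompt("What is this law about?", "reasoning")
```

Each evolved question keeps a link back to its seed, which is useful later when pairing questions with ground-truth answers.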
We used ChatGPT to create a synthetic dataset, and the RagEvaluator pack from LlamaIndex to compute different LLM-based metrics.
Next steps for everyone: the general task is to generate queries using fragments of documents and then create ground truth with them, or, if a domain expert we know is available to help us, to generate at least 10 questions for each difficulty level (levels 1, 2, and 3). Level 1 can be very simple questions, like "What is this law about?" or "Can you explain this law?" (e.g. consumer rights protection for people living in the EU). For level 2, Sophie can suggest questions, or we can ask GPT. Level 3 can be very complex questions, like the example the professor explained in the evaluation lecture.
We can then use these ground truths for evaluation; this would be the best option for ground truths.
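The ground-truth set described above could be organised as a flat list of records, one per question, with a difficulty level and a source document. The field names and the completeness check below are only a suggestion, not a fixed schema:

```python
# Hypothetical schema for the leveled ground-truth set:
# at least 10 questions per level (1 = simple, 2 = mid, 3 = complex).
from collections import Counter

ground_truth = [
    {
        "question": "What is this law about?",
        "answer": "It protects consumer rights for people living in the EU.",
        "level": 1,
        "source_doc": "eu_consumer_rights_2021.pdf",  # made-up filename
    },
    # ... in the real dataset, at least 10 entries per level
]

def count_per_level(entries):
    """Count how many questions exist for each difficulty level."""
    return Counter(e["level"] for e in entries)

def is_complete(entries, min_per_level=10, levels=(1, 2, 3)):
    """Check that every level has at least `min_per_level` questions."""
    counts = count_per_level(entries)
    return all(counts[lvl] >= min_per_level for lvl in levels)
```

A check like `is_complete` makes it easy to see at a glance whether each person has contributed their share of questions per level before we run the evaluation.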
We can each take around 15 to 20 documents; the documents can range from 2020/21 to 2023/24.