run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License
33.57k stars, 4.71k forks

[Tracking]: Evaluation Integrations #6824

Closed: jon-chuang closed this issue 8 months ago

jon-chuang commented 1 year ago

Feature Description

TODO: investigate more RAG-specific benchmarks rather than retrieval-only or generation-only.
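To make the distinction concrete, here is a minimal sketch of how a retrieval-only metric, a generation-only metric, and a joint RAG-style score differ on the same example. The function names (`hit_rate`, `token_f1`, `rag_score`), the example fields, and the `grounded_f1` metric are hypothetical illustrations. They are not LlamaIndex APIs and are not taken from any of the benchmarks listed below.

```python
from collections import Counter


def hit_rate(retrieved_ids, relevant_ids):
    """Retrieval-only: 1.0 if any relevant passage was retrieved, else 0.0."""
    return 1.0 if set(retrieved_ids) & set(relevant_ids) else 0.0


def token_f1(prediction, reference):
    """Generation-only: token-overlap F1 between the answer and the reference."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def rag_score(example):
    """Joint (hypothetical): credit the answer only if evidence was retrieved."""
    hit = hit_rate(example["retrieved_ids"], example["relevant_ids"])
    f1 = token_f1(example["prediction"], example["reference"])
    return {"hit_rate": hit, "answer_f1": f1, "grounded_f1": hit * f1}


# Same correct answer, with and without the supporting passage retrieved.
grounded = rag_score({
    "retrieved_ids": ["d1", "d7"],
    "relevant_ids": ["d7"],
    "prediction": "revenue grew 12 percent",
    "reference": "Revenue grew 12 percent",
})
ungrounded = rag_score({
    "retrieved_ids": ["d2"],
    "relevant_ids": ["d7"],
    "prediction": "revenue grew 12 percent",
    "reference": "Revenue grew 12 percent",
})
```

In the second example a generation-only metric still gives full credit, even though the answer cannot have come from the retrieved context. That is the gap a RAG-specific benchmark would need to expose.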

References / Areas of Exploration

  1. Academic work on FLARE (RAG + reranking). We can take some inspiration from their evaluation methodology.
  2. Wizard of Wikipedia (knowledge-augmented dialogue). Not sure whether there is any evaluation.
  3. Evaluation of Dataset Selection for Pre-Training and Fine-Tuning Transformer Language Models for Clinical Question Answering
  4. Streaming QA.

Target Domains

  1. Finance: FinQA - FinTabNet Corpus.
  2. Codebases (tentative), e.g. codequeries, CodeQA

TODO:

  1. Compile the sizes of all candidate datasets. We should prefer fine-tuning datasets built on out-of-domain corpora over large, generic corpora, which are likely already represented in LLM training data.

dosubot[bot] commented 9 months ago

Hi, @jon-chuang! I'm Dosu, and I'm here to help the LlamaIndex team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue you created discusses the evaluation of integrations for the project. You outlined various tasks to investigate, such as retrieval, RAG/Knowledge-Intensive QA, and MMLU. You also provided references and areas for further research. However, there hasn't been any activity on the issue since then.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your contribution, and we look forward to hearing from you soon!

Best regards, Dosu