microsoft / deepRAG

Uses a blend of experimental techniques to enhance LLM RAG resultsets.
MIT License

Create test criteria. #2

Open Tyler-R-Kendrick opened 4 days ago

Tyler-R-Kendrick commented 4 days ago

Metrics

We need to define metrics so that we can build test suites and measure results. I would like to test for hallucinations and for response quality against a ground-truth reference set.

Ideally, we would use something like SelfCheckGPT to check for hallucinations.
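To make the idea concrete, here is a rough sketch of the sampling-consistency approach that SelfCheckGPT is built on: sample several responses to the same prompt and flag sentences that the other samples do not support. This does not use the SelfCheckGPT package or any of its scoring variants; the word-overlap scoring and the name `inconsistency_score` are assumptions made purely for illustration.

```python
import re
from typing import List, Set


def _tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric word set; a deliberately crude stand-in
    # for the model-based scoring a real checker would use.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def inconsistency_score(sentence: str, samples: List[str]) -> float:
    """Return a score in [0, 1]; higher means less support from the samples.

    Hypothetical helper: `samples` are extra responses drawn from the
    same LLM for the same prompt (sampling itself is not shown here).
    """
    sent = _tokens(sentence)
    if not sent or not samples:
        raise ValueError("need a non-empty sentence and at least one sample")
    # Fraction of the sentence's words that each sample also contains.
    support = [len(sent & _tokens(s)) / len(sent) for s in samples]
    return 1.0 - sum(support) / len(support)
```

A sentence whose score stays high across many samples would be a candidate hallucination; the threshold is something we would need to tune against the grounded set.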

I'd also like to measure document-retrieval recall on a known public dataset, comparing classic RAG against graph RAG.
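As a starting point for the retrieval comparison, recall@k could be computed along these lines. This is a minimal sketch: the function names, the `runs`/`qrels` dictionary shapes, and the toy document ids are all assumptions for illustration, not data from any real dataset or from this repo.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("each query needs at least one relevant document")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int
) -> float:
    """Average recall@k over every query that has relevance judgments."""
    if not qrels:
        raise ValueError("qrels must contain at least one query")
    # A query missing from `runs` counts as having retrieved nothing.
    scores = [recall_at_k(runs.get(qid, []), rel, k) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Toy, made-up ids purely to show the call shape (not real results).
qrels = {"q1": {"d1", "d2"}, "q2": {"d3"}}
classic_run = {"q1": ["d1", "d9", "d2"], "q2": ["d7", "d8", "d3"]}
graph_run = {"q1": ["d1", "d2", "d9"], "q2": ["d3", "d7", "d8"]}

for name, run in [("classic", classic_run), ("graph", graph_run)]:
    print(name, mean_recall_at_k(run, qrels, k=2))
```

Running both retrievers over the same queries and the same `qrels` keeps the comparison fair; the only variable is the ranked list each pipeline returns.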