We need to define the metrics to create test suites and measure results.
I would like to test for hallucinations and response quality according to a grounded set.
Ideally, we would use something like SelfCheckGPT to check for hallucinations.
I'd also like to test recall for document retrieval on a known public dataset with classic vs graph RAG.
Metrics
We need to define the metrics to create test suites and measure results. I would like to test for hallucinations and response quality according to a grounded set.
Ideally, we would use something like SelfCheckGPT to check for hallucinations.
I'd also like to test recall for document retrieval on a known public dataset with classic vs graph RAG.