jataware / beaker-kernel

Contextually-aware notebooks with built-in AI assistant
https://jataware.github.io/beaker-kernel/
MIT License

Context Evaluation Framework #32

Open fivegrant opened 6 months ago

fivegrant commented 6 months ago

@brandomr and I discussed coming up with some kind of framework to evaluate the quality of a context. The general idea is that it would iterate through ground-truth/benchmark questions and answers over multiple scenarios.
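To make the idea concrete, here is a rough sketch of what that loop could look like. Everything in it is hypothetical: `BenchmarkItem`, `EvalResult`, `run_benchmark`, and the `ask` callable are made-up names, not anything that exists in beaker-kernel today. `ask` stands in for however we end up sending a question to a context and getting the answer back.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class BenchmarkItem:
    scenario: str  # hypothetical label for the scenario being tested
    question: str  # benchmark question sent to the assistant
    expected: str  # ground-truth answer to compare against


@dataclass
class EvalResult:
    item: BenchmarkItem
    actual: str   # what the assistant actually answered
    score: float  # value returned by the scoring function


def run_benchmark(
    items: Iterable[BenchmarkItem],
    ask: Callable[[str, str], str],
    score: Callable[[str, str], float],
) -> List[EvalResult]:
    """Ask every benchmark question and score the answer.

    `ask(scenario, question)` is an assumed hook that returns the answer text.
    `score(expected, actual)` is pluggable, since the metric is undecided.
    """
    results = []
    for item in items:
        actual = ask(item.scenario, item.question)
        results.append(EvalResult(item, actual, score(item.expected, actual)))
    return results


def exact_match(expected: str, actual: str) -> float:
    """Placeholder metric: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0
```

Keeping `score` as a parameter means the harness does not have to change once we settle on how to quantify results.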

Quantifying the results might be a bit tricky. LLM-based grading? Cosine similarity? As for qualitatively viewing the results, we could implement a lightweight UI to quickly flip through all the notebooks and see how they performed. Each entry would probably just be rendered as Markdown, so we might not even need to use Beaker-TS.
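For the cosine similarity option, a minimal stdlib-only sketch might look like the following. It compares word-count vectors, which is only a stand-in: a real version would more likely compare embeddings, and that choice is still open. `cosine_similarity` and `render_entry` are hypothetical names, and the Markdown layout is just one guess at what an entry could look like.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two strings.

    Returns 0.0 when either string has no tokens.
    """
    va, vb = tokenize(a), tokenize(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def render_entry(question: str, expected: str, actual: str) -> str:
    """Render one benchmark entry as a Markdown snippet (assumed layout)."""
    score = cosine_similarity(expected, actual)
    return "\n".join(
        [
            f"### {question}",
            "",
            f"**Expected:** {expected}",
            "",
            f"**Actual:** {actual}",
            "",
            f"**Cosine similarity:** {score:.2f}",
        ]
    )
```

Word-count cosine similarity only rewards lexical overlap, so a correct answer phrased differently would score low. That limitation is one reason LLM-based grading is also on the table.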