wisecubeai / pythia

Open source AI hallucination monitoring
https://askpythia.ai/
Apache License 2.0

Accuracy Benchmarking code #11

Open cloudronin opened 1 week ago

cloudronin commented 1 week ago

We need to check in the benchmarking code as a tool or notebook that can be run against a local Pythia deployment to measure how well Pythia performs on various benchmark datasets.
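As a starting point for discussion, here is a minimal sketch of what the core of such a tool might look like. It is not the existing benchmarking code. The names `benchmark_accuracy`, `fake_predict`, and the label strings are hypothetical, and the predictor is passed in as a plain callable because this issue does not specify how a local Pythia deployment is queried.

```python
from typing import Callable, Iterable, Tuple


def benchmark_accuracy(
    predict: Callable[[str], str],
    dataset: Iterable[Tuple[str, str]],
) -> float:
    """Return the fraction of examples where predict(text) matches the expected label."""
    total = 0
    correct = 0
    for text, expected in dataset:
        total += 1
        if predict(text) == expected:
            correct += 1
    if total == 0:
        raise ValueError("dataset is empty")
    return correct / total


# Hypothetical stand-in for a call to a local Pythia deployment.
# A real tool would replace this with whatever client the deployment exposes.
def fake_predict(text: str) -> str:
    return "hallucination" if "moon is made of cheese" in text else "supported"


# Tiny illustrative dataset of (claim, expected label) pairs; not a real benchmark.
sample = [
    ("The moon is made of cheese.", "hallucination"),
    ("Water boils at 100 C at sea level.", "supported"),
    ("Paris is the capital of France.", "supported"),
    ("The sun orbits the earth.", "hallucination"),
]

if __name__ == "__main__":
    print(f"accuracy: {benchmark_accuracy(fake_predict, sample):.2f}")
```

Keeping the predictor as a callable would let the same loop run from a notebook or a command-line tool, with one adapter per benchmark dataset.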