
hegelai / prompttools

Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).

http://prompttools.readthedocs.io

Apache License 2.0

2.56k stars, 216 forks

Benchmarking #72

Closed: HashemAlsaket closed this 11 months ago

HashemAlsaket commented 11 months ago

Kick-starting benchmarking [draft]. I'm keeping the HellaSwag dataset in the branch for convenience until merge (it's currently 60MB; if that's too big, feel free to cut the sample down, but I think it's okay for now).

High-level thought process: experiments, data, and evals are submitted to a benchmark class -> the evals run on each experiment's responses against the data -> the results are logged
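A minimal sketch of that flow, just to make the shape concrete. Every name here (`Benchmark`, `exact_match`, `always_a`) is hypothetical and made up for illustration; this is not the prompttools API or the code in this branch, and a stub function stands in for a real LLM experiment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# Hypothetical type aliases for illustration only (not prompttools types).
ModelFn = Callable[[str], str]         # prompt -> response
EvalFn = Callable[[str, str], float]   # (response, expected) -> score


@dataclass
class Benchmark:
    """Runs every eval on every experiment's responses and logs a mean score."""
    experiments: Dict[str, ModelFn]
    prompts: Sequence[str]
    expected: Sequence[str]
    evals: Dict[str, EvalFn]
    log: List[dict] = field(default_factory=list)

    def run(self) -> List[dict]:
        for exp_name, model in self.experiments.items():
            # Step 1: collect responses from the experiment on the data.
            responses = [model(p) for p in self.prompts]
            for eval_name, eval_fn in self.evals.items():
                # Step 2: score each response against the expected answer.
                scores = [eval_fn(r, e) for r, e in zip(responses, self.expected)]
                mean = sum(scores) / len(scores) if scores else 0.0
                # Step 3: log one row per (experiment, eval) pair.
                self.log.append(
                    {"experiment": exp_name, "eval": eval_name, "mean_score": mean}
                )
        return self.log


def exact_match(response: str, expected: str) -> float:
    """Toy eval: 1.0 if the response equals the expected answer, else 0.0."""
    return float(response.strip() == expected.strip())


def always_a(prompt: str) -> str:
    """Stub 'model' that always answers 'A' (stands in for an LLM call)."""
    return "A"


bench = Benchmark(
    experiments={"always_a": always_a},
    prompts=["Q1", "Q2"],
    expected=["A", "B"],
    evals={"exact_match": exact_match},
)
results = bench.run()
```

Keeping the experiments and evals as plain callables means a new model or metric can be dropped in without touching the benchmark class itself.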

LuvvAggarwal commented 11 months ago

@HashemAlsaket I have built a utility to load datasets from Hugging Face. I hope it will be useful.

LuvvAggarwal commented 11 months ago

Can we use the Hugging Face `datasets` library? Please review #75.
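For context, loading HellaSwag through `datasets` instead of keeping the 60MB file in the branch might look roughly like the sketch below. This is not the code from #75. The helper names are hypothetical, and the dataset id `Rowan/hellaswag` and the field names `ctx`, `endings`, and `label` are assumptions that should be checked against the dataset card on the Hub.

```python
from typing import Any, Dict, Iterable, List


def to_benchmark_rows(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert HellaSwag-style records into prompt/choices/answer rows.

    Assumes each record has 'ctx' (context string), 'endings' (list of
    candidate endings), and 'label' (index of the correct ending, stored
    as a string). These field names are an assumption, not verified here.
    """
    rows = []
    for rec in records:
        rows.append(
            {
                "prompt": rec["ctx"],
                "choices": list(rec["endings"]),
                "answer": int(rec["label"]),
            }
        )
    return rows


def load_hellaswag_rows(split: str = "validation", limit: int = 100) -> List[Dict[str, Any]]:
    """Hypothetical helper: download a split and keep the first `limit` rows.

    Requires `pip install datasets` and network access. Use a labelled split
    such as "validation"; unlabelled rows would fail the int() conversion.
    """
    from datasets import load_dataset  # imported lazily so the converter works offline

    ds = load_dataset("Rowan/hellaswag", split=split)  # dataset id is an assumption
    subset = ds.select(range(min(limit, len(ds))))
    return to_benchmark_rows(subset)


# Offline usage of the converter with a hand-written record:
sample = [
    {
        "ctx": "A man is sitting on a roof. He",
        "endings": ["starts pulling up tiles.", "is flying.", "eats soup.", "sings."],
        "label": "0",
    }
]
rows = to_benchmark_rows(sample)
```

Capping the rows with `limit` would also address the sample-size concern without committing data to the repository.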

HashemAlsaket commented 11 months ago

Awesome work, @LuvvAggarwal. I think we can include #75 as a util function. Can you change the PR to merge into this branch instead of main?

HashemAlsaket commented 11 months ago

@NivekT all issues have been addressed 👍

LuvvAggarwal commented 11 months ago

@HashemAlsaket I am unable to merge into this repository

NivekT commented 11 months ago

Thanks @HashemAlsaket!! 🚀