hegelai / prompttools

Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).
http://prompttools.readthedocs.io
Apache License 2.0

Add support for other models in AutoEval #44

Open NivekT opened 11 months ago

NivekT commented 11 months ago

🚀 The feature

This is a good task for a new contributor

We have a few utility functions to perform AutoEval:

https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval.py https://github.com/hegelai/prompttools/blob/main/prompttools/utils/autoeval_scoring.py https://github.com/hegelai/prompttools/blob/main/prompttools/utils/expected.py

Currently, each of these tends to support only one model. Someone could refactor the code so that each of them supports multiple models. I would recommend making sure they all support the best-known models, such as GPT-4 and Claude 2.

We could even consider LLaMA, but that is less urgent.
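One way to structure the refactor described above is a provider-dispatch table keyed by model name. This is only a sketch: the function name `evaluate`, the prompt template, and the stub provider callables are all illustrative and not prompttools' actual API.

```python
# Hypothetical sketch of a multi-model AutoEval utility. The provider stubs
# below stand in for real OpenAI/Anthropic client calls.

EVALUATION_SYSTEM_PROMPT = (
    "Determine whether or not the response follows directions. "
    "Answer either RIGHT or WRONG."
)

def _call_openai(prompt: str, model: str) -> str:
    # Placeholder for an OpenAI chat-completion call returning the model's text.
    raise NotImplementedError

def _call_anthropic(prompt: str, model: str) -> str:
    # Placeholder for an Anthropic completion call returning the model's text.
    raise NotImplementedError

# Dispatch table: adding a new judge model is one entry here.
PROVIDERS = {
    "gpt-4": _call_openai,
    "gpt-3.5-turbo": _call_openai,
    "claude-2": _call_anthropic,
}

def evaluate(prompt: str, response: str, model: str = "gpt-4") -> float:
    """Score a response as 1.0 (RIGHT) or 0.0 (WRONG) using the chosen judge model."""
    try:
        call = PROVIDERS[model]
    except KeyError:
        raise ValueError(f"Unsupported judge model: {model!r}")
    verdict = call(
        f"{EVALUATION_SYSTEM_PROMPT}\nPrompt: {prompt}\nResponse: {response}",
        model,
    )
    return 1.0 if "RIGHT" in verdict else 0.0
```

The dispatch-table shape keeps the eval logic provider-agnostic, so supporting a list of judge models (as discussed below in the thread) reduces to calling `evaluate` once per model name.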

Tasks

Motivation, pitch

Allowing people to auto-evaluate with different top models would be ideal.

Alternatives

No response

Additional context

No response

rachittshah commented 11 months ago

@NivekT i think if we add this with #31 , it might be faster and we can build on top of Llama-index's evals.

What do you think?

divij9 commented 11 months ago

So we want to accept an array of models and evaluate against all of them, right?

NivekT commented 11 months ago

@rachittshah We can consider adding LlamaIndex's evals if they integrate well with the pattern we have here. Feel free to propose something and we can have a look.

@divij9 That can be part of it, but each of the eval functions linked above currently supports only OpenAI or Anthropic.

I will update the main issue to break the request into pieces that are easier for first-time contributors to work on.

NivekT commented 11 months ago

I have updated the ask to be bite-sized. Feel free to comment if anything is unclear!

Divij97 commented 11 months ago

I think I understand what our goal is. Can you please assign this to me?

NivekT commented 11 months ago

@Divij97 Sure! Let us know if you plan to work on all 4 subtasks or a specific one. Feel free to pick whichever you think you can contribute to. Thanks!

ishaan-jaff commented 11 months ago

I'd love to help add support for new models using https://github.com/BerriAI/litellm. Let me know if I can help out on this too.

steventkrawczyk commented 11 months ago

@ishaan-jaff Awesome! Could you create an issue for it, and I can assign that to you? I think the best approach would be to create a LitellmExperiment; you can follow some examples we've done for other APIs:
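The appeal of litellm here is that one call signature covers many providers, so an experiment can cross-product models and prompts without per-provider branches. The class below is an illustrative sketch, not prompttools' actual `Experiment` base class; the default path assumes `pip install litellm` and uses `litellm.completion`, which returns an OpenAI-style response.

```python
from itertools import product

class LiteLLMExperimentSketch:
    """Hypothetical experiment: run every (model, prompt) pair through one
    unified completion function, such as litellm provides."""

    def __init__(self, models, prompts, completion_fn=None):
        if completion_fn is None:
            import litellm  # assumption: litellm is installed

            def completion_fn(model, prompt):
                # litellm routes to OpenAI, Anthropic, etc. based on the model name.
                response = litellm.completion(
                    model=model,
                    messages=[{"role": "user", "content": prompt}],
                )
                return response.choices[0].message.content

        self.models = models
        self.prompts = prompts
        self.completion_fn = completion_fn

    def run(self):
        # Evaluate the full cross product of models and prompts.
        return [
            {"model": m, "prompt": p, "response": self.completion_fn(m, p)}
            for m, p in product(self.models, self.prompts)
        ]
```

Injecting `completion_fn` also makes the experiment testable without API keys, since a stub can stand in for the network call.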

Divij97 commented 11 months ago

Hi @steventkrawczyk, I have made the required changes, but I am not able to push. I signed the CLA, but the push still returns a 403 for me.

steventkrawczyk commented 11 months ago

@Divij97 you will need to push to a fork and raise a PR from the fork to the original repo

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork
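For reference, the fork workflow from the linked docs boils down to the commands below; `<your-username>` and the branch name are placeholders.

```shell
# Fork hegelai/prompttools on GitHub first, then:
git clone https://github.com/<your-username>/prompttools.git
cd prompttools
git checkout -b autoeval-multi-model   # illustrative branch name
# ...make your changes...
git add -A && git commit -m "Add multi-model support to AutoEval"
git push origin autoeval-multi-model
# Finally, open a PR from your fork's branch against hegelai/prompttools.
```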

Divij97 commented 11 months ago

Never mind, it was a keychain issue on my Mac. Can you please take a look at this PR: https://github.com/hegelai/prompttools/pull/59? It's not complete, but I wanted to check whether I am headed in the right direction.

Divij97 commented 11 months ago

Wanted to give an update on my progress. I am done with all the changes, but I need some help testing them against Anthropic. How do I generate an Anthropic key for testing?
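For testing against Anthropic, keys are created in the Anthropic web console, and Anthropic's Python client reads them from the `ANTHROPIC_API_KEY` environment variable; the snippet below is a minimal sketch with a placeholder key value.

```python
import os

# Assumption: the Anthropic client (and prompttools' Anthropic path) picks up
# the key from the ANTHROPIC_API_KEY environment variable.
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-your-key-here"  # placeholder, not a real key
```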