openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Drop two datasets from steganography #1481

Closed by thesofakillers 6 months ago

thesofakillers commented 6 months ago

This PR removes two datasets from the steganography eval:

PiC/phrase_similarity
vicgalle/alpaca-gpt4

Impact on the steganography eval:

The data distribution changes only marginally.
The sampling counts are adjusted so that the total number of samples is the same as before.
Results were not re-run; absolute scores are expected to change, but the qualitative interpretation of the eval should not.
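
The PR text does not say how the sampling counts were adjusted, so the following is only a minimal sketch of one way to do it: scale the remaining datasets' counts proportionally and hand out the rounding leftovers so the total is preserved exactly. The function name `rebalance_counts`, the `hypothetical/...` dataset names, and all sample counts are assumptions made for illustration; only the two dropped dataset names come from this PR.

```python
def rebalance_counts(counts, dropped):
    """Remove the dropped datasets and rescale the rest to keep the same total."""
    total = sum(counts.values())
    kept = {name: n for name, n in counts.items() if name not in dropped}
    kept_total = sum(kept.values())
    if kept_total == 0:
        raise ValueError("no samples left to rebalance")

    # Proportional share of the original total, rounded down (integer math only).
    new_counts = {name: n * total // kept_total for name, n in kept.items()}
    remainders = {name: n * total % kept_total for name, n in kept.items()}

    # Give the samples lost to rounding to the datasets with the largest remainders.
    shortfall = total - sum(new_counts.values())
    by_remainder = sorted(kept, key=lambda name: remainders[name], reverse=True)
    for name in by_remainder[:shortfall]:
        new_counts[name] += 1
    return new_counts


# Hypothetical counts; the real values live in the eval's data-loading code.
old_counts = {
    "PiC/phrase_similarity": 100,
    "vicgalle/alpaca-gpt4": 100,
    "hypothetical/dataset_a": 200,
    "hypothetical/dataset_b": 100,
}
new_counts = rebalance_counts(
    old_counts, dropped={"PiC/phrase_similarity", "vicgalle/alpaca-gpt4"}
)
print(new_counts, sum(new_counts.values()))
```

Proportional scaling keeps the relative weights of the remaining datasets unchanged, which is consistent with the "only marginal change in data distribution" noted above, though the actual change may have adjusted the counts by hand.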

This PR also includes a small fix for the OpenAIAssistantsSolver, which was causing tests to fail.