lm-sys / llm-decontaminator

Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
Apache License 2.0

Evaluating contamination effect over `evol-codealpaca-v1` #4

Open ganler opened 10 months ago

ganler commented 10 months ago

Thanks for the great work!

I am curious how this decontaminator would perform on https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1, which seems to help some SOTA models achieve 78% pass@1 on HumanEval. Would this work out of the box if one just follows the README examples? Thanks!
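
For context, I was thinking of something along these lines to pull both datasets and dump the text fields that would be compared; the column names (`instruction` for evol-codealpaca-v1, `prompt` for HumanEval) are just my reading of the dataset cards, and this is only preprocessing, not the repo's own pipeline:

```python
# Sketch: download both datasets and write the relevant text fields to JSONL.
# Column names ("instruction", "prompt") are assumptions from the dataset cards.
import json
from datasets import load_dataset

train = load_dataset("theblackcat102/evol-codealpaca-v1", split="train")
test = load_dataset("openai_humaneval", split="test")

with open("evol_codealpaca_instructions.jsonl", "w") as f:
    for row in train:
        f.write(json.dumps({"text": row["instruction"]}) + "\n")

with open("humaneval_prompts.jsonl", "w") as f:
    for row in test:
        f.write(json.dumps({"text": row["prompt"]}) + "\n")
```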

andy-yang-1 commented 10 months ago

> Would this work out of the box if one just follows the README examples?

@ganler Feel free to follow the steps, and I am really curious about the results too!

wyt2000 commented 6 months ago

Are there any results on this? I also found a lot of duplicated data between `evol-codealpaca-v1` and the HumanEval benchmark.
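
A quick way to surface such candidate pairs, roughly in the spirit of the paper's embedding-similarity stage rather than this repo's exact code, is a top-k cosine-similarity search; the embedding model, column names, and threshold below are placeholder choices:

```python
# Sketch of a top-k embedding-similarity pre-filter. Pairs flagged here would
# still need an LLM judge (as in the paper) to decide whether they are truly
# rephrased duplicates rather than merely similar problems.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, util

train = load_dataset("theblackcat102/evol-codealpaca-v1", split="train")
test = load_dataset("openai_humaneval", split="test")

model = SentenceTransformer("multi-qa-MiniLM-L6-cos-v1")  # example model choice
train_texts = [row["instruction"] for row in train]       # assumed column name
test_texts = [row["prompt"] for row in test]

train_emb = model.encode(train_texts, convert_to_tensor=True, show_progress_bar=True)
test_emb = model.encode(test_texts, convert_to_tensor=True)

# For each HumanEval prompt, retrieve the k most similar training instructions.
hits = util.semantic_search(test_emb, train_emb, top_k=3)
for i, task_hits in enumerate(hits):
    for h in task_hits:
        if h["score"] > 0.5:  # arbitrary threshold, just for eyeballing candidates
            print(test[i]["task_id"], round(h["score"], 3),
                  train_texts[h["corpus_id"]][:80].replace("\n", " "))
```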