lyy1994 / awesome-data-contamination

A paper list on data contamination in large language model evaluation.
MIT License
78 stars, 2 forks

Kindly requesting inclusion #4

Closed: ShangQingTu closed this issue 5 months ago

ShangQingTu commented 5 months ago

Thank you for this great paper collection! I would be glad if my work could be included in the repo. Thanks!

| Title | Paper | Code | Venue | Classification | Model | Comment |
|---|---|---|---|---|---|---|
| DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning | https://arxiv.org/abs/2406.04197 | https://github.com/THU-KEG/DICE | arXiv'24 | (icons) | LLM | A novel contamination detection method that leverages the internal states of LLMs to detect data contamination in the fine-tuning stage for math reasoning. |
| KoLA: Carefully Benchmarking World Knowledge of Large Language Models | https://openreview.net/forum?id=AqN23oqraW | https://kola.xlore.cn/ | ICLR'24 | (icon) | LLM | A carefully designed, evolving benchmark for evaluating LLMs' world knowledge. Because the KoLA benchmark keeps evolving, it can avoid the data contamination issue. |


lyy1994 commented 5 months ago

Sorry for the late reply. I have added your work to the paper list. Thanks for your suggestion!

ShangQingTu commented 5 months ago

Thank you!