lyy1994 / awesome-data-contamination

A paper list on data contamination in large language model evaluation.
MIT License

Kindly request the inclusion #5

Closed: XuandongZhao closed this issue 5 months ago.

XuandongZhao commented 5 months ago

Our new ICML paper studies membership inference attacks (i.e., detecting data contamination) in a black-box setting.

Title: DE-COP: Detecting Copyrighted Content in Language Models Training Data (ICML 2024)
Paper: https://arxiv.org/abs/2402.09910
Code: https://github.com/LeiLiLab/DE-COP?tab=readme-ov-file

Datasets:
https://huggingface.co/datasets/avduarte333/BookTection
https://huggingface.co/datasets/avduarte333/arXivTection

I would be grateful if this work could be included in the repo. Thank you!

lyy1994 commented 5 months ago

Thanks for letting me know that I had missed your work. I have added your paper to the list.