Min-K%++: Improved Baseline for Detecting Pre-Training Data of LLMs

Overview

teaser figure

We propose Min-K%++, a new membership inference attack (MIA) method for detecting pre-training data of LLMs, which achieves state-of-the-art results among reference-free methods. This repo contains a lightweight implementation of our method (along with all the baselines) on the WikiMIA benchmark. For experiments on the MIMIR benchmark, please refer to our fork here.

Paper: https://arxiv.org/abs/2404.02936 | Project page: https://zjysteven.github.io/mink-plus-plus/ | License: MIT
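
For orientation, below is a minimal, self-contained sketch of the Min-K%++ scoring rule described in the paper: each token's log-probability is normalized by the mean and standard deviation of the model's next-token log-probability distribution at that position, and the lowest k% of the normalized token scores are averaged. This is an illustrative simplification rather than the repo's exact implementation; the model name and k value below are placeholders.

```python
# Hedged sketch of Min-K%++ scoring (simplified; see the repo scripts for the actual implementation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def minkpp_score(text: str, k: float = 0.2) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    logits = model(ids).logits                                      # (1, T, V)
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)           # next-token distributions
    token_lp = log_probs.gather(-1, ids[0, 1:, None]).squeeze(-1)   # log p(x_t | x_<t)
    probs = log_probs.exp()
    mu = (probs * log_probs).sum(-1)                                # mean of log p under the distribution
    sigma = ((probs * log_probs.pow(2)).sum(-1) - mu.pow(2)).clamp(min=1e-8).sqrt()
    token_scores = (token_lp - mu) / sigma                          # normalized per-token score
    n = max(1, int(k * token_scores.numel()))
    return token_scores.topk(n, largest=False).values.mean().item()  # average of the lowest k%

# Higher scores suggest the text is more likely to be pre-training data (a member).
print(minkpp_score("The quick brown fox jumps over the lazy dog."))
```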

Setup

Environment

First, install PyTorch according to your environment. Then install the remaining dependencies with pip install -r requirements.txt. This installs the latest transformers library from the GitHub main branch, which is required to run Mamba models as of 2024/04.

Our code is tested with Python 3.8, PyTorch 2.2.0, and CUDA 12.1.

Data

All data splits are hosted on Hugging Face and will be loaded automatically when running the scripts.
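
For reference, the splits can also be inspected directly with the datasets library. The dataset path and split name below follow the original WikiMIA release on Hugging Face and are only illustrative; the provided scripts handle loading for you.

```python
# Illustrative only: the repo's scripts load the data automatically.
# "swj0419/WikiMIA" and the split name follow the original WikiMIA release on Hugging Face.
from datasets import load_dataset

data = load_dataset("swj0419/WikiMIA", split="WikiMIA_length32")
print(data[0]["input"], data[0]["label"])  # label: 1 = member, 0 = non-member
```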

Running

There are four scripts, each self-contained to facilitate quick reproduction and extension. The meaning of each script's arguments should be clear from its naming.

Each script outputs a CSV file of method results (AUROC and TPR@FPR=5%) in the results directory, with the file path indicating the dataset and model. Sample results produced by running the four scripts are also provided in the results directory.
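
For clarity, here is a minimal sketch of how AUROC and TPR@FPR=5% can be computed from per-example membership scores using scikit-learn; the scripts' exact implementation may differ.

```python
# Minimal sketch: computing AUROC and TPR@FPR=5% from membership scores.
# Assumes higher scores indicate members; not the repo's exact code.
import numpy as np
from sklearn.metrics import auc, roc_curve

def mia_metrics(scores, labels, target_fpr=0.05):
    """labels: 1 = member (seen in pre-training), 0 = non-member."""
    fpr, tpr, _ = roc_curve(labels, scores)
    auroc = auc(fpr, tpr)
    # TPR at the largest operating point whose FPR does not exceed target_fpr
    tpr_at_fpr = tpr[np.searchsorted(fpr, target_fpr, side="right") - 1]
    return auroc, tpr_at_fpr

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.concatenate([np.ones(100), np.zeros(100)])
    scores = np.concatenate([rng.normal(1.0, 1.0, 100), rng.normal(0.0, 1.0, 100)])
    auroc, tpr5 = mia_metrics(scores, labels)
    print(f"AUROC={auroc:.3f}, TPR@FPR=5%={tpr5:.3f}")
```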

HF paths of the models evaluated in the paper

Acknowledgement

This codebase is adapted from the official repo of Min-K% and WikiMIA.

Citation

If you find this work, the repo, or the data splits useful, please consider citing our paper:

@article{zhang2024min,
  title={Min-K\%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models},
  author={Zhang, Jingyang and Sun, Jingwei and Yeats, Eric and Ouyang, Yang and Kuo, Martin and Zhang, Jianyi and Yang, Hao and Li, Hai},
  journal={arXiv preprint arXiv:2404.02936},
  year={2024}
}