xszheng2020 / memorization

An Empirical Study of Memorization in NLP (ACL 2022)
Apache License 2.0

Implementation question #2

Closed: Daniel030117 closed this issue 6 months ago

Daniel030117 commented 6 months ago

Do you have code for calculating memorization scores? Also, in compute_if_attr within CIFAR, where is the path for the filename variable specified?

xszheng2020 commented 6 months ago

Hi @Daniel030117, thanks for your interest in our work.

Do you have code for calculating memorization scores? Yes, the memorization score, which is defined as self-influence, is computed in compute_if.py.

In compute_if_attr within CIFAR, where is the path for the filename variable specified? Do you mean the pre-computed memorization attributions? You can find them in the score_42 folder.
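For intuition, here is a minimal sketch of what a self-influence score computes, assuming a standard PyTorch model and loss. This is not the repo's compute_if.py, which uses proper influence functions with inverse-Hessian-vector products; the sketch below uses the common identity-Hessian simplification, under which self-influence reduces to the squared gradient norm of an example's own loss:

```python
# Minimal self-influence sketch, NOT the repo's actual compute_if.py.
# Identity-Hessian simplification: score = squared gradient norm of the
# example's own training loss w.r.t. the model parameters.
import torch

def self_influence(model, loss_fn, x, y):
    model.zero_grad()
    loss = loss_fn(model(x), y)  # loss of the example on itself
    grads = torch.autograd.grad(
        loss, [p for p in model.parameters() if p.requires_grad]
    )
    return sum(g.pow(2).sum() for g in grads).item()
```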

xszheng2020 commented 6 months ago

Also, if you are interested in estimating memorization (self-influence) scores, nowadays I would suggest using a newer method called TRAK rather than the original influence functions.

You may take a look at the LLM-TRAK repo.
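For illustration, a hedged sketch of estimating self-influence with the MadryLab trak package (pip install traker); model, train_set, and train_loader are assumed placeholders here, and the API may differ across versions, so check the TRAK docs rather than treating this as the authors' method:

```python
# Hedged TRAK sketch: self-influence is each training example's attribution
# score on itself, i.e. the diagonal of the train-by-train score matrix.
from trak import TRAKer

traker = TRAKer(model=model, task='image_classification',
                train_set_size=len(train_set))

# Featurize the training set with one trained checkpoint.
traker.load_checkpoint(model.state_dict(), model_id=0)
for batch in train_loader:
    traker.featurize(batch=batch, num_samples=batch[0].shape[0])
traker.finalize_features()

# Score the training set against itself.
traker.start_scoring_checkpoint(exp_name='self_influence',
                                checkpoint=model.state_dict(),
                                model_id=0,
                                num_targets=len(train_set))
for batch in train_loader:
    traker.score(batch=batch, num_samples=batch[0].shape[0])

scores = traker.finalize_scores(exp_name='self_influence')  # (train, targets)
self_influence_scores = scores.diagonal()
```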

Daniel030117 commented 6 months ago

The filename variable here points to saved/random/0/42/checkpoint, but I couldn't find that checkpoint folder.

xszheng2020 commented 6 months ago

We did not share the model checkpoints because they are too large.

We only shared the pre-computed memorization attributions.

You can obtain the checkpoints by running the training yourself:

CUDA_VISIBLE_DEVICES=0 python -u train.py --SEED=42 --SAVE_CHECKPOINT=True > log/random/0/log_seed_42.txt

xszheng2020 commented 6 months ago

First run mkdir -p log/random/0, as shown in run_random_0.sh, so that the log path exists before training.

xszheng2020 commented 6 months ago

:)

Please first follow the README.md ...

Download the CIFAR-10, SNLI, SST, and Yahoo! Answers datasets from the web, then process them using 00_EDA.ipynb.
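As a hedged example for the CIFAR-10 part (torchvision is an assumption here, not necessarily what the notebook uses), the raw data can be fetched like this before running the notebook:

```python
# Hypothetical CIFAR-10 download via torchvision; 00_EDA.ipynb does the
# actual preprocessing, and SNLI/SST/Yahoo! Answers come from their own
# sources.
from torchvision.datasets import CIFAR10

CIFAR10(root='./data', train=True, download=True)   # training split
CIFAR10(root='./data', train=False, download=True)  # test split
```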

Daniel030117 commented 6 months ago

Thank you!