neulab / knn-transformers

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
MIT License

Cannot reproduce distilgpt2 LM numbers using --knn #14

Open HossamAmer12 opened 1 month ago

HossamAmer12 commented 1 month ago

I am trying to build on your knn-transformers repo.

When I run distilgpt2 with the setup given in the repo, but with the --knn flag, I get around 21.xx perplexity. This number is different from the one reported in the repository.

MODEL=neulab/distilgpt2-finetuned-wikitext103
python -u run_clm.py \
  --model_name_or_path ${MODEL} \
  --dataset_name wikitext --dataset_config_name wikitext-103-raw-v1 \
  --output_dir checkpoints/${MODEL}_knn \
  --do_eval --eval_subset validation \
  --dstore_dir /tmp/distillgpt2/ --dstore_size 116988150 \
  --knn

I am able to reproduce the other numbers (baseline + RetoMaton) for distilgpt2.

Could you please let me know if you have any clue here?

urialon commented 1 month ago

Hi Hossam, Thank you for your interest in our work.

I believe that you need to rebuild the KNN datastore specifically for distill-GPT. Have you done that?

Best, Uri
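
For context on why the datastore is model-specific: in kNN-LM, the keys are the model's own hidden states at each position and the values are the corresponding next tokens, so a datastore built from another model's representations will not match distilgpt2's query vectors. Below is a minimal, purely illustrative sketch of what saving such a datastore looks like; it is not the repo's actual code, and the file names, dtypes, and toy shapes are assumptions:

import numpy as np
import torch

# Real run: dstore_size = 116_988_150 (one entry per token of the datastore text);
# toy numbers here so the sketch runs instantly.
dstore_size, hidden_size, vocab_size = 10_000, 768, 50_257   # 768 = distilgpt2 hidden dim

# Preallocate memmaps for keys (hidden states) and values (next-token ids).
keys = np.memmap("dstore_keys.npy", dtype=np.float16, mode="w+",
                 shape=(dstore_size, hidden_size))
vals = np.memmap("dstore_vals.npy", dtype=np.int32, mode="w+",
                 shape=(dstore_size, 1))

# In the real pipeline these come from forward passes of the fine-tuned model over
# the datastore text; random tensors stand in for them in this sketch.
hidden_states = torch.randn(1_024, hidden_size)
next_tokens = torch.randint(0, vocab_size, (1_024, 1))

keys[:1_024] = hidden_states.to(torch.float16).numpy()
vals[:1_024] = next_tokens.numpy()
keys.flush(); vals.flush()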

HossamAmer12 commented 1 month ago

Thanks @urialon for getting back.

The model I was using in my previous command (sorry, I edited my post above) is the one given in the repo. That said, the scores are still different.

Based on your suggestion, I tried building the dstore myself. Every time, I hit this error: UserScriptFilledDisk: User script filled the disk. Consider using Virtual Machine SKU with larger disk size.

This is the command I used to build the datastore:

MODEL=neulab/distilgpt2-finetuned-wikitext103
path_to=""

CUDA_VISIBLE_DEVICES=0 python -u run_clm.py \
  --model_name_or_path ${MODEL} \
  --dataset_name wikitext --dataset_config_name wikitext-103-raw-v1 \
  --do_eval --eval_subset train \
  --output_dir $path_to/checkpoints/${MODEL} \
  --dstore_dir $path_to/checkpoints/${MODEL} \
  --save_knnlm_dstore --dstore_size 116988150

Does it require too much disk space?

Question: do I have to specify the dstore size here? What does the dstore size indicate? The number of contexts?

Another question: when running kNN-LM with the given distilgpt2 model, should I use a specific temperature or lambda? I saw you post about this before, and I want to make sure we can reproduce the scores.
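
On the disk question, here is a rough back-of-envelope estimate of the raw datastore footprint, assuming fp16 keys of distilgpt2's 768-dimensional hidden states and int32 token-id values (these storage details are assumptions, not confirmed in this thread, and the FAISS index built on top needs additional space):

dstore_size = 116_988_150       # entries = tokens in the datastore text (the value passed via --dstore_size)
hidden_size = 768               # distilgpt2 hidden dimension
key_bytes, value_bytes = 2, 4   # assumed fp16 keys, int32 values

keys_gib = dstore_size * hidden_size * key_bytes / 1024**3
values_gib = dstore_size * value_bytes / 1024**3
print(f"keys:   ~{keys_gib:.0f} GiB")    # ~167 GiB
print(f"values: ~{values_gib:.1f} GiB")  # ~0.4 GiB

Under these assumptions the keys alone are on the order of 170 GiB, which would explain the UserScriptFilledDisk error on a small VM disk; the dstore size itself is simply the number of (key, value) pairs, i.e. the number of tokens in the text used to build the datastore.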

HossamAmer12 commented 1 month ago

Just want to give an update on the issue. Using the following did not result in the disk-space issue:

MODEL=neulab/distilgpt2-finetuned-wikitext103
CUDA_VISIBLE_DEVICES=0 python -u run_clm.py \
  --model_name_or_path ${MODEL} \
  --dataset_name wikitext --dataset_config_name wikitext-103-raw-v1 \
  --do_eval --eval_subset validation \
  --output_dir ${path}/checkpoints/${MODEL}_SAVE0 \
  --dstore_dir ${path}/checkpoints/${MODEL}_SAVE0 \
  --save_knnlm_dstore --dstore_size 116988150

I guess that's due to the small size of the validation split (I know that's not a realistic setup). Do you know the size of the training set and roughly how much space its datastore requires?

HossamAmer12 commented 1 month ago

Hi Uri,

I tried constructing the datastore with the WikiText validation set and the given distilgpt2 model, and then ran kNN-LM evaluation on the same set. The final perplexity scores are not better than the baseline.

What could be the problem?

Even though the setup is not practical, I expected the perplexity to be a lot better given that the datastore set and the eval set are the same.

That is, of course, because I am not able to use the training set for the kNN datastore due to memory problems; I have not yet figured out the reason for those.

I kindly ask for your helpful advice.

Thanks, Hossam
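
One possibly relevant detail here: kNN-LM interpolates the base LM distribution with a retrieval distribution, p(y|x) = lambda * p_kNN(y|x) + (1 - lambda) * p_LM(y|x), where p_kNN spreads softmax(-distance / temperature) mass over the retrieved values, so the achievable gain is capped by lambda and sensitive to the temperature. Below is a minimal PyTorch sketch of that interpolation; it is illustrative only, and the function name, shapes, and hyperparameter defaults are assumptions, not the repo's actual code:

import torch
import torch.nn.functional as F

def knnlm_interpolate(lm_log_probs, knn_dists, knn_values, lmbda=0.25, temperature=1.0):
    """Interpolate base-LM probabilities with a kNN retrieval distribution.

    lm_log_probs: (vocab_size,) log-probabilities from the base LM at one position
    knn_dists:    (k,) distances from the query hidden state to the retrieved keys
    knn_values:   (k,) token ids stored alongside the retrieved keys
    lmbda, temperature: interpolation weight and softmax temperature (illustrative defaults)
    Returns the interpolated probability distribution over the vocabulary.
    """
    vocab_size = lm_log_probs.size(0)

    # Distribution over the k neighbors: closer keys get more mass, scaled by temperature.
    knn_weights = F.softmax(-knn_dists / temperature, dim=0)                    # (k,)

    # Scatter neighbor mass onto their token ids to get p_kNN over the whole vocabulary.
    p_knn = torch.zeros(vocab_size).scatter_add(0, knn_values.long(), knn_weights)

    # p = lmbda * p_kNN + (1 - lmbda) * p_LM
    return lmbda * p_knn + (1.0 - lmbda) * lm_log_probs.exp()

The original kNN-LM paper reports lambda = 0.25 for WikiText-103; which lambda and temperature reproduce the README numbers here is exactly the kind of detail worth confirming, since with lambda = 0.25 even a perfect retrieval distribution only moves a quarter of the probability mass.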

urialon commented 1 month ago

I just replied to you in a different thread; let me know if anything is still unclear.
