neulab / knn-transformers

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
MIT License

NonMatchingSplitsSizesError #12

Closed: Jing-L97 closed this issue 7 months ago

Jing-L97 commented 8 months ago

Hi, I encountered a NonMatchingSplitsSizesError when evaluating the fine-tuned model gpt2-finetuned-wikitext103. The same error also appeared when I saved a datastore and built the FAISS index myself. Could you indicate how to solve this issue? Thank you very much!

urialon commented 8 months ago

Hi @Jing-L97, thank you for your interest in our work.

Can you please provide your Hugging Face transformers version, the exact command line that you ran, the full stack trace, and whether you made any changes to the code?

Best, Uri

Jing-L97 commented 8 months ago

Hi Uri,

Thank you so much for your quick reply!

I didn't make any changes to the code. The transformers version is 4.28.1 and the torch version is 2.1.0. I downloaded gpt2-finetuned-wikitext103, and the command line is below.

MODEL=neulab/gpt2-finetuned-wikitext103

python -u run_clm.py \
  --model_name_or_path ${MODEL} \
  --dataset_name wikitext --dataset_config_name wikitext-103-raw-v1 \
  --output_dir checkpoints/${MODEL} \
  --do_eval --eval_subset validation

I was wondering if it's OK to set ignore_verifications=True when loading the dataset? I tried this, but it returned another error (see attachment) when evaluating the model with the command above.

Thanks again for your kind help!
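For readers wondering how the ignore_verifications option mentioned above is spelled across versions of the Hugging Face datasets library, here is a minimal sketch. The helper name verification_kwargs is hypothetical and not part of this repository. It relies on ignore_verifications having been deprecated in datasets 2.9.1 in favor of verification_mode. Skipping verification only hides the split-size mismatch; it is not the fix that resolved this thread.

```python
import re


def verification_kwargs(datasets_version: str) -> dict:
    """Hypothetical helper: pick the load_dataset keyword that skips
    split-size verification for a given `datasets` version string."""
    # Keep the first three numeric components, e.g. "2.14.5" -> (2, 14, 5).
    numeric = tuple(int(n) for n in re.findall(r"\d+", datasets_version)[:3])
    if numeric < (2, 9, 1):
        # Older releases use a boolean flag.
        return {"ignore_verifications": True}
    # From 2.9.1 on, the flag is replaced by verification_mode.
    return {"verification_mode": "no_checks"}


if __name__ == "__main__":
    print(verification_kwargs("2.8.0"))
    print(verification_kwargs("2.14.5"))
    # Intended use (needs the `datasets` package and network access):
    #   import datasets
    #   raw = datasets.load_dataset(
    #       "wikitext", "wikitext-103-raw-v1",
    #       **verification_kwargs(datasets.__version__),
    #   )
```

The commented-out call at the bottom shows where the returned keyword would go; it is left as a comment because it downloads the dataset.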

urialon commented 8 months ago

So what exactly is the problem?

Jing-L97 commented 8 months ago

It returned an error when evaluating the pretrained LM:

1) With no changes to the script, it raises NonMatchingSplitsSizesError.

2) If I add ignore_verifications=True to the load_dataset call, it returns a different error, a service timeout.

And here's the command line:

MODEL=neulab/gpt2-finetuned-wikitext103

python -u run_clm.py \
  --model_name_or_path ${MODEL} \
  --dataset_name wikitext --dataset_config_name wikitext-103-raw-v1 \
  --output_dir checkpoints/${MODEL} \
  --do_eval --eval_subset validation

Jing-L97 commented 7 months ago

Hi! I have successfully solved the issue by reinstalling the required packages. Thank you very much!
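The thread does not say which package was at fault, so the following is only a sketch of how one might record the installed versions before and after a reinstall. The function name installed_versions and the package list are assumptions, not something taken from this repository's requirements.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of relevant packages; adjust to the project's requirements file.
    for name, version in installed_versions(["transformers", "datasets", "torch"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Comparing this output against the versions pinned by the project can show whether an environment has drifted.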

urialon commented 7 months ago

Great to hear!