uclaml / SPIN

The official implementation of Self-Play Fine-Tuning (SPIN)
https://uclaml.github.io/SPIN/
Apache License 2.0

Token indices sequence length is longer than the specified maximum sequence length #8

Closed: yurunsheng1 closed this issue 9 months ago

yurunsheng1 commented 9 months ago

Hi, thank you for your excellent work. I ran into a problem and would really appreciate your help.

When I ran step 1 (Generation) with the following command:

accelerate launch --main_process_port=2950 spin/generate.py --input_dir /SPIN_iter0 --batch_size 8 --frac_len 800 --data_frac 2 --output_dir /generated/iter1

It prints the following warning:

Token indices sequence length is longer than the specified maximum sequence length for this model (2392 > 2048). Running this sequence through the model will result in indexing errors.

Will this affect the subsequent training step? Is there a way to fix it, or can I safely ignore it?

Thanks in advance!

yihedeng9 commented 9 months ago

Hi, thank you for your interest! The warning will not affect the training step and can be ignored.
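
For readers who want to see how many prompts trigger the warning before deciding to ignore it, here is a minimal sketch. It is not part of the SPIN code base: `find_long_prompts`, `count_tokens`, and `whitespace_count` are hypothetical names, and the whitespace tokenizer is only a stand-in so the example runs without downloading a model. The limit of 2048 is taken from the warning text above.

```python
from typing import Callable, List, Tuple

MAX_LEN = 2048  # limit reported in the warning above


def find_long_prompts(
    prompts: List[str],
    count_tokens: Callable[[str], int],
    max_len: int = MAX_LEN,
) -> List[Tuple[int, int]]:
    """Return (index, token_count) for every prompt longer than max_len."""
    counted = ((i, count_tokens(p)) for i, p in enumerate(prompts))
    return [(i, n) for i, n in counted if n > max_len]


# Stand-in tokenizer for illustration only (assumption):
# one token per whitespace-separated word.
def whitespace_count(text: str) -> int:
    return len(text.split())


prompts = ["short prompt", "word " * 2392, "another short one"]
long_ones = find_long_prompts(prompts, whitespace_count)
print(long_ones)  # [(1, 2392)]
```

With a Hugging Face tokenizer, `count_tokens` could be something like `lambda s: len(tokenizer(s)["input_ids"])`. Whether and how `spin/generate.py` truncates long prompts is not stated in this thread, so treat the result only as a diagnostic.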