Token indices sequence length is longer than the specified maximum sequence length

Hi, thank you for your excellent work. I have run into a problem and would really appreciate your help:

When I ran Step 1 (Generation) with the following command:

accelerate launch --main_process_port=2950 spin/generate.py --input_dir /SPIN_iter0 --batch_size 8 --frac_len 800 --data_frac 2 --output_dir /generated/iter1

it printed this warning:

Token indices sequence length is longer than the specified maximum sequence length for this model (2392 > 2048). Running this sequence through the model will result in indexing errors.
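The 2048 in this message is the tokenizer's `model_max_length`, so this prompt is 2392 - 2048 = 344 tokens over the limit. Transformers typically emits this warning only once per tokenizer, so it may not show how many prompts are affected. Below is a minimal sketch for listing every prompt over the limit. It assumes `prompts` is a list of the prompt strings being generated from (loading them from `--input_dir` is not shown) and that `encode` is any function mapping a string to token ids. `find_overlong` and `toy_encode` are hypothetical helpers, not part of SPIN:

```python
from typing import Callable, List, Sequence, Tuple


def find_overlong(
    prompts: Sequence[str],
    encode: Callable[[str], Sequence[int]],
    max_len: int = 2048,
) -> List[Tuple[int, int]]:
    """Return (index, token_count) for every prompt longer than max_len tokens."""
    overlong = []
    for i, text in enumerate(prompts):
        n_tokens = len(encode(text))
        if n_tokens > max_len:
            overlong.append((i, n_tokens))
    return overlong


def toy_encode(text: str) -> List[int]:
    """Stand-in tokenizer for illustration only: one id per whitespace-separated word."""
    return [len(word) for word in text.split()]


if __name__ == "__main__":
    prompts = ["short prompt", "a much longer prompt with many words in it"]
    # With a toy limit of 5 "tokens", only the second prompt (9 words) is reported.
    print(find_overlong(prompts, toy_encode, max_len=5))  # prints [(1, 9)]
```

With a real Hugging Face tokenizer, the call would be `find_overlong(prompts, tokenizer.encode, tokenizer.model_max_length)`. By default `encode` adds special tokens such as BOS, so those tokens are included in the counts.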

Will this affect the subsequent training step? Is there a way to fix it, or can it safely be ignored?
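If losing part of the prompt is acceptable, one option is to shorten each prompt before generation so that the prompt and the generated tokens together fit in a 2048-token window. Below is a minimal sketch. It assumes the prompt is already tokenized into a list of ids. `truncate_prompt_ids` and its `max_new_tokens=256` default are hypothetical and are not SPIN's actual generation settings:

```python
from typing import List, Sequence


def truncate_prompt_ids(
    ids: Sequence[int],
    context_len: int = 2048,
    max_new_tokens: int = 256,
    drop_from: str = "start",
) -> List[int]:
    """Trim prompt token ids so that prompt + generated tokens fit in context_len.

    drop_from="start": remove the earliest tokens (keeps the end of the prompt).
    drop_from="end":   remove the latest tokens (keeps the beginning of the prompt).
    """
    if drop_from not in ("start", "end"):
        raise ValueError("drop_from must be 'start' or 'end'")
    budget = context_len - max_new_tokens  # tokens left for the prompt itself
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than context_len")
    if len(ids) <= budget:
        return list(ids)
    return list(ids[-budget:]) if drop_from == "start" else list(ids[:budget])


if __name__ == "__main__":
    prompt_ids = list(range(2392))  # stand-in for the 2392-token prompt in the warning
    kept = truncate_prompt_ids(prompt_ids)
    print(len(kept), kept[0])  # prints 1792 600
```

Dropping tokens from the start keeps the text closest to where generation begins. It also removes a leading BOS token if the tokenizer added one, so that token may need to be re-inserted. Hugging Face tokenizers can also truncate directly when called with `truncation=True, max_length=...`, and the `truncation_side` attribute controls which end gets cut.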

Thanks in advance!

uclaml / SPIN, issue #8