mbzuai-oryx / LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

'PreTrainedTokenizerFast' object has no attribute 'legacy' #22

Closed TuuSiwei closed 6 months ago

TuuSiwei commented 6 months ago

When I run the llama3-finetune-lora script, I encounter the following error:

```
Original Traceback (most recent call last):
  File "/x/sherlor/envs/llava/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/x/sherlor/envs/llava/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/x/sherlor/envs/llava/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/x/tsw/LLaVA/llava/train/train.py", line 828, in __getitem__
    data_dict = preprocess(
  File "/x/tsw/LLaVA/llava/train/train.py", line 699, in preprocess
    return preprocess_v1(sources, tokenizer, has_image=has_image)
  File "/x/tsw/LLaVA/llava/train/train.py", line 534, in preprocess_v1
    if i != 0 and not tokenizer.legacy and IS_TOKENIZER_GREATER_THAN_0_14:
AttributeError: 'PreTrainedTokenizerFast' object has no attribute 'legacy'
```

My version info: transformers 4.41.0.dev0, tokenizers 0.19.1.
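For reference, the `legacy` flag is defined on the slow, SentencePiece-based `LlamaTokenizer`, while the fast `PreTrainedTokenizerFast` that `AutoTokenizer` returns for LLaMA-3 does not expose it, which is why line 534 raises. A minimal sketch of the failure and a guarded workaround (the checkpoint path is hypothetical, not necessarily the one the script uses):

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint path: AutoTokenizer resolves LLaMA-3 checkpoints
# to a fast tokenizer (PreTrainedTokenizerFast), which has no `legacy` flag.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

print(hasattr(tokenizer, "legacy"))  # False -> train.py line 534 raises

# Workaround sketch: replace `tokenizer.legacy` in train.py with a guarded
# read; True matches the slow LlamaTokenizer's default for the flag.
legacy = getattr(tokenizer, "legacy", True)
```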

ashmalvayani commented 6 months ago

What was the solution for this error?

TuuSiwei commented 5 months ago

> What was the solution for this error?

I'm sorry, this issue is so old that I've forgotten the cause. Maybe you should try toggling use_fast when loading the tokenizer...
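A minimal sketch of that suggestion, with a hypothetical checkpoint path; note that use_fast=False only works for checkpoints that ship a slow tokenizer (LLaMA-3, for instance, provides only a fast one):

```python
from transformers import AutoTokenizer

# use_fast=False requests the slow tokenizer class, which is the one that
# actually defines the `legacy` attribute. Hypothetical checkpoint path.
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    use_fast=False,
)
print(type(tokenizer).__name__)  # LlamaTokenizer (slow)
print(tokenizer.legacy)          # attribute now exists; defaults to True
```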

ashmalvayani commented 5 months ago

> What was the solution for this error?

> I'm sorry, this issue is so old that I've forgotten the cause. Maybe you should try toggling use_fast when loading the tokenizer...

I'm currently using a Cohere model, and when I set use_fast=True it throws a "CohereTokenizer does not exist or is not currently supported" error even with the latest version of transformers, so I think use_fast is already set to True by default in AutoTokenizer.
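A quick way to check which class AutoTokenizer actually resolves to (the checkpoint name is illustrative):

```python
from transformers import AutoTokenizer, PreTrainedTokenizerFast

# AutoTokenizer defaults to use_fast=True, so if a checkpoint ships only a
# fast tokenizer (as with Cohere's models), that is what you get back.
tok = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")
print(type(tok).__name__)                        # e.g. CohereTokenizerFast
print(isinstance(tok, PreTrainedTokenizerFast))  # True for fast tokenizers
```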