merveenoyan / smol-vision

Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
Apache License 2.0

TypeError: _batch_encode_plus() got an unexpected keyword argument 'tokenize_newline_separately' while finetuning PaliGemma #18

Open Aritra02091998 opened 1 day ago

Aritra02091998 commented 1 day ago

I'm facing the error TypeError: _batch_encode_plus() got an unexpected keyword argument 'tokenize_newline_separately' while fine-tuning PaliGemma with the provided notebook.

Specifically, the error is raised while running training with:

from transformers import TrainingArguments

args = TrainingArguments(
    num_train_epochs=2,
    remove_unused_columns=False,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    warmup_steps=2,
    learning_rate=2e-5,
    weight_decay=1e-6,
    adam_beta2=0.999,
    logging_steps=100,
    optim="adamw_hf",
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=1,
    output_dir="paligemma_vqav2",
    bf16=True,
    report_to=["tensorboard"],
    dataloader_pin_memory=False,
)

from transformers import Trainer

trainer = Trainer(
    model=model,
    train_dataset=train_ds,
    data_collator=collate_fn,
    args=args,
)

trainer.train()
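The TrainingArguments themselves are not the cause: the keyword is forwarded to the tokenizer from inside the data collator, and the Trainer only hits it when it builds the first batch. Below is a minimal sketch of what the notebook's collate_fn presumably looks like and where the removed keyword would be dropped; processor, model, and the dataset field names (question, multiple_choice_answer, image) are assumptions based on a VQAv2-style fine-tuning setup rather than copied from the notebook.

import torch

def collate_fn(examples):
    # Dataset field names here are assumptions (VQAv2-style fine-tuning).
    texts = ["answer " + example["question"] for example in examples]
    labels = [example["multiple_choice_answer"] for example in examples]
    images = [example["image"].convert("RGB") for example in examples]
    tokens = processor(
        text=texts,
        images=images,
        suffix=labels,
        return_tensors="pt",
        padding="longest",
        # tokenize_newline_separately=False,  # removed keyword: recent transformers
        #                                     # releases forward it to the tokenizer,
        #                                     # which raises the TypeError above
    )
    # Cast and move the batch as in the notebook; model is assumed to be defined earlier.
    return tokens.to(torch.bfloat16).to(model.device)

Deleting that keyword from the processor call (or pinning transformers to an older release that still accepts it) should let the collator run again.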

Aritra02091998 commented 1 day ago

Using:

tokenizers    0.20.1    py39ha92566c_1
transformers  4.46.3    pyhd8ed1ab_0
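For completeness, a quick diagnostic (a generic sketch, not taken from the notebook) is to print the installed transformers version and the processor's call signature; on recent releases the extra keyword is no longer a named parameter and ends up rejected further down the stack, which matches the TypeError above.

import inspect

import transformers
from transformers import PaliGemmaProcessor

print(transformers.__version__)  # 4.46.3 in the environment above
# Shows which keywords the installed processor accepts directly; anything else
# is passed through **kwargs and may be rejected by the tokenizer.
print(inspect.signature(PaliGemmaProcessor.__call__))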