TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'tokenize_newline_separately'

DigitalPathology commented 1 month ago

I would like to perform an object detection task with a VQA model using the AutoTrain API. I followed this guide and prepared metadata.json accordingly. It has three columns: "file_name", "question", and "multiple_choice_answer". A sample record from the dataset:

{"file_name": "1.mrxs__12214_50922_512_512.png", "question": "This image is from 3DHistech Scanner. Where is the mitosis location(four properties of the bounding box: top left x coordinate, top left y coordinate, width, height) in this image?", "multiple_choice_answer": [[181, 199, 43, 42]]}
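Before blaming the trainer, it helps to rule out data problems by checking each record against the three expected columns. The following is a minimal sketch. It assumes the metadata file holds one JSON object per line (JSON Lines), like the sample above, and that each answer is a list of `[x, y, width, height]` boxes. `validate_record` is a hypothetical helper, not part of AutoTrain.

```python
import json

REQUIRED_COLUMNS = ("file_name", "question", "multiple_choice_answer")


def validate_record(line: str) -> list[str]:
    """Return the problems found in one JSON Lines record (empty list if OK)."""
    record = json.loads(line)
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in record]
    answer = record.get("multiple_choice_answer")
    # Assumption: the answer is a list of boxes, each [x, y, width, height].
    if isinstance(answer, list):
        for box in answer:
            if not (isinstance(box, list) and len(box) == 4
                    and all(isinstance(v, int) and v >= 0 for v in box)):
                problems.append(f"malformed box: {box!r}")
    elif answer is not None:
        problems.append(f"answer is not a list: {answer!r}")
    return problems


if __name__ == "__main__":
    sample = json.dumps({
        "file_name": "1.mrxs__12214_50922_512_512.png",
        "question": "Where is the mitosis location in this image?",
        "multiple_choice_answer": [[181, 199, 43, 42]],
    })
    print(validate_record(sample))  # prints []
```

Running this over every line of the file and collecting the non-empty results quickly shows whether the error could come from the data rather than from the training code.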

I tried the google/paligemma-3b-ft-coco35l-448 and google/paligemma-3b-mix-448 models for this purpose, starting the run with this command: autotrain --config config.yml

The dataset loads properly, and everything seems fine until training starts.

Here is the error:
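The traceback itself is not reproduced here. The message in the title shows that some code in the training path passed `tokenize_newline_separately` down to the fast tokenizer, and that tokenizer's `_batch_encode_plus` has no such parameter. One plausible reading, not confirmed in this thread, is that the trainer still passes this PaliGemma processor option. Under that reading, the installed `transformers` release no longer consumes the option, so it falls through to the tokenizer. The sketch below reproduces that failure mode with a stand-in function and shows one local workaround: drop the argument before the call. `call_without` and `fake_batch_encode` are hypothetical names, not AutoTrain or `transformers` APIs.

```python
from typing import Any, Callable

# Option that, by assumption, the installed tokenizer no longer accepts.
UNSUPPORTED_KWARGS = frozenset({"tokenize_newline_separately"})


def call_without(fn: Callable[..., Any], drop: frozenset = UNSUPPORTED_KWARGS,
                 **kwargs: Any) -> Any:
    """Call fn after removing any keyword arguments listed in `drop`."""
    return fn(**{k: v for k, v in kwargs.items() if k not in drop})


def fake_batch_encode(*, text: list[str], padding: str = "longest") -> dict:
    """Stand-in for a tokenizer method with a fixed keyword set (no **kwargs)."""
    return {"n": len(text), "padding": padding}


if __name__ == "__main__":
    try:
        # Passing the unknown option directly reproduces the same kind of TypeError.
        fake_batch_encode(text=["a"], tokenize_newline_separately=False)
    except TypeError as err:
        print("reproduced:", err)
    # The wrapper strips the option, so the call succeeds.
    print(call_without(fake_batch_encode, text=["a", "b"],
                       tokenize_newline_separately=False))
```

In practice, you get the same effect by deleting the argument at the call site in the trainer, or by installing a `transformers` version that still accepts it. This thread does not establish which version that would be.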

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 30 days with no activity.

cipherexx commented 1 day ago

Still an issue.

huggingface / autotrain-advanced, issue #791