I am fine-tuning the LiLT model on my own dataset, which is labeled with layout components: headings, subheadings, text, tables, table headings, images, and captions. During tokenization I ran into problems with the image and table regions. As a workaround, I assigned a random word as a placeholder to every table and image so that they could be tokenized. After training, however, the model never predicts the table or image labels.
I am not sure whether I should switch from the LayoutLMv3 tokenizer to a different one, or whether there are other steps I can take to fix this. I would also like to know whether any other tokenizers would be better suited to my dataset.
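For reference, here is a minimal sketch of the placeholder workaround described above. It is only an illustration under assumptions: the region dictionaries, the `PLACEHOLDERS` mapping, and the helper names `normalize_box` and `build_example` are hypothetical, and the 0-1000 box scaling follows the convention commonly used with LayoutLM-style models rather than anything specific to my pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder words for regions that carry no text.
# A fixed word per region type is assumed here instead of a random one.
PLACEHOLDERS: Dict[str, str] = {"table": "table", "image": "image"}


def normalize_box(box: Tuple[float, float, float, float],
                  width: float, height: float) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def build_example(regions: List[dict], width: float, height: float):
    """Turn labeled regions into parallel word, box, and label lists.

    Each region is assumed to look like
    {"label": str, "box": (x0, y0, x1, y1), "text": str or None}.
    """
    words: List[str] = []
    boxes: List[List[int]] = []
    labels: List[str] = []
    for region in regions:
        # Fall back to the placeholder word when the region has no text.
        text = region.get("text") or PLACEHOLDERS.get(region["label"], "")
        if not text:
            continue  # no text and no placeholder: skip the region
        box = normalize_box(region["box"], width, height)
        # Every word in a region shares the region's box and label.
        for word in text.split():
            words.append(word)
            boxes.append(box)
            labels.append(region["label"])
    return words, boxes, labels


if __name__ == "__main__":
    page = [
        {"label": "heading", "box": (100, 50, 500, 100), "text": "Annual Report"},
        {"label": "table", "box": (200, 200, 1800, 600), "text": None},
    ]
    print(build_example(page, width=2000, height=1000))
```

With this setup each table or image contributes exactly one word, so those labels are heavily outnumbered by text labels in the token-level data. That imbalance may be part of why the model never predicts them, but this is a guess, not something I have verified.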