microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License

Max_seq_length LayoutLMV3 - How to implement it? #942

Open davelza95 opened 1 year ago

davelza95 commented 1 year ago

Hi there,

the model I am using is LayoutLMV3 (LayoutLMv3ForTokenClassification).

I want the model to take more than 512 tokens, because when the text is very long it does not classify the rest of it.

I want to increase max_seq_length. I have changed max_position_embeddings to 1024 in LayoutLMv3Config, and the bbox dataset I am using has size 1024 + 196 + 1 = 1221, but this hasn't worked.

I got a CUDA error:

.../python3.8/layoutlm_model-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2197         # remove once script supports set_grad_enabled
   2198         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2199     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: CUDA error: device-side assert triggered

This looks like a matrix size mismatch.

Can someone explain how I can increase max_seq_length, please?

superxii commented 1 year ago

Has this problem been fixed? I encountered the same issue.

arvindrajan92 commented 1 year ago

Hi @superxii and @davelza95,

I got it done using the Hugging Face implementation. These are the parameters that need tweaking to get LayoutLMv3 to work with a longer sequence length (a hedged sketch follows the list):

  1. processor.tokenizer.model_max_length
  2. config.max_position_embeddings
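
For concreteness, here is a minimal sketch of setting those two parameters with the Hugging Face transformers API. The checkpoint name, the new length, the label count, and the use of `ignore_mismatched_sizes` are assumptions for illustration, not steps confirmed in this thread:

```python
from transformers import (
    LayoutLMv3Config,
    LayoutLMv3ForTokenClassification,
    LayoutLMv3Processor,
)

new_max_len = 1024  # desired maximum number of text tokens (default is 512)

# 1. processor.tokenizer.model_max_length: let the tokenizer emit longer sequences
processor = LayoutLMv3Processor.from_pretrained(
    "microsoft/layoutlmv3-base", apply_ocr=False
)
processor.tokenizer.model_max_length = new_max_len

# 2. config.max_position_embeddings: enlarge the position-embedding table.
#    The default is 514 (512 plus a padding offset), so keep the same +2 margin.
config = LayoutLMv3Config.from_pretrained(
    "microsoft/layoutlmv3-base", max_position_embeddings=new_max_len + 2
)

# The pretrained position-embedding weights no longer match the enlarged table,
# so the mismatched rows are re-initialised randomly; this is why the extra
# pre-training mentioned below matters.
model = LayoutLMv3ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base",
    config=config,
    num_labels=7,                   # hypothetical label count
    ignore_mismatched_sizes=True,
)
```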

My guess is that the input size does not match the expected size. I found that switching to CPU gives a better error message. If you can get it to train on CPU, then errors on CUDA are mostly attributable to out-of-memory, in which case you can reduce your batch size.
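
To illustrate the CPU debugging step, a hypothetical snippet (the `model` and `batch` variables stand for whatever you already have in your training loop):

```python
import torch

# Re-run one batch on CPU: a device-side assert on CUDA is opaque, but the same
# failure on CPU usually raises a readable IndexError pointing at the bad
# embedding lookup (e.g. a position or bbox index outside the embedding table).
model_cpu = model.to("cpu")
batch_cpu = {
    k: v.to("cpu") if isinstance(v, torch.Tensor) else v
    for k, v in batch.items()
}

with torch.no_grad():
    outputs = model_cpu(**batch_cpu)
```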

Another point I would like to share: for better generalisation of the LayoutLMv3 model, you might have to pre-train it again. By changing the sequence length, the alignment between text tokens and image patches may no longer be right. You may find that the model trains with the expected performance, but in testing it doesn't generalise to out-of-distribution data, which I think is due to this issue.

Hope this has been helpful. Feel free to share your progress :)

superxii commented 1 year ago

I have also fixed this problem by setting this: encoding = processor(image, return_offsets_mapping=True, return_tensors="pt", truncation=True, max_length=512)

The error occurs when the model's token length does not match the input's; usually the input token length is greater than the model's. Setting max_length and truncation solves this problem.

Thank you for providing your solution :)

alitavanaali commented 1 year ago

Hi all! I have explained my solution for handling long token sequences here; hope it can help you:

https://github.com/huggingface/transformers/issues/19190#issuecomment-1441883471
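
A common way to handle pages longer than 512 tokens without changing the model is to let the tokenizer split the encoding into overlapping windows. The sketch below is a generic illustration of that pattern, not necessarily the exact approach in the linked comment; `image`, `words`, and `boxes` are assumed to come from your own OCR step:

```python
from transformers import LayoutLMv3Processor

processor = LayoutLMv3Processor.from_pretrained(
    "microsoft/layoutlmv3-base", apply_ocr=False
)

# image, words, boxes come from your own OCR pipeline (hypothetical inputs here).
encoding = processor(
    image,
    words,
    boxes=boxes,
    truncation=True,
    max_length=512,
    stride=128,                      # tokens shared by consecutive windows
    return_overflowing_tokens=True,  # one 512-token window per overflow
    return_offsets_mapping=True,
    padding="max_length",
    return_tensors="pt",
)

print(encoding["input_ids"].shape)  # (num_windows, 512)

# Run the model on every window and merge predictions afterwards, dropping the
# duplicates in the overlapping stride region. If pixel_values has fewer rows than
# input_ids, repeat it per window using encoding["overflow_to_sample_mapping"].
```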