Hi, I have fine-tuned LayoutLM (v1) on my own invoice data. After 4 epochs, the model reaches pretty good performance.
When using it for inference, though, I get different outputs depending on the order of the `input_ids` and `bbox` tensors in the encoding. The difference I observe mostly depends on the order of `other` words versus words with any label except `other`. There are three orderings I have tested (a rough sketch of how each could be produced follows the list):
1. first all semantically interesting words/boxes (i.e., boxes that are expected to be classified with any label except `other`), then all `other` boxes
2. random order of `other`-labeled versus non-`other` words/boxes
3. words/boxes ordered by box position, top left to bottom right
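To make the three cases concrete, here is a minimal sketch of how each ordering could be produced and encoded for LayoutLM v1 (simplified, not my exact pipeline; `words`, `boxes` normalized to 0-1000, and `labels` stand for one invoice sample):

```python
# Minimal sketch (simplified, not my exact pipeline) of the three orderings
# and of how words/boxes are turned into input_ids and bbox tensors.
import random

import torch
from transformers import LayoutLMTokenizer

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")

def reorder(words, boxes, labels, mode):
    """Return the sample in one of the three tested orderings."""
    triples = list(zip(words, boxes, labels))
    if mode == "non_other_first":      # case 1: labeled words first, then 'other'
        triples.sort(key=lambda t: t[2] == "other")
    elif mode == "random":             # case 2: shuffled
        random.shuffle(triples)
    elif mode == "reading_order":      # case 3: top left to bottom right
        triples.sort(key=lambda t: (t[1][1], t[1][0]))   # sort by (y0, x0)
    words, boxes, labels = map(list, zip(*triples))
    return words, boxes, labels

def encode(words, boxes, max_len=512):
    """Tokenize word by word and repeat each word's box for its sub-tokens."""
    token_ids = [tokenizer.cls_token_id]
    token_boxes = [[0, 0, 0, 0]]                  # [CLS] box
    for word, box in zip(words, boxes):
        ids = tokenizer.encode(word, add_special_tokens=False)
        token_ids.extend(ids)
        token_boxes.extend([box] * len(ids))
    token_ids.append(tokenizer.sep_token_id)
    token_boxes.append([1000, 1000, 1000, 1000])  # [SEP] box
    input_ids = torch.tensor([token_ids[:max_len]])
    bbox = torch.tensor([token_boxes[:max_len]])
    attention_mask = torch.ones_like(input_ids)
    return {"input_ids": input_ids, "bbox": bbox, "attention_mask": attention_mask}
```

The only thing that differs between the three cases is the ordering produced by `reorder()`; the encoding dict is then passed unchanged to `LayoutLMForTokenClassification`.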
When I run inference, the model yields the following predictions (I only visualized the boxes with non-`other` labels):
1. First non-`other` boxes/words, then `other` boxes/words (screenshot)
2. Random order (screenshot)
3. Top left to bottom right order (screenshot)
Case 1 matches the ground truth the most. However, the difference in results between the cases is not what I expected: I expected the same result in all three cases, i.e., that the prediction is independent of how the words/boxes in the encoding are ordered at inference time.
If the word/box order is relevant, what is the correct order for training and for inference?
Do you think shuffling the word order of each training sample would help to get order-independent inference results?
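To clarify what I mean by shuffling, something along these lines (a rough sketch; the dataset layout and the `encode_fn` callback are assumptions, not my actual training code):

```python
# Rough sketch of per-sample word/box shuffling as a training-time augmentation.
# The dataset layout and encode_fn are assumptions, not my actual training code.
import random

from torch.utils.data import Dataset

class ShuffledInvoiceDataset(Dataset):
    def __init__(self, samples, encode_fn, shuffle=True):
        self.samples = samples      # list of dicts: {"words", "boxes", "labels"}
        self.encode_fn = encode_fn  # turns (words, boxes, labels) into model inputs
        self.shuffle = shuffle      # enable for the training split only

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        triples = list(zip(s["words"], s["boxes"], s["labels"]))
        if self.shuffle:
            random.shuffle(triples)  # fresh word order every time the sample is drawn
        words, boxes, labels = map(list, zip(*triples))
        return self.encode_fn(words, boxes, labels)
```

The word order would then change every epoch, so the model could not rely on a fixed ordering; I am not sure whether this helps or hurts accuracy, hence the question.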
If useful, I can provide the encoding and fine-tuned model.