Hello,
I trained a Donut base model on our custom dataset of 12,480 images, then fine-tuned that base model with default parameters.
While analysing the predictions, I noticed recurring patterns in the JSON output. When similar keys appear close to one another, the model tends to make the following types of errors:
It predicts extra characters (e.g., "Paneer cheese paratha with butter" is predicted as "Paneer Paneer cheese paratha with butter").
It drops characters (e.g., "199.00" is predicted as "19.00").
It predicts incorrect characters (e.g., "119.00" is predicted as "159.00").
Additionally, I noticed that the model often predicts characters such as "5," "7," and "1," even though these characters are not present in the images.
Ground Truth:
{ "table": [ { "key": "Paneer paratha with butter", "value": "199.00" }, { "key": "Paneer cheese paratha with butter", "value": "119.00" } ] }
Prediction:
{ "table": [ { "key": "Paneer paratha with butter", "value": "19.00" }, { "key": "Paneer Paneer cheese paratha with butter", "value": "159.00" } ] }
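To make the three error types above easier to count across many samples, here is a minimal sketch that labels each character-level mismatch as an insertion, deletion, or replacement. It uses Python's standard `difflib`; the helper name `char_errors` is my own and is not part of Donut.

```python
from difflib import SequenceMatcher


def char_errors(truth: str, pred: str) -> list[tuple[str, str, str]]:
    """Return (operation, truth_fragment, predicted_fragment) for each mismatch.

    operation is one of "insert", "delete", or "replace".
    """
    # autojunk=False disables difflib's heuristic that ignores frequent characters.
    matcher = SequenceMatcher(None, truth, pred, autojunk=False)
    return [
        (tag, truth[i1:i2], pred[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


print(char_errors("199.00", "19.00"))   # [('delete', '9', '')]
print(char_errors("119.00", "159.00"))  # [('replace', '1', '5')]
print(char_errors("Paneer cheese paratha with butter",
                  "Paneer Paneer cheese paratha with butter"))
```

Tallying the operation tags over the whole validation set would show which error type dominates.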
In the JSON below, the model drops characters in the middle of a string, predicts characters that differ from the ground truth, or adds extra characters that appear in neither the image nor the ground-truth JSON. The image is clean enough for the model to predict correctly, yet it still makes the errors described above.
In my analysis, the model makes more mistakes in values (numeric) than in keys (alphabetic), possibly because of data imbalance.
Ground Truth:
{ "table": [ { "key": "Accessible Amount", "value": "9123.23" }, { "key": "Car parts due :", "value": "2,09,233.19" }, { "key": "Paint brushes :", "value": "200.00" } ] }
Prediction:
{ "table": [ { "key": "Accesible Amount", "value": "9123.33" }, { "key": "Car parts due :", "value": "9,1,233.19" }, { "key": "Paint brushes :", "value": "200.000" } ] }
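The keys-versus-values comparison can be checked with a small count of exact-match failures per field. This is only a sketch: `field_error_counts` is a hypothetical helper, and it assumes the predicted table has the same number of rows as the ground truth so that rows can be paired by position.

```python
def field_error_counts(truth: dict, pred: dict) -> dict[str, int]:
    """Count rows whose key or value differs between ground truth and prediction."""
    counts = {"key": 0, "value": 0, "total": 0}
    # Assumption: rows are aligned by position; zip stops at the shorter table.
    for t_row, p_row in zip(truth["table"], pred["table"]):
        counts["total"] += 1
        for field in ("key", "value"):
            if t_row[field] != p_row[field]:
                counts[field] += 1
    return counts


truth = {"table": [
    {"key": "Accessible Amount", "value": "9123.23"},
    {"key": "Car parts due :", "value": "2,09,233.19"},
    {"key": "Paint brushes :", "value": "200.00"},
]}
pred = {"table": [
    {"key": "Accesible Amount", "value": "9123.33"},
    {"key": "Car parts due :", "value": "9,1,233.19"},
    {"key": "Paint brushes :", "value": "200.000"},
]}

print(field_error_counts(truth, pred))  # {'key': 1, 'value': 3, 'total': 3}
```

For this sample, one of three keys is wrong while all three values are wrong, which matches the pattern I am seeing.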
In the JSON below, although the image is clear, the model shows several issues:
Missing Characters: The model fails to recognize certain characters.
Duplicate Keys: It predicts an extra key, "Oil fluid", which is a combination of the two adjacent keys "Oil charge" and "Washer fluid".
Missing Colon (:) at the End of Keys: The model omits the colon at the end of a key ("Delivery charges :" is predicted as "Delivery charges").
Missing Plus Sign (+) in Values: It also drops the plus sign in values ("(+)470.00" is predicted as "( )470.00").
Ground Truth:
{ "table": [ { "key": "Delivery charges :", "value": "(+)470.00" }, { "key": "Oil charge:", "value": "3,120.00" }, { "key": "Washer fluid :", "value": "3,120.00" } ] }
Prediction:
{ "table": [ { "key": "Delivery charges", "value": "( )470.00" }, { "key": "Oil charge:", "value": "3,120.00" }, { "key": "Oil fluid :", "value": "157.00" }, { "key": "Washer fluid :", "value": "3,120.00" } ] }
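When the prediction contains a different number of rows than the ground truth, rows can no longer be paired by position. One possible way to find the extra or altered keys is to align the two key lists as sequences. The sketch below does this with the standard `difflib` module; `key_sequence_diff` is a name I made up for illustration.

```python
from difflib import SequenceMatcher


def key_sequence_diff(truth_keys: list[str], pred_keys: list[str]):
    """Align two lists of keys and return the non-matching segments.

    Each result is (operation, truth_segment, predicted_segment), where
    operation is "insert" (extra predicted keys), "delete" (missing keys),
    or "replace" (keys that differ).
    """
    matcher = SequenceMatcher(None, truth_keys, pred_keys, autojunk=False)
    return [
        (tag, truth_keys[i1:i2], pred_keys[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


truth_keys = ["Delivery charges :", "Oil charge:", "Washer fluid :"]
pred_keys = ["Delivery charges", "Oil charge:", "Oil fluid :", "Washer fluid :"]

for op in key_sequence_diff(truth_keys, pred_keys):
    print(op)
# ('replace', ['Delivery charges :'], ['Delivery charges'])
# ('insert', [], ['Oil fluid :'])
```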
In the JSON below, I found a similar pattern: when a character appears twice in the image (e.g., '@ @', ': :'), the model sometimes predicts it only once. It also repeats the same key ("Freight charges:" appears three times in the prediction).
Ground Truth:
{ "table": [ { "key": "Transport charges::", "value": "144.00" }, { "key": "Freight charges", "value": "" }, { "key": "Washer fluid @ @ 18 %", "value": "3,120.00" } ] }
Prediction:
{ "table": [ { "key": "Transport charges:", "value": "144.00" }, { "key": "Freight charges:", "value": "" }, { "key": "Freight charges:", "value": "" }, { "key": "Freight charges:", "value": "" }, { "key": "Washer fluid @ 18 %", "value": "3,120.00" } ] }
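Repeated rows like the ones in this prediction can be flagged automatically by looking for consecutive identical key/value pairs. A minimal sketch using `itertools.groupby` from the standard library follows; `repeated_rows` is a hypothetical helper name, and it only catches repeats that are directly adjacent.

```python
from itertools import groupby


def repeated_rows(table: list[dict]) -> list[tuple[str, int]]:
    """Return (key, run_length) for every run of identical consecutive rows."""
    runs = []
    for (key, _value), group in groupby(table, key=lambda row: (row["key"], row["value"])):
        length = sum(1 for _ in group)
        if length > 1:
            runs.append((key, length))
    return runs


prediction = {"table": [
    {"key": "Transport charges:", "value": "144.00"},
    {"key": "Freight charges:", "value": ""},
    {"key": "Freight charges:", "value": ""},
    {"key": "Freight charges:", "value": ""},
    {"key": "Washer fluid @ 18 %", "value": "3,120.00"},
]}

print(repeated_rows(prediction["table"]))  # [('Freight charges:', 3)]
```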