The model may overfit the text sorting method, resulting in ineffective use of geometric and morphological information

wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

https://arxiv.org/abs/2004.07464

MIT License

The model may overfit the text sorting method, resulting in ineffective use of geometric and morphological information #46

Open phybrain opened 3 years ago

tengerye commented 3 years ago

Hi @phybrain, could you please give us more details about your thinking?

wenwenyu commented 3 years ago

Hi, thank you for your insight and for raising this issue.

We ran experiments on unsorted text, and the performance did not drop.

The main aim of the text sorting method is to prevent the truncation operation (MAX_BOXES_NUM) from discarding useful information in cases where the entities we are interested in lie in the top-left area of the document.
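
To illustrate the interaction between sorting and truncation described above, here is a minimal sketch. It is not the repository's actual code: the box layout `(x_min, y_min, x_max, y_max, text)`, the helpers `sort_boxes` and `truncate`, the sample boxes, and the value `MAX_BOXES_NUM = 3` are all assumptions made for illustration. It only shows why sorting by the top-left corner before truncating can keep boxes near the top-left of the page that an arbitrary input order might drop.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max, text).
# This is an assumption for illustration, not PICK's own data format.
Box = Tuple[float, float, float, float, str]

MAX_BOXES_NUM = 3  # illustrative value only


def sort_boxes(boxes: List[Box]) -> List[Box]:
    """Order boxes top-to-bottom, then left-to-right, by their top-left corner."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))


def truncate(boxes: List[Box], max_boxes: int = MAX_BOXES_NUM) -> List[Box]:
    """Keep only the first max_boxes boxes, as a fixed-size input would require."""
    return boxes[:max_boxes]


# Made-up OCR output in arbitrary order.
boxes = [
    (300, 400, 380, 420, "footer note"),
    (10, 10, 120, 30, "INVOICE NO 123"),
    (200, 200, 260, 220, "item A"),
    (10, 50, 90, 70, "DATE 2020-01-01"),
    (200, 240, 260, 260, "item B"),
]

# Truncating the raw order drops the date field near the top-left.
kept_unsorted = [b[4] for b in truncate(boxes)]

# Sorting first keeps both top-left fields.
kept_sorted = [b[4] for b in truncate(sort_boxes(boxes))]

print(kept_unsorted)
print(kept_sorted)
```

In this toy input, truncating without sorting loses the date box, while sorting first retains it. Whether the model then relies too heavily on that ordering is the separate question raised in this issue.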

phybrain commented 3 years ago

Thank you for your reply. The score dropped a little when I trained on unsorted text. However, in practice the results are not satisfying and are worse than with sorted text, especially for numeric entity types. This may be because many of my entity types are numeric, and numeric values are uniformly distributed.