ibm-aur-nlp / PubTabNet

Some empty cell labels are None, while others are spaces #19

Closed Antonio-hi closed 1 year ago

Antonio-hi commented 3 years ago

I have noticed that empty cells are not labeled consistently: some have a label of None, while others contain a space.

I counted both cases over the whole training set and found that the ratio between them is 307594:1203641.

I would like to know whether I have misunderstood something or whether the annotations really have this ambiguity.

EmperorKaiser commented 3 years ago

From my perspective, the two forms have almost the same visual effect, and I think this is why there are two patterns for labeling blank cells. I also counted the cell number (which in fact means the number of chunks) and the structure number (which in fact means the number of cells) and found that they are the same. So if you would like to unify the representation of blank cells, you can simply add a " " to each {'tokens': []}.
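
A minimal sketch of that suggestion, assuming each cell is a dict with a 'tokens' list, as in the {'tokens': []} example above. The helper names `count_blank_styles` and `unify_blank_cells` and the sample cells are made up for illustration; they are not part of PubTabNet or its tooling.

```python
from collections import Counter


def blank_style(cell):
    """Classify one cell dict as 'empty_list', 'whitespace', or 'content'."""
    # Assumption: a missing or None 'tokens' value is treated like an empty list.
    tokens = cell.get("tokens") or []
    if not tokens:
        return "empty_list"
    if all(token.strip() == "" for token in tokens):
        return "whitespace"
    return "content"


def count_blank_styles(cells):
    """Count how many cells fall into each style."""
    return Counter(blank_style(cell) for cell in cells)


def unify_blank_cells(cells, blank=" "):
    """Return copies of the cells with empty token lists replaced by [blank]."""
    unified = []
    for cell in cells:
        new_cell = dict(cell)  # shallow copy, so the input is left untouched
        if blank_style(cell) == "empty_list":
            new_cell["tokens"] = [blank]
        unified.append(new_cell)
    return unified


# Hypothetical sample cells, not taken from the dataset.
sample_cells = [
    {"tokens": []},
    {"tokens": [" "]},
    {"tokens": ["4", "2"]},
]

print(count_blank_styles(sample_cells))
print(unify_blank_cells(sample_cells))
```

Only the empty-list cells are rewritten, so the number of cells stays the same and still matches the structure.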

Antonio-hi commented 1 year ago

Closing this issue.