ibm-aur-nlp / PubTabNet


image type #17

Open minouei-kl opened 3 years ago

minouei-kl commented 3 years ago

As mentioned in the paper, the results are reported in three categories: simple, complex, and all. In sample_gt.json (https://github.com/ibm-aur-nlp/PubTabNet/blob/7d9fbe9f63985d9d62adc7f396397fba9c81eef2/src/sample_gt.json#L1) there is also a "type" property that shows whether a table is simple or complex, but this property is missing from the training and validation sets. Is there a way to distinguish between the two in the training set?

ajjimeno commented 3 years ago

We did not generate this information for the training and validation sets. A table is labeled simple if it has no cells spanning multiple rows or columns, and complex if it has any (https://arxiv.org/pdf/1911.10683.pdf), so it should be possible to generate the label for the training and validation sets.
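A minimal sketch of how such a label could be derived, not an official script. It assumes each line of the annotation file is a JSON object whose `html.structure.tokens` list holds the HTML structure tokens, with `rowspan`/`colspan` attributes appearing inside those tokens. The names `table_type` and `label_annotations` are hypothetical helpers introduced here for illustration.

```python
import json
import re

# Matches a rowspan/colspan attribute and captures its numeric value.
SPAN_RE = re.compile(r'(?:rowspan|colspan)\s*=\s*"?(\d+)"?')


def table_type(structure_tokens):
    """Return "complex" if any cell spans more than one row or column, else "simple"."""
    for token in structure_tokens:
        for value in SPAN_RE.findall(token):
            if int(value) > 1:
                return "complex"
    return "simple"


def label_annotations(jsonl_path):
    """Yield (filename, split, type) for each annotation line.

    Assumption: one JSON object per line, with the structure tokens
    stored under html.structure.tokens.
    """
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            sample = json.loads(line)
            tokens = sample["html"]["structure"]["tokens"]
            yield sample.get("filename"), sample.get("split"), table_type(tokens)
```

Something like `for name, split, kind in label_annotations(path): ...` could then be used to filter the training or validation split by type. A span value of 1 is treated as simple, since it does not merge any cells.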
