BangdongChen opened this issue 3 years ago
There may be a problem with the code. I'd advise you to use version 1.0 of the truthpy package, because later versions introduced changes that are not backwards compatible. Apart from that, please share the XML file here and I can take a look at it.
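If it is unclear which truthpy release is installed, a quick check like the one below may help. This is a minimal sketch. It assumes the package's distribution name is `truthpy` and that "version 1.0" means any 1.0.x release. `is_truthpy_1_0` and `check_truthpy` are hypothetical helper names, not part of truthpy or this repo.

```python
from importlib.metadata import PackageNotFoundError, version


def is_truthpy_1_0(installed: str) -> bool:
    """Return True if a version string is in the 1.0 series (e.g. "1.0", "1.0.3")."""
    parts = installed.split(".")
    # Compare only major and minor; padding lets a bare "1" count as 1.0.
    major_minor = (parts + ["0"])[:2]
    return major_minor == ["1", "0"]


def check_truthpy() -> str:
    """Report the installed truthpy version and whether it is in the 1.0 series."""
    try:
        installed = version("truthpy")  # assumed distribution name
    except PackageNotFoundError:
        return "truthpy is not installed"
    if is_truthpy_1_0(installed):
        return f"truthpy {installed}: OK"
    return f"truthpy {installed}: not 1.0, later releases may not be backwards compatible"


if __name__ == "__main__":
    print(check_truthpy())
```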
I have attached an example ground truth file that can be used. GitHub did not allow me to upload XML, so I changed the extension to .txt. Do let me know if that clears up the issue. us-005_0_0.txt
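Because the attachment was uploaded with a `.txt` extension, you may want to give it back its `.xml` extension before pointing the scripts at it. This is a minimal standard-library sketch. It copies rather than renames, so the original download is kept. `restore_xml_extension` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def restore_xml_extension(txt_path: str) -> Path:
    """Copy a ground-truth file uploaded as .txt to a sibling file with a .xml extension."""
    src = Path(txt_path)
    dst = src.with_suffix(".xml")  # e.g. us-005_0_0.txt -> us-005_0_0.xml
    shutil.copyfile(src, dst)
    return dst
```

For example, `restore_xml_extension("us-005_0_0.txt")` writes `us-005_0_0.xml` next to the downloaded file.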
I want to know how to generate the corresponding XML file if I use the PubTabNet dataset.
@andyjpaddle I don't think you can convert the PubTabNet annotations to the XML files required by this repo. I believe PubTabNet only has HTML annotations, i.e. there is no annotation of the locations of the rows and columns in the image (which Tab-Aug needs). The only solution would be to annotate the images manually using T-Truth.
Most table recognition datasets are labeled with text bounding boxes and structure tokens, but Tab-Aug needs cell bounding boxes, so it cannot be applied to them directly. Is there a plan to support public datasets (e.g. PubTabNet)?
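To make the gap concrete, the sketch below summarizes what a PubTabNet-style annotation provides for one table. Rows and cells are only implied by the HTML structure tokens, and the only coordinates are text bounding boxes, which are absent for empty cells. It assumes the field layout of the PubTabNet JSONL release (`html.structure.tokens`, and `html.cells` entries with an optional `bbox`). The function names and the toy record are illustrative, not part of either project.

```python
import json
from typing import Iterator


def iter_records(jsonl_path: str) -> Iterator[dict]:
    """Yield one annotation dict per line of a JSONL annotation file."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def summarize_record(record: dict) -> dict:
    """Count rows, cells, and text boxes in one PubTabNet-style record (assumed layout)."""
    tokens = record["html"]["structure"]["tokens"]
    cells = record["html"]["cells"]
    n_rows = tokens.count("<tr>")
    # A plain cell is the token "<td>"; a spanning cell starts with "<td" plus attribute tokens.
    n_cells = sum(1 for t in tokens if t in ("<td>", "<td"))
    # Boxes, where present, cover the cell's text, not the cell or its row/column lines.
    n_text_boxes = sum(1 for c in cells if "bbox" in c)
    return {
        "rows": n_rows,
        "cells": n_cells,
        "cells_with_text_bbox": n_text_boxes,
        "cells_without_bbox": n_cells - n_text_boxes,
    }


# Toy 2x2 table in the assumed format; the last cell is empty and has no bbox.
EXAMPLE = {
    "html": {
        "structure": {"tokens": [
            "<thead>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</thead>",
            "<tbody>", "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</tbody>",
        ]},
        "cells": [
            {"tokens": ["A"], "bbox": [1, 2, 10, 8]},
            {"tokens": ["B"], "bbox": [20, 2, 30, 8]},
            {"tokens": ["1"], "bbox": [1, 12, 10, 18]},
            {"tokens": []},
        ],
    }
}

if __name__ == "__main__":
    print(summarize_record(EXAMPLE))
```

Nothing in such a record gives the row or column separator positions in the image, which is what the comments above identify as the missing piece.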
Hi, in line 98 of _generatesamples.py, I found that doc is empty after _Document(xmlfile). Is there something wrong with the format of the XML files? Could you share an example XML file?
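Before debugging the loader itself, it may help to check whether the XML file parses at all and which tags it contains. You can then compare the result against the example ground truth file shared in this thread. This is a minimal, schema-agnostic sketch using Python's standard `xml.etree.ElementTree`. It does not know which tags the repo's loader expects, so it only surfaces malformed files and differences in tag names. `inspect_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def inspect_xml(path: str) -> dict:
    """Parse an XML file and summarize its tags, or report why it could not be read."""
    try:
        tree = ET.parse(path)
    except ET.ParseError as err:
        # Malformed XML, e.g. a truncated file or mismatched tags.
        return {"ok": False, "error": str(err)}
    except OSError as err:
        # Wrong path, missing file, or permission problems.
        return {"ok": False, "error": str(err)}
    root = tree.getroot()
    # Count every tag (root included) so the file can be compared with a known-good one.
    tag_counts = Counter(elem.tag for elem in root.iter())
    return {"ok": True, "root": root.tag, "tags": dict(tag_counts)}
```

For example, `inspect_xml("us-005_0_0.txt")` parses the example attachment regardless of its extension. Running the same call on your own file should show whether the root tag and tag names match.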