monniert / docExtractor

(ICFHR 2020 oral) Code for "docExtractor: An off-the-shelf historical document element extraction" paper
https://www.tmonnier.com/docExtractor
MIT License

via_converter.py generate with borders #13

Closed: seekingdeep closed 3 years ago

seekingdeep commented 3 years ago

@monniert I generated the ground-truth masks using the via_converter.py script that you included. They are created without borders, since I was boxing the text lines when annotating in VIA Annotator. I think it would be better to add an option to generate the masks with borders when using via_converter.py.
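For illustration, here is a minimal sketch of what such an option could produce, assuming each annotated text line is an axis-aligned box given as (x, y, w, h). The function name `boxes_to_mask`, the `border` parameter, and the label ids are hypothetical and are not taken from via_converter.py. Border pixels are painted after the text interiors, so two touching boxes stay separated by a border band instead of merging.

```python
import numpy as np

# Hypothetical label ids, chosen for this sketch only (not docExtractor's).
BACKGROUND, TEXT, TEXT_BORDER = 0, 1, 2


def boxes_to_mask(shape, boxes, border=2):
    """Rasterize (x, y, w, h) boxes into a label mask with a border ring."""
    height, width = shape
    mask = np.full((height, width), BACKGROUND, dtype=np.uint8)

    # First pass: paint the interior of every box as text.
    for x, y, w, h in boxes:
        mask[max(y, 0):min(y + h, height), max(x, 0):min(x + w, width)] = TEXT

    # Second pass: paint a ring of `border` pixels around each box.
    # The ring is the grown box minus the box's own interior, and it is
    # painted last so that it overrides the text of a touching neighbor.
    for x, y, w, h in boxes:
        ring = np.zeros((height, width), dtype=bool)
        y0, y1 = max(y - border, 0), min(y + h + border, height)
        x0, x1 = max(x - border, 0), min(x + w + border, width)
        ring[y0:y1, x0:x1] = True
        ring[max(y, 0):min(y + h, height), max(x, 0):min(x + w, width)] = False
        mask[ring] = TEXT_BORDER

    return mask


single = boxes_to_mask((10, 10), [(2, 2, 4, 3)], border=1)
stacked = boxes_to_mask((8, 8), [(0, 0, 5, 3), (0, 3, 5, 3)], border=1)
```

In `stacked`, the two boxes touch along their shared edge, and the rows on either side of that edge end up labeled as border rather than text.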

monniert commented 3 years ago

Good idea, I will see if I can add this to the current script in the coming days.

seekingdeep commented 3 years ago

@monniert You might be right. My issue was that I was generating the document image masks with text-line boxes, which caused lines to merge with one another. I will try to create a sample with x-height and see the results. If I succeed, we can then see how to solve this issue.

seekingdeep commented 3 years ago

@monniert Alright, I have tested with x-height only and still can't get it to work. Have a look at the example.

Now I will test with x-height + border, but how should I label the regions when I use VIA Annotator? Should I create a new region attribute with the dropdown type, and then create the options text and text_border?

Please upload an example of a labeled .json file to clarify how to annotate.

Waiting for your reply.
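To make the question concrete, the sketch below shows what such a labeled file might look like and how it could be read. It assumes the usual VIA 2.x annotation export layout (`regions`, `shape_attributes`, `region_attributes`) and a dropdown attribute hypothetically named `type` with the options `text` and `text_border`. Whether via_converter.py expects these names is not confirmed in this thread, and `regions_by_label` is an illustrative helper, not part of the repository.

```python
import json

# Hypothetical VIA 2.x style export. The attribute name "type" and its
# values are assumptions based on the question, not a confirmed format.
SAMPLE = """
{
  "page1.jpg12345": {
    "filename": "page1.jpg",
    "size": 12345,
    "regions": [
      {"shape_attributes": {"name": "rect", "x": 10, "y": 20, "width": 200, "height": 12},
       "region_attributes": {"type": "text"}},
      {"shape_attributes": {"name": "rect", "x": 8, "y": 18, "width": 204, "height": 16},
       "region_attributes": {"type": "text_border"}}
    ],
    "file_attributes": {}
  }
}
"""


def regions_by_label(via_json, attribute="type"):
    """Group rectangle regions as (x, y, w, h) tuples per label value."""
    grouped = {}
    for entry in via_json.values():
        for region in entry.get("regions", []):
            shape = region["shape_attributes"]
            if shape.get("name") != "rect":
                continue  # this sketch only handles rectangles
            label = region["region_attributes"].get(attribute, "unlabeled")
            box = (shape["x"], shape["y"], shape["width"], shape["height"])
            grouped.setdefault(label, []).append(box)
    return grouped


grouped = regions_by_label(json.loads(SAMPLE))
```

Here `grouped` maps each dropdown value to the boxes annotated with it, which is the form a mask generator would need in order to paint text and border regions with different labels.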