lquirosd / P2PaLA

Page to PAGE Layout Analysis Tool
GNU General Public License v3.0

XML generator #29

Closed EvertonTomalok closed 3 years ago

EvertonTomalok commented 5 years ago

Hello.

How can I generate new png XML? Do you use any tool to handle it?

lquirosd commented 5 years ago

Hi, do you mean PAGE-XML?

EvertonTomalok commented 5 years ago

Exactly!

Because it'll be the input data, or am I talking nonsense? Haha

I'm used to labeling with this tool: https://github.com/tzutalin/labelImg

But I don't know if it's compatible with P2PaLA.

Furthermore, congrats on your code... I was developing something very similar, but I think P2PaLA will help me a lot.

EvertonTomalok commented 5 years ago

Never mind, brother... I read the docs again and found it:

We recommend Transkribus or nw-page-editor to visualize and edit PAGE-xml files.

EvertonTomalok commented 5 years ago

Is it possible to label as a paragraph instead of a line?

lquirosd commented 5 years ago

Hi. A line (aka TextLine, Baseline) is a general element that describes where the text is placed. On the other hand, there is a different hierarchical level called a "region" (aka TextRegion). This object is designed to describe some properties of the data. For example, a paragraph is a TextRegion of type "paragraph" that encompasses one or more TextLines. So you can label your TextRegion with whatever "type" you want (paragraph, marginalia, header ...), using the custom->structure field of the PAGE-XML, and assign each TextLine to the corresponding TextRegion. For more details about the PAGE-XML format please check this paper
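To make the region/line hierarchy concrete, here is a minimal sketch in Python that writes one TextRegion of type "paragraph" containing its TextLines. It is not taken from P2PaLA: the helper name `build_page`, the input dictionary layout, and the element ids are hypothetical. The namespace is the PAGE 2013-07-15 schema, and the `custom="structure {type:...;}"` string follows the convention Transkribus writes, which is assumed here to be what the custom->structure field refers to. A schema-valid file also needs a Metadata element, omitted here for brevity.

```python
import xml.etree.ElementTree as ET

# PAGE content schema namespace (2013-07-15 version); other versions exist.
NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"


def build_page(image_name, width, height, regions):
    """Return a PAGE-XML string. `regions` is a hypothetical input format:
    a list of dicts with "type", "coords" and a list of "lines"."""
    ET.register_namespace("", NS)
    root = ET.Element(f"{{{NS}}}PcGts")
    page = ET.SubElement(
        root, f"{{{NS}}}Page",
        imageFilename=image_name,
        imageWidth=str(width),
        imageHeight=str(height),
    )
    for r_idx, region in enumerate(regions):
        # The region type goes into the "custom" attribute (assumed convention).
        reg = ET.SubElement(
            page, f"{{{NS}}}TextRegion",
            id=f"r{r_idx}",
            custom=f"structure {{type:{region['type']};}}",
        )
        ET.SubElement(reg, f"{{{NS}}}Coords", points=region["coords"])
        # Each TextLine is nested inside the TextRegion it belongs to.
        for l_idx, line in enumerate(region["lines"]):
            ln = ET.SubElement(reg, f"{{{NS}}}TextLine", id=f"r{r_idx}_l{l_idx}")
            ET.SubElement(ln, f"{{{NS}}}Coords", points=line["coords"])
            ET.SubElement(ln, f"{{{NS}}}Baseline", points=line["baseline"])
    return ET.tostring(root, encoding="unicode")


example = build_page(
    "page.png", 1000, 1500,
    [{
        "type": "paragraph",
        "coords": "10,10 500,10 500,200 10,200",
        "lines": [
            {"coords": "10,10 500,10 500,100 10,100", "baseline": "10,90 500,90"},
            {"coords": "10,110 500,110 500,200 10,200", "baseline": "10,190 500,190"},
        ],
    }],
)
print(example)
```

Labeling a paragraph therefore means drawing the region polygon and setting its type, while the lines inside it stay ordinary TextLines. Editors such as Transkribus or nw-page-editor produce this structure for you.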