transpect / docx2hub

Converts Microsoft docx to flat hub XML
BSD 2-Clause "Simplified" License

Sections #13

Open diakovidis opened 6 years ago

diakovidis commented 6 years ago

As far as I understand, docx sections (<w:sectPr>) are not currently supported. How much effort would it take to implement this feature?

gimsieke commented 6 years ago

I think we do process some of the page-related sectPr properties, namely the page dimensions, orientation, breaks, and footnote properties. In which properties are you particularly interested?

gimsieke commented 6 years ago

In terms of converting them into DocBook sections or the like, we are unlikely to support it in this docx2hub library. Splitting a document into sections will be done by another library, evolve-hub. You will find a sample setup in the docx2jats demo.

diakovidis commented 6 years ago

I am working with a document that has 3 sections. The 2nd one has a 2-column layout: <w:cols w:num="2" w:sep="1" w:space="284"/>. When I pass it through docx2hub and then through hub2docx, the resulting new docx contains a single section for the whole document.
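For anyone who wants to check which column settings a docx actually carries before conversion, here is a minimal Python sketch. It is not part of docx2hub or hub2docx; the function name `section_columns` and the sample markup are made up for illustration. It assumes a missing `w:cols` element or `w:num` attribute means one column, and it does not handle tracked section changes.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def section_columns(document_xml):
    """Return the column count of every w:sectPr, in document order.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(document_xml)
    counts = []
    for sect in root.iter(f"{{{W}}}sectPr"):
        cols = sect.find(f"{{{W}}}cols")
        num = cols.get(f"{{{W}}}num") if cols is not None else None
        # Assumption: no w:cols or no w:num is treated as a single column.
        counts.append(int(num) if num else 1)
    return counts


# Made-up sample with three sections; the second one has two columns.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:pPr><w:sectPr/></w:pPr></w:p>
<w:p><w:pPr><w:sectPr><w:cols w:num="2" w:sep="1" w:space="284"/></w:sectPr></w:pPr></w:p>
<w:p/>
<w:sectPr><w:cols w:space="708"/></w:sectPr>
</w:body></w:document>"""

print(section_columns(SAMPLE))  # prints [1, 2, 1]
```

For a real file, the same function can be fed the main document part, read with `zipfile.ZipFile(path).read("word/document.xml")`.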

diakovidis commented 6 years ago

Doesn't evolve-hub require some kind of information for that splitting, which in this case is missing in the Flat hub?

gimsieke commented 6 years ago

I see. If we were to support columns, we would need to change a couple of things:

If page sizes change within the document and if we need to convey the page size information to the Hub XML format, we need to apply divs to the resulting document anyway.
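To make the idea of applying divs per section concrete, here is a rough Python sketch. It is not how docx2hub works or would work (docx2hub is implemented differently), and the output element names `doc` and `div` and the `columns` attribute are hypothetical. It relies on the WordprocessingML convention that a `w:sectPr` inside a paragraph's `w:pPr` ends a section with that paragraph, while the `w:sectPr` directly under `w:body` describes the last section.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def group_by_section(document_xml):
    """Wrap body children in one hypothetical <div> per Word section."""
    body = ET.fromstring(document_xml).find(f"{{{W}}}body")
    out = ET.Element("doc")   # hypothetical output root
    current = []              # body children of the section being collected

    def close(sect):
        cols = sect.find(f"{{{W}}}cols") if sect is not None else None
        num = cols.get(f"{{{W}}}num") if cols is not None else None
        div = ET.SubElement(out, "div")
        div.set("columns", num or "1")  # assumption: default is one column
        div.extend(current)
        current.clear()

    for child in body:
        if child.tag == f"{{{W}}}sectPr":
            close(child)      # body-level sectPr: properties of the last section
            continue
        current.append(child)
        sect = child.find(f"{{{W}}}pPr/{{{W}}}sectPr")
        if sect is not None:
            close(sect)       # this paragraph is the last one of its section
    if current:
        close(None)           # trailing content without any sectPr
    return out


# Made-up sample: three sections, the second with two columns.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:pPr><w:sectPr/></w:pPr></w:p>
<w:p><w:pPr><w:sectPr><w:cols w:num="2" w:sep="1" w:space="284"/></w:sectPr></w:pPr></w:p>
<w:p/>
<w:sectPr><w:cols w:space="708"/></w:sectPr>
</w:body></w:document>"""

for div in group_by_section(SAMPLE):
    print(div.get("columns"), len(div))
```

Page size and orientation could be carried on the same wrapper in the same way; the sketch only records the column count to keep it short.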

So yes, I acknowledge it’s a legitimate feature request if you need to convey the column information. In most of our workflows, we treat Word as a manuscript editing tool, where column counts and dimensions are negligible.

I think implementing this is a matter of a person-day or so. But we wouldn’t prioritize it because we have never needed to know the column count.

Of course if you need it urgently you can pay us for implementing it earlier, or set up some crowdfunding.

gimsieke commented 6 years ago

> Doesn't evolve-hub require some kind of information for that splitting, which in this case is missing in the Flat hub?

Yes, that’s exactly the point: evolve-hub expects some configuration for hierarchization, while docx2hub is a step that is mostly configuration-free. As I said, we never saw any significant document-structuring information in column-count specifications, therefore we discarded it.