mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

Grouping of elements? #129

Closed by zopyx 1 year ago

zopyx commented 1 year ago

We have some documents with address lists in which each part of an address has its own paragraph style: AuthorAddress, AuthorName, AuthorFax, etc.

[Screenshot 2023-01-21 at 09:39:37]

They are properly converted into

<div class="authorname">...</div>
<div class="authoraddress">...</div>
<div class="authorfax">...</div>
<div class="authorname">...</div>
<div class="authoraddress">...</div>
<div class="authorfax">...</div>

That works fine. Is it possible to wrap each author's elements in an outer container, like this?

<div class="author">
  <div class="authorname">...</div>
  <div class="authoraddress">...</div> 
  <div class="authorfax">...</div>
</div>
<div class="author">
  <div class="authorname">...</div>
  <div class="authoraddress">...</div> 
  <div class="authorfax">...</div>
</div>
mwilliamson commented 1 year ago

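You'd probably want something like the following (adjust the style names to match the ones in your document). The :fresh on div.author in the name rule starts a new container for every author, while the address and fax rules leave it off, so they reuse the container that is already open: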

p[style-name='Author Name'] => div.author:fresh > div.authorname:fresh
p[style-name='Author Address'] => div.author > div.authoraddress:fresh
p[style-name='Author Fax'] => div.author > div.authorfax:fresh
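To illustrate why these three rules produce one container per author, here is a simplified pure-Python model of the reuse behaviour they rely on: a non-fresh element in a rule's path is reused if an identical element is already open at that position, while a :fresh element always closes what is open there and starts a new one. This is a sketch, not mammoth's actual implementation; the names Step, STYLE_MAP and render are hypothetical, and the style names are assumed to be 'Author Name', 'Author Address' and 'Author Fax'.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One element of an HTML path, e.g. div.author or div.author:fresh."""
    tag: str
    cls: str
    fresh: bool = False


# Hypothetical in-memory equivalent of the three style-map rules above.
STYLE_MAP = {
    "Author Name": [Step("div", "author", fresh=True), Step("div", "authorname", fresh=True)],
    "Author Address": [Step("div", "author"), Step("div", "authoraddress", fresh=True)],
    "Author Fax": [Step("div", "author"), Step("div", "authorfax", fresh=True)],
}


def render(paragraphs):
    """Render (style_name, text) pairs as HTML, reusing open elements where allowed."""
    out = []
    stack = []  # currently open Steps, outermost first
    for style, text in paragraphs:
        path = STYLE_MAP[style]
        # Find the longest prefix of open elements the new path can reuse:
        # a step is reusable only if it is not :fresh and matches tag and class.
        keep = 0
        while (
            keep < len(stack)
            and keep < len(path)
            and not path[keep].fresh
            and (stack[keep].tag, stack[keep].cls) == (path[keep].tag, path[keep].cls)
        ):
            keep += 1
        # Close every open element below the reusable prefix, innermost first.
        while len(stack) > keep:
            out.append(f"</{stack.pop().tag}>")
        # Open the remaining steps of this paragraph's path.
        for step in path[keep:]:
            out.append(f'<{step.tag} class="{step.cls}">')
            stack.append(step)
        out.append(html.escape(text))
    # Close whatever is still open at the end of the document.
    while stack:
        out.append(f"</{stack.pop().tag}>")
    return "".join(out)
```

Feeding it name, address, fax, name, address, fax yields two div.author containers, each holding one name, address and fax div. With the library itself, the three rules are passed as a newline-separated string through the style_map argument of mammoth.convert_to_html, and the resulting HTML is in the value attribute of the returned result.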
zopyx commented 1 year ago

Thank you! You saved my day!