python-openxml / python-docx

Create and modify Word documents with Python
MIT License

inserting xml-snippet into docx using the python-docx api #55

Closed ghost closed 10 years ago

ghost commented 10 years ago

We need to change header text-orientation of tables. We are aware that this may not be possible with the current state of the API. We identify the xml-snippet to be inserted using opc-diag as suggested elsewhere. Can we use xml-snippet insertion to achieve this? If yes what is the API-command to do the xml-insertion at a specific point of the docx? -- sub

scanny commented 10 years ago

Hi Sub, can you give me an idea what the snippet you need to insert would look like? There are a couple of different possible approaches. The best choice probably depends mostly on the size of the snippet, but if you could provide some example XML, along the lines of "we want to insert something like this, in the document roughly here", that would help me see what's likely to be the best way.

Also, if you could mention how you plan to identify the right place, that would be a help, like what object you'll have a reference to as a starting point.

ghost commented 10 years ago

Hi Scanny, we would need such a feature to use it as a workaround for the following four requirements if not available in a simpler way with the API:

We are still studying how to specify the location and how to insert the snippet. Any hints on the above would be appreciated. -- sub

scanny commented 10 years ago

These would be four separate jobs, each requiring its own method, although the same approach would probably work for all four.

In general I'd say these are small enough that inserting the elements one at a time using the lxml API would be the best approach.

A good approach is to do a before and after diff using opc-diag (http://opc-diag.readthedocs.org/). You create the simplest possible file containing the "before" situation, like a 2x2 table with regular orientation headers, perhaps saved as "before.docx". Then you make the change you want using Word and save it again as "after.docx". Then you diff the two files with opc-diag with something like:

$ opc diff-item before.docx after.docx document.xml

This should narrow right down what XML changes need to be made.
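If opc-diag is not at hand, the same before/after comparison can be roughed out with the Python standard library. This is only a sketch, not how opc-diag itself works: the function names `read_part_lines` and `diff_docx_part` are made up for illustration, and it assumes the part of interest is `word/document.xml` inside the .docx zip package.

```python
import difflib
import zipfile
from xml.dom import minidom


def read_part_lines(docx, part_name):
    """Return the pretty-printed XML of one package part as a list of lines."""
    with zipfile.ZipFile(docx) as package:
        xml_bytes = package.read(part_name)
    pretty = minidom.parseString(xml_bytes).toprettyxml(indent="  ")
    # Drop blank lines so whitespace-only text nodes do not clutter the diff.
    return [line for line in pretty.splitlines() if line.strip()]


def diff_docx_part(before, after, part_name="word/document.xml"):
    """Unified diff of one XML part between two .docx files (paths or file objects)."""
    return list(
        difflib.unified_diff(
            read_part_lines(before, part_name),
            read_part_lines(after, part_name),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


# Typical use, with the file names from the comment above:
# for line in diff_docx_part("before.docx", "after.docx"):
#     print(line)
```

As with the opc-diag output, the `w:rsid..` noise will still show up in the result and has to be read past by hand.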

If you can send me one of those diffs for, say, header text orientation, I can give you an example code snippet to get you going.

ghost commented 10 years ago

As requested, for table-header orientation: I am sending only the opc-diag diffs found for document.xml, ignoring the diffs found for core.xml, app.xml and settings.xml.

— sub

@@ -21,21 +21,27 @@

     <w:gridCol w:w="4893"/>
     <w:gridCol w:w="4893"/>
   </w:tblGrid>

@@ -56,7 +62,7 @@

     </w:tc>
   </w:tr>
 </w:tbl>

scanny commented 10 years ago

Ok, good, this narrows it right down.

First thing is we can ignore all the changes to w:rsid.. attributes, those are part of the revision tracking mechanism and just indicate this is a new revision.

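The operative changes are:

adding an element to the row properties
adding an element to the cell properties
adding indentation to the paragraph properties

I give an example here of the textDirection element since that seems to be the key one: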

from docx.oxml.shared import OxmlElement, qn

def set_vert_cell_direction(cell):
    tc = cell._tc
    tcPr = tc.tcPr
    textDirection = OxmlElement('w:textDirection')
    textDirection.set(qn('w:val'), 'btLr')
    tcPr.append(textDirection)

cell._tc is the internal reference to the docx.oxml.table.CT_Tc instance containing the <w:tc> element. tc.tcPr is its child tcPr element. OxmlElement creates a new element from a tagname and the set method on it sets an attribute. append() on an element adds another element as the last child.
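To make the namespace handling less mysterious, here is a small stand-alone illustration using only the standard library. It is a sketch, not python-docx code: the `qn` below is a hypothetical re-implementation that mimics what the library's helper is described as doing (turning a prefixed name like `w:val` into the namespace-qualified form the XML API expects), and the elements are plain ElementTree elements rather than python-docx's own classes.

```python
import xml.etree.ElementTree as ET

NSMAP = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def qn(tag):
    """Turn a prefixed name such as 'w:val' into '{namespace-uri}val'."""
    prefix, local = tag.split(":")
    return "{%s}%s" % (NSMAP[prefix], local)


# Build <w:tc><w:tcPr><w:textDirection w:val="btLr"/></w:tcPr></w:tc> by hand.
tc = ET.Element(qn("w:tc"))
tcPr = ET.SubElement(tc, qn("w:tcPr"))

text_direction = ET.Element(qn("w:textDirection"))  # new, unattached element
text_direction.set(qn("w:val"), "btLr")              # set its w:val attribute
tcPr.append(text_direction)                          # becomes the last child

# Serialize with the familiar 'w' prefix instead of an auto-generated one.
ET.register_namespace("w", NSMAP["w"])
print(ET.tostring(tc, encoding="unicode"))
```

The same three steps (create, set attribute, append) are what the function above performs on the real cell element.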

Does that give you enough to go on?

ghost commented 10 years ago

trying it out...

ghost commented 10 years ago

i have been able to achieve using your hints:

however i face some challenges for:

a) straddling multiple columns within table
b) mid-document page orientation
c) yet to write code to set header row-height

we would appreciate your help to points a) and b) — sub

scanny commented 10 years ago

Pick one and post the diff and I'll see what I can offer in the way of advice :)

ghost commented 10 years ago

unable to find a workaround for b) mid-document page orientation change, thanks — sub

scanny commented 10 years ago

I need to see the diff out of opc-diag like you sent for the first one.

ghost commented 10 years ago

oops! here they are:

landscape2portrait:

@@ -22,8 +22,8 @@

   </w:pPr>
 </w:p>
 <w:p w:rsidR="00936EEB" w:rsidRDefault="00936EEB"/>

portrait2landscape:

@@ -20,9 +20,13 @@

       <w:docGrid w:linePitch="360"/>
     </w:sectPr>
   </w:pPr>

scanny commented 10 years ago

You can ignore the w:rsid.. attributes, they're part of the revision tracking scheme. You can also ignore the proofErr bits, those are the red squiggly lines under spelling errors. The w:pgSz element is the one you want.

If there is only one section throughout the document, e.g. it is all portrait or all landscape, then you'll find the w:sectPr parent element as the last child of <w:document><w:body>. If there are other section breaks, they are in the pPr element of the last paragraph in the section.

You can access the "sentinel" <w:sectPr> element using the following internals if you like:

sectPr = document._document_part._element.body._sentinel_sectPr

There is a little more on sections here: http://python-docx.readthedocs.org/en/latest/dev/analysis/features/sections.html
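A rough sketch of what that could look like at the XML level follows. It is written against the generic ElementTree interface (which lxml elements also implement) so it can be tried without python-docx. The helper names `iter_sectPr` and `set_orientation` are invented for this example, and it assumes every `w:pgSz` carries numeric `w:w` and `w:h` attributes, as in the diffs posted here. Later python-docx releases also have a public `document.sections` collection, which is probably the better route where available.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def qn(tag):
    """'w:pgSz' -> '{namespace-uri}pgSz' (stand-in for the python-docx helper)."""
    return "{%s}%s" % (W_NS, tag.split(":")[1])


def iter_sectPr(body):
    """Yield each <w:sectPr> under <w:body>, in document order.

    Section breaks sit in the pPr of the last paragraph of a section; the
    final ("sentinel") sectPr is a direct child of the body.
    """
    for child in body:
        if child.tag == qn("w:p"):
            sectPr = child.find("%s/%s" % (qn("w:pPr"), qn("w:sectPr")))
            if sectPr is not None:
                yield sectPr
        elif child.tag == qn("w:sectPr"):
            yield child


def set_orientation(sectPr, landscape):
    """Swap page width/height on <w:pgSz> and set or clear w:orient."""
    pgSz = sectPr.find(qn("w:pgSz"))
    if pgSz is None:
        raise ValueError("sectPr has no w:pgSz child")
    short, long_ = sorted((int(pgSz.get(qn("w:w"))), int(pgSz.get(qn("w:h")))))
    width, height = (long_, short) if landscape else (short, long_)
    pgSz.set(qn("w:w"), str(width))
    pgSz.set(qn("w:h"), str(height))
    if landscape:
        pgSz.set(qn("w:orient"), "landscape")
    else:
        # Portrait is the default, so the attribute is simply dropped.
        pgSz.attrib.pop(qn("w:orient"), None)
```

With a real document, `body` would be the `<w:body>` element reached through the internals shown above; that wiring is left out here because the attribute path has changed between versions.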

ghost commented 10 years ago

thanks, will try…

ghost commented 10 years ago

Found the solution using your tips, thanks a lot. From the previous wish-list I am still lost with column straddling and would much appreciate your help. -- sub

@@ -23,33 +23,22 @@

         <w:gridCol w:w="2371"/>
         <w:gridCol w:w="2371"/>
       </w:tblGrid>
-      <w:tr w:rsidR="007C1D62" w:rsidTr="007C1D62">
+      <w:tr w:rsidR="006B42ED" w:rsidTr="00633BDB">
         <w:trPr>
           <w:trHeight w:val="1015"/>
         </w:trPr>
         <w:tc>
           <w:tcPr>
-            <w:tcW w:w="2371" w:type="dxa"/>
+            <w:tcW w:w="7113" w:type="dxa"/>
+            <w:gridSpan w:val="3"/>
           </w:tcPr>
-          <w:p w:rsidR="007C1D62" w:rsidRDefault="007C1D62"/>
+          <w:p w:rsidR="006B42ED" w:rsidRDefault="006B42ED"/>
         </w:tc>
         <w:tc>
           <w:tcPr>
             <w:tcW w:w="2371" w:type="dxa"/>
           </w:tcPr>
-          <w:p w:rsidR="007C1D62" w:rsidRDefault="007C1D62"/>
-        </w:tc>
-        <w:tc>
-          <w:tcPr>
-            <w:tcW w:w="2371" w:type="dxa"/>
-          </w:tcPr>
-          <w:p w:rsidR="007C1D62" w:rsidRDefault="007C1D62"/>
-        </w:tc>
-        <w:tc>
-          <w:tcPr>
-            <w:tcW w:w="2371" w:type="dxa"/>
-          </w:tcPr>
-          <w:p w:rsidR="007C1D62" w:rsidRDefault="007C1D62"/>
+          <w:p w:rsidR="006B42ED" w:rsidRDefault="006B42ED"/>
         </w:tc>
       </w:tr>
       <w:tr w:rsidR="007C1D62" w:rsidTr="007C1D62">
@@ -111,7 +100,7 @@

         </w:tc>
       </w:tr>
     </w:tbl>
-    <w:p w:rsidR="00000000" w:rsidRDefault="007C1D62"/>
+    <w:p w:rsidR="00000000" w:rsidRDefault="006B42ED"/>
     <w:sectPr w:rsidR="00000000">
       <w:pgSz w:w="12240" w:h="15840"/>
       <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
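Reading the diff above, three things change in the first row: the first cell gains a `w:gridSpan` of 3, its `w:tcW` becomes the sum of the three original widths (3 x 2371 = 7113), and the two cells it now covers are removed. A sketch of those steps follows. It uses the generic ElementTree interface rather than python-docx classes, and `span_cells` is a hypothetical helper name. It assumes each cell in the group currently occupies one grid column, that widths are in dxa, and that the removed cells hold nothing worth keeping. Recent python-docx releases also provide a `merge()` method on cells, which is likely preferable when it is available.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def qn(tag):
    """'w:tc' -> '{namespace-uri}tc' (stand-in for the python-docx helper)."""
    return "{%s}%s" % (W_NS, tag.split(":")[1])


def span_cells(tr, start, count):
    """Make the cell at index `start` in row `tr` straddle `count` columns."""
    group = tr.findall(qn("w:tc"))[start:start + count]
    if count < 2 or len(group) != count:
        raise ValueError("need `count` >= 2 existing cells starting at `start`")
    first = group[0]

    tcPr = first.find(qn("w:tcPr"))
    if tcPr is None:
        tcPr = first.makeelement(qn("w:tcPr"), {})
        first.insert(0, tcPr)  # tcPr must be the first child of tc

    # New width is the sum of the widths of all cells being absorbed.
    total = 0
    for tc in group:
        tcW = tc.find("%s/%s" % (qn("w:tcPr"), qn("w:tcW")))
        if tcW is not None:
            total += int(tcW.get(qn("w:w"), "0"))
    first_tcW = tcPr.find(qn("w:tcW"))
    if first_tcW is not None:
        first_tcW.set(qn("w:w"), str(total))

    # gridSpan belongs after cnfStyle and tcW but before everything else.
    for old in tcPr.findall(qn("w:gridSpan")):
        tcPr.remove(old)
    index = sum(1 for c in tcPr if c.tag in (qn("w:cnfStyle"), qn("w:tcW")))
    gridSpan = tcPr.makeelement(qn("w:gridSpan"), {qn("w:val"): str(count)})
    tcPr.insert(index, gridSpan)

    # Drop the cells that the first one now covers.
    for tc in group[1:]:
        tr.remove(tc)
    return first
```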

trampas commented 10 years ago

I was trying to change text direction on a cell using the code above:

def set_vert_cell_direction(cell):
    tc = cell._tc
    print "TC is "
    print tc
    tcPr = tc.tcPr
    print "TcPr is "
    print tcPr
    textDirection = OxmlElement('w:textDirection')
    textDirection.set(qn('w:val'), 'btLr')
    tcPr.append(textDirection)

I get the error:

TC is
<Element {http://schemas.openxmlformats.org/wordprocessingml/2006/main}tc at 0x4ece030>
TcPr is
None
Traceback (most recent call last):
  File "C:\Projects\Metrology\iPERL\python\testing\wr7_csv_parse.py", line 454, in <module>
    AddTablesToDoc(files,document)
  File "C:\Projects\Metrology\iPERL\python\testing\wr7_csv_parse.py", line 333, in AddTablesToDoc
    set_vert_cell_direction(row_cells[0])
  File "C:\Projects\Metrology\iPERL\python\testing\wr7_csv_parse.py", line 25, in set_vert_cell_direction
    tcPr.append(textDirection)
AttributeError: 'NoneType' object has no attribute 'append'

Am I missing something?

scanny commented 10 years ago

@trampas the tcPr (table cell properties) child element is optional. When it's not present, tc.tcPr returns None. If you get None, you'll need to add a tcPr element before you can add a <w:textDirection> child to it.

Most of the relevant XML schema definitions are here: http://python-docx.readthedocs.org/en/latest/dev/analysis/features/table.html

You can get it in the right place with something like this:

...
tcPr = tc.tcPr
if tcPr is None:
    tcPr = OxmlElement('w:tcPr')
    tc.insert(0, tcPr)
...

Note that the ordering of child elements within tcPr is significant, so just appending a textDirection element might cause a "repair-step" error on document load if there's already one or more child elements within the tcPr element.
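Putting both points together (create `tcPr` when it is missing, and respect child ordering), a more defensive version might look like the sketch below. It is written against the generic ElementTree interface, which lxml elements also support, so it runs without python-docx. `set_cell_text_direction` is a hypothetical name, and the list of elements that must follow `w:textDirection` is my reading of the `CT_TcPr` schema sequence, so treat it as an assumption to check against the schema page linked above.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def qn(tag):
    """'w:tcPr' -> '{namespace-uri}tcPr' (stand-in for the python-docx helper)."""
    return "{%s}%s" % (W_NS, tag.split(":")[1])


# Children of tcPr that come after textDirection in the schema sequence
# (assumed from CT_TcPr; verify before relying on it).
_SUCCESSORS = tuple(
    qn(t)
    for t in (
        "w:tcFitText", "w:vAlign", "w:hideMark", "w:headers",
        "w:cellIns", "w:cellDel", "w:cellMerge", "w:tcPrChange",
    )
)


def set_cell_text_direction(tc, val="btLr"):
    """Set <w:textDirection w:val=...> on a <w:tc>, creating tcPr if needed."""
    tcPr = tc.find(qn("w:tcPr"))
    if tcPr is None:
        tcPr = tc.makeelement(qn("w:tcPr"), {})
        tc.insert(0, tcPr)  # tcPr must be the first child of tc

    # Replace rather than duplicate an existing setting.
    for old in tcPr.findall(qn("w:textDirection")):
        tcPr.remove(old)

    text_direction = tcPr.makeelement(qn("w:textDirection"), {qn("w:val"): val})
    for index, child in enumerate(list(tcPr)):
        if child.tag in _SUCCESSORS:
            tcPr.insert(index, text_direction)  # just before the first successor
            break
    else:
        tcPr.append(text_direction)  # no successors present, so last is fine
    return text_direction
```

With python-docx, the element to pass in would be `cell._tc` as in the earlier example.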

scanny commented 10 years ago

@subregi I think you got yours working, right? If not feel free to reopen, closing for now.

trampas commented 10 years ago

Thank you!

I got it working!

I also added the ability to merge cells and to bold text in a cell. My next task is changing column widths and font colors.

Thanks Trampas

scanny commented 10 years ago

Glad to hear it Trampas :)

trampas commented 9 years ago

I have just upgraded python-docx and the following code no longer works.

hdr_cells = table.rows[0].cells
print hdr_cells
hdr_cells[0].text = title
set_merge(hdr_cells[0],nCols)
for ki in range(nCols-1):
    if len(hdr_cells)>1:
        n=hdr_cells.__getitem__(1)._tc
        if (n!=None):
            hdr_cells._tr.remove(n)

Specifically I get the following error:

    hdr_cells._tr.remove(n)
AttributeError: 'tuple' object has no attribute '_tr'

I use the _tr for accessing XML for the row in several places and was wondering if there was a better way with new code?

For example I set the row height like:

def set_row_height(row, height):
    tr = row._tr
    trPr = tr.find(qn('w:trPr'))
    if trPr is None:
        trPr = OxmlElement('w:trPr')
        # trPr has to come before the cells in the row, so insert rather than append
        tr.insert(0, trPr)
    trHeight = OxmlElement('w:trHeight')
    trHeight.set(qn('w:val'), str(height))
    trPr.append(trHeight)
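On the error itself: the traceback says `cells` now returns a plain tuple, so the row element has to be reached through the row object (`table.rows[0]._tr`, the same `row._tr` used in the function above) rather than through the cells collection. Below is a hedged sketch of a row-height helper that also avoids stacking up duplicate `w:trHeight` elements on repeated calls. It uses the generic ElementTree interface so it runs without python-docx. The optional `rule` parameter mapping to `w:hRule` is an addition not present in the original code. Newer python-docx releases expose row height directly on the row object, as far as I know, which would make this unnecessary.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def qn(tag):
    """'w:trPr' -> '{namespace-uri}trPr' (stand-in for the python-docx helper)."""
    return "{%s}%s" % (W_NS, tag.split(":")[1])


def set_row_height(tr, twips, rule=None):
    """Set <w:trHeight w:val=...> on a <w:tr>; `twips` is 1/20 of a point."""
    trPr = tr.find(qn("w:trPr"))
    if trPr is None:
        trPr = tr.makeelement(qn("w:trPr"), {})
        # trPr precedes the cells; only an optional tblPrEx may come before it.
        index = 1 if len(tr) and tr[0].tag == qn("w:tblPrEx") else 0
        tr.insert(index, trPr)

    # Reuse an existing trHeight so repeated calls do not add duplicates.
    trHeight = trPr.find(qn("w:trHeight"))
    if trHeight is None:
        trHeight = trPr.makeelement(qn("w:trHeight"), {})
        trPr.insert(0, trHeight)

    trHeight.set(qn("w:val"), str(int(twips)))
    if rule is not None:
        trHeight.set(qn("w:hRule"), rule)  # e.g. "exact" or "atLeast"
    return trHeight
```

With python-docx this would be called as `set_row_height(table.rows[0]._tr, 1015)`, using the height value from the earlier diff.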

Thanks Trampas
