python-openxml / python-docx

Create and modify Word documents with Python
MIT License

feature: Row.allow_break_across_pages #245

Open AlbinoShadow opened 8 years ago

AlbinoShadow commented 8 years ago

I'll try to keep it sweet and simple. There is a property, paragraph.paragraph_format.keep_together; when it's True, the paragraph is kept on a single page instead of being split when it falls at the end of a page.

I tried using this in a table scenario and it didn't work, although Word has the capability. After comparing the document.xml between a document where a row can split pages and one that cannot, I found that it acts as a table row property instead of a paragraph property (cantSplit vs keepLines in XML).

I can't seem to find any info regarding this within python-docx so I may make my own docx function for it, but I've got very little XML knowledge so I'd like to avoid that.

References for @scanny if he sees this:

<w:tr w:rsidR="00137A85" w:rsidTr="005B6BA3">
        <w:trPr>
          <w:cantSplit/>
        </w:trPr>

That's the XML with the row not splitting pages and this is without:

<w:tr w:rsidR="00137A85">

The obvious difference is that there's no table row properties (w:trPr) section, and the attributes on the opening w:tr tag are different too (no w:rsidTr). If I end up making my own XML edits for this I'll make sure to post them.
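The comparison above can also be checked programmatically. This is a minimal sketch using only the standard library, assuming the row XML is available as a string; the helper name `row_has_cant_split` and the two sample fragments are illustrative and not part of python-docx. It only tests for the presence of the element and ignores any w:val attribute.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the conventional "w" prefix
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def row_has_cant_split(tr_xml: str) -> bool:
    """Return True if the <w:tr> fragment has <w:cantSplit> under <w:trPr>."""
    tr = ET.fromstring(tr_xml)
    return tr.find("w:trPr/w:cantSplit", NS) is not None


# Hypothetical fragments modelled on the two rows quoted above
locked = (
    f'<w:tr xmlns:w="{W}" w:rsidR="00137A85" w:rsidTr="005B6BA3">'
    "<w:trPr><w:cantSplit/></w:trPr></w:tr>"
)
plain = f'<w:tr xmlns:w="{W}" w:rsidR="00137A85"/>'

print(row_has_cant_split(locked))
print(row_has_cant_split(plain))
```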

AlbinoShadow commented 8 years ago

Fixed the issue following the information in this thread: https://github.com/python-openxml/python-docx/issues/55#issuecomment-43914055

Thanks for the tool @scanny it's extremely helpful =)

scanny commented 8 years ago

Reopening as feature request for Table.allow_break_across_pages. Thanks for this @AlbinoShadow :)

AlbinoShadow commented 8 years ago

@scanny here's the code that I ended up using to fix the issue:

from docx.oxml.shared import OxmlElement, qn  # Necessary import

def preventDocumentBreak(document):
  for tr in document.element.xpath('//w:tr'):  # Every <w:tr> (table row) in the document
    child = OxmlElement('w:cantSplit')          # Create the <w:cantSplit/> element
    tr.append(child)                            # Append it as the last child of the row

I only had a single table in my document, so I just applied it to every row. It's edited code I found in another one of your comments, I believe, but I figured it doesn't hurt to post it. Thanks again, python-docx has made a huge difference in my job and is the reason I learned Python.

scanny commented 8 years ago

Super, thanks Joe :)

linuxkd commented 4 years ago

For those looking to do the opposite and allow a row to break across pages:

from docx.oxml import OxmlElement
from docx.oxml.ns import qn

def allowDocumentBreak(document):
    """Allow table rows to break across pages."""
    for tr in document.element.xpath("//w:tr"):  # Every <w:tr> (table row)
        child = OxmlElement("w:cantSplit")  # Create the <w:cantSplit> element
        child.set(qn("w:val"), "0")  # w:val="0" switches the property off
        tr.append(child)  # Append it as the last child of the row

vlad-belogrudov commented 2 years ago

It looks like in the current format the row needs a trPr (table row properties) element, and w:cantSplit goes inside it as a child, created with OxmlElement("w:cantSplit").

This is a bit of a life hack (since it uses the internal API), but to set "no-break" on a row:

row = table.add_row()
trPr = row._tr.get_or_add_trPr()
trPr.append(OxmlElement('w:cantSplit'))

muhammadahmadazhar commented 1 year ago

It was fixed by:

trPr = OxmlElement('w:trPr')
cantSplit = OxmlElement('w:cantSplit')
cantSplit.set(qn('w:val'), 'true')
trPr.append(cantSplit)
row._tr.append(trPr)
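
The snippets in this thread write w:cantSplit in three forms: as a bare element, with w:val="0", and with w:val="true". The following standard-library sketch shows how those forms could be read back. It assumes the usual OOXML on/off convention, where a missing w:val means "on" and the values "0", "false", and "off" mean "off". The names `cant_split_is_on` and `row` are hypothetical helpers, not python-docx API.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
# Assumed set of attribute values that switch an on/off property off
OFF_VALUES = {"0", "false", "off"}


def cant_split_is_on(tr_xml: str) -> bool:
    """Return True if the row's <w:trPr> holds an enabled <w:cantSplit>."""
    tr = ET.fromstring(tr_xml)
    el = tr.find("w:trPr/w:cantSplit", NS)
    if el is None:
        return False  # no element at all: the row may split
    val = el.get(f"{{{W}}}val")
    return val is None or val not in OFF_VALUES


def row(inner: str) -> str:
    """Wrap `inner` in a minimal <w:tr><w:trPr> fragment for illustration."""
    return f'<w:tr xmlns:w="{W}"><w:trPr>{inner}</w:trPr></w:tr>'


print(cant_split_is_on(row("<w:cantSplit/>")))
print(cant_split_is_on(row('<w:cantSplit w:val="true"/>')))
print(cant_split_is_on(row('<w:cantSplit w:val="0"/>')))
print(cant_split_is_on(row("")))
```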

1krishnasharma commented 1 year ago

How can I keep two rows of a table together? I have seen the solution for preventing a row from splitting, but I want to keep two rows together so that, if they fall at the end of a page, they both stay on the same page.

image

As you can see in the image, the two rows are separated by a page break. How can I keep them together on either the next page or the previous page? Can anyone help me with that?

scanny commented 1 year ago

@1krishnasharma the solutions here should work for you.

A slightly more robust implementation would be:

from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.table import _Row

def make_row_cant_split(row: _Row) -> None:
    tr = row._tr

    # -- if the element is already present, make sure it's turned on --
    cantSplits = tr.xpath("./w:trPr/w:cantSplit")
    if cantSplits:
        cantSplit = cantSplits[0]
        cantSplit.set(qn('w:val'), 'true')
        return

    # -- otherwise add it in bool-true state --
    trPr = tr.get_or_add_trPr()
    cantSplit = OxmlElement("w:cantSplit")
    cantSplit.set(qn('w:val'), 'true')
    trPr.insert_element_before(
        cantSplit,
        (
            "w:trHeight",
            "w:tblHeader",
            "w:tblCellSpacing",
            "w:jc",
            "w:hidden",
            "w:ins",
            "w:del",
            "w:trPrChange",
        ),
    )

ShoulddaBeenaWhaleBiologist commented 10 months ago

@scanny Thanks for the robust implementation 👍

Can I ask why it's better to use trPr.insert_element_before() vs the trPr.append() used in other examples above? Is inserting this table row property before all those others you listed more in line with the spec? Just results in more consistently correct behavior or something?

Thanks again

scanny commented 10 months ago

@ShoulddaBeenaWhaleBiologist In general, child elements in the OpenXML schema are specified as a sequence, meaning they have a specified order. Sometimes this order matters, so placing a new element in the right position is something we always do in the library. That's what .insert_element_before() does; a longer name would be "insert this element before any of the following that already appear as a child".

.append() places the new element as the last child. Often this will work and folks do it all the time, but "working" can be client-specific: a document that opens fine in LibreOffice might not in Word. So I never take the chance and just put it in the right order.
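
To illustrate the difference, here is a minimal standard-library sketch of the "insert before any listed successor that is already present, otherwise append" idea. It is not python-docx's implementation; the function name `insert_before_successors` and the helper `w` are hypothetical, and the successor list is the one from the code earlier in this thread. The w:cnfStyle child is used here only as an example of an element assumed to come before w:cantSplit.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(tag: str) -> str:
    """Expand a 'w:'-prefixed tag name to ElementTree's Clark notation."""
    return f"{{{W}}}{tag.split(':', 1)[1]}"


# Successor list taken from the make_row_cant_split() example in this thread
SUCCESSORS = (
    "w:trHeight", "w:tblHeader", "w:tblCellSpacing", "w:jc",
    "w:hidden", "w:ins", "w:del", "w:trPrChange",
)


def insert_before_successors(parent, child, successors):
    """Insert `child` before the first existing successor, else append it."""
    successor_tags = {w(t) for t in successors}
    for index, existing in enumerate(parent):
        if existing.tag in successor_tags:
            parent.insert(index, child)
            return child
    parent.append(child)  # no successor present, so last position is correct
    return child


# A <w:trPr> that already has one earlier child and two later ones
trPr = ET.Element(w("w:trPr"))
for name in ("w:cnfStyle", "w:trHeight", "w:jc"):
    ET.SubElement(trPr, w(name))

insert_before_successors(trPr, ET.Element(w("w:cantSplit")), SUCCESSORS)
order = [el.tag.split("}")[1] for el in trPr]
print(order)  # ['cnfStyle', 'cantSplit', 'trHeight', 'jc']
```

A plain .append() on the same element would have produced cnfStyle, trHeight, jc, cantSplit instead, which is the out-of-order result described above.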