python-openxml / python-docx

Create and modify Word documents with Python
MIT License

How to detect merged cells? #1312

Open enrac5 opened 11 months ago

enrac5 commented 11 months ago

I have a document (a small sample is attached to this issue) and I'd like to detect the merged cells after row 1. Right now I just check whether the first two cells have the same content, but that doesn't seem ideal (based on the discussion in https://github.com/python-openxml/python-docx/issues/1311). butt_merged.docx

What's a better way of checking for merged cells?

scanny commented 11 months ago

Identifying merged cells is not enough by itself. I think you'll find you need to identify "root" cells and "spanned" cells.

A root cell (my term) is the upper-left cell in a merge. All the other cells in the merge are spanned.

Something like this produces only root cells. An unmerged cell can be thought of as a root cell of its own with no spanned cells.

from typing import Iterator
from docx.table import Table, _Cell

def iter_table_cells(table: Table) -> Iterator[_Cell]:
    """Generate each "visible" cell in `table`.

    Note that not all rows will necessarily have the same number of columns and
    a row can start in a column later than the first if there is a vertical merge.
    """
    for row in table.rows:
        tr = row._tr
        for tc in tr.tc_lst:
            # -- vMerge="continue" indicates a spanned cell in a vertical merge --
            if tc.vMerge == "continue":
                continue
            # -- every remaining cell is a root cell, including unmerged cells --
            yield _Cell(tc, row)
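
To answer the original question more directly, the same attributes could also be used to label each cell rather than just skip the spanned ones. The following is only a rough sketch: `merge_role` is a hypothetical helper, and it assumes the cell element exposes `.vMerge` (`None`, `"restart"` or `"continue"`) and `.grid_span` (an int), which is how python-docx's internal `CT_Tc` element appears to behave. Those are internals and may change between versions. Stand-in objects are used here so the sketch runs without a .docx file.

```python
from types import SimpleNamespace


def merge_role(tc) -> str:
    """Classify a cell element as "spanned", "root", or "unmerged".

    Assumption: `tc` has `.vMerge` (None, "restart", "continue") and
    `.grid_span` (int), like the `tc` items in `row._tr.tc_lst`.
    """
    # -- a vertical-merge continuation is covered by the root cell above it --
    if tc.vMerge == "continue":
        return "spanned"
    # -- "restart" starts a vertical merge; grid_span > 1 is a horizontal merge --
    if tc.vMerge == "restart" or tc.grid_span > 1:
        return "root"
    return "unmerged"


# -- stand-ins for two rows of a three-column table (not real python-docx objects) --
row_1 = [
    SimpleNamespace(vMerge="restart", grid_span=1),  # top of a vertical merge
    SimpleNamespace(vMerge=None, grid_span=2),       # spans two grid columns
]
row_2 = [
    SimpleNamespace(vMerge="continue", grid_span=1),  # continues the merge above
    SimpleNamespace(vMerge=None, grid_span=1),
    SimpleNamespace(vMerge=None, grid_span=1),
]

roles = [[merge_role(tc) for tc in row] for row in (row_1, row_2)]
print(roles)
```

With a real document, the same function could be called on each `tc` in `row._tr.tc_lst`, in the same loop as `iter_table_cells` above. A row then counts as containing a merge if any of its cells is labeled "root" or "spanned".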