Closed aanastasiou closed 3 months ago
@aanastasiou a recent update addressed the "skipped-cells" condition, which is actually a legitimate (although relatively unusual) table state.
If you use `Table.rows` to get the rows and then iterate `_Row.cells` to get each cell, you shouldn't have a problem there.
Depending on your needs for column alignment, you may want to use `_Row.grid_cols_before` and `.grid_cols_after` to discover the empty leading and trailing cells.
There is also a new `_Cell.grid_span` property, so you can tell how many grid cells a horizontally merged cell occupies.
I'm not sure what we'll do with `Table._cells`. It's possible that collection will be deprecated, or perhaps we'll reimplement it based on the new "skipped-cell-aware" code, but for now it is probably better to avoid it in favor of the new methods.
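As a rough illustration of the alignment point above, here is a minimal sketch that pads one row out to the table's layout grid. It relies only on the attributes named in this comment (`grid_cols_before`, `grid_cols_after`, `cells`) plus `cell.text`; the helper name `row_to_grid` and the `filler` parameter are made up for the example. It also assumes that `row.cells` already yields a horizontally merged cell once per grid column it spans, which is worth verifying against the installed version.

```python
from typing import Any, List, Optional


def row_to_grid(row: Any, filler: Optional[str] = None) -> List[Optional[str]]:
    """Return one entry per layout-grid column for `row` (hypothetical helper).

    `row` is expected to look like a python-docx `_Row`: it must expose
    `grid_cols_before`, `grid_cols_after` and `cells`.
    """
    # Leading grid columns that this row skips.
    grid: List[Optional[str]] = [filler] * row.grid_cols_before
    # Cells actually present in the row.
    grid.extend(cell.text for cell in row.cells)
    # Trailing grid columns that this row skips.
    grid.extend([filler] * row.grid_cols_after)
    return grid
```

Applying this to every row in `table.rows` should give rows of equal length when the assumption about merged cells holds.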
@scanny thank you very much for the prompt response. This was using the latest `python-docx` from PyPI. Would this recent update be applied to the version on GitHub rather than the one on PyPI? Thanks for the rest of the information, it's good to know for our next code revision.
This change appears in v1.1.2, which is the current PyPI version, released on May 1, 2024: https://pypi.org/project/python-docx/ https://github.com/python-openxml/python-docx/commit/f4a48b5565a3a09087f541e3ac36a447693927b4
@scanny This is the version that I used (and that eventually led me to file this PR).
Show me the client code that isn't working the way you want.
@scanny The PR contains the exact problem that I dealt with (and how). What might take longer is locating the exact document that causes this behaviour.
@aanastasiou the idea there is not that this problem with `table._cells` is fixed for your case, but rather that you should no longer need to use `table._cells` and can use something like `(c for row in table.rows for c in row.cells)`.
If you can post the code you're using to traverse cells, and which gives rise to the error you mention, I expect I'll be able to describe how to modify it to avoid any exceptions for uneven row lengths.
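The generator expression suggested above can be wrapped in a small helper for text extraction. This is only a sketch: `iter_cells` and `table_texts` are hypothetical names, and the code uses nothing beyond `table.rows`, `row.cells` and `cell.text`, so it makes no assumption that every row has the same number of cells.

```python
from typing import Any, Iterator, List


def iter_cells(table: Any) -> Iterator[Any]:
    """Yield every cell of `table`, row by row (hypothetical helper)."""
    for row in table.rows:
        # Rows may have different lengths; each one is walked independently.
        yield from row.cells


def table_texts(table: Any) -> List[str]:
    """Collect the text of every cell in row-major order."""
    return [cell.text for cell in iter_cells(table)]
```

With python-docx this could be called on each entry of `docx.Document(path).tables`. A merged cell may be visited more than once, so callers that need each text exactly once would have to de-duplicate.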
I am pre-processing a large number of `.docx` documents with really oddly shaped tables containing text that has to be extracted verbatim.
As useful as `python-docx` has been in this task, a subset of those documents revealed a tiny little bug in this line.
This PR fixes cases of odd table shapes where the strategy of populating a cell with the value of the previous cell (e.g. in the case of row/cell merges) fails, because there simply has not been a "previous cell" yet.
Please note, I would be glad to contribute a test case as well, but this might take a bit more time, since it means tracking down the exact table (within the XML) that causes the bug and creating an "equivalent" test case.
Hope this helps.
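To make the failure mode in this description concrete, here is a toy model of it. It is not the python-docx source or the actual patch: the `CONTINUE` marker, the `flatten` function and the choice of `None` as a placeholder are all assumptions made for illustration. A "continue" entry copies the cell one full row earlier, and the guard covers the case where no such earlier cell exists yet.

```python
from typing import Any, List

# Hypothetical marker meaning "repeat the cell one full row earlier".
CONTINUE = object()


def flatten(markers: List[Any], col_count: int) -> List[Any]:
    """Resolve CONTINUE markers into a flat list of cell values (toy model)."""
    cells: List[Any] = []
    for marker in markers:
        if marker is not CONTINUE:
            cells.append(marker)
        elif len(cells) >= col_count:
            # A cell exists one full row back, so reuse it.
            cells.append(cells[-col_count])
        else:
            # No earlier cell yet: an unguarded cells[-col_count] would
            # raise IndexError here, so store a placeholder instead.
            cells.append(None)
    return cells
```

Whether a placeholder, a fresh cell or an explicit error is the right response to a missing earlier cell is a design decision this sketch does not settle.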