I am iterating over the tables in a document and extracting the text from each cell. Most of the time the following code works perfectly:
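```python
import docx

document = docx.Document("test.docx")
for block in document.iter_inner_content():
    if isinstance(block, docx.table.Table):  # skip paragraphs, keep tables
        for row in block.rows:
            for cell in row.cells:
                print(cell.text)
```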
In some cases, however, especially when the cell text contains numbers, `cell.text` fails to return the entire text. For example, the original cell contains text like this:
`cell.text` returns a string like this:
As you can see, "0.5m" is missing from the extracted text. However, the period and the closing bracket are still extracted, so the string does not appear to have been truncated.
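One possible cause, and this is only an assumption since I have not confirmed it against the attached file: Word sometimes wraps numbers with units in a `w:smartTag` element (for example, measurement-conversion smart tags). As far as I understand, python-docx builds `paragraph.text` only from runs (and, in recent versions, hyperlinks) that are direct children of `w:p`, so a run nested inside a smart tag would be skipped. The sketch below illustrates the difference using only the standard library. The sample cell XML and the helper names `runs_only_text` and `full_cell_text` are hypothetical; `full_cell_text` collects every `w:t` descendant of each paragraph, and for simplicity it ignores tabs and line breaks.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_P = f"{{{W_NS}}}p"
W_R = f"{{{W_NS}}}r"
W_T = f"{{{W_NS}}}t"

# Hypothetical cell XML: the "0.5m" run sits inside a w:smartTag, not directly in w:p.
SAMPLE_TC = f"""
<w:tc xmlns:w="{W_NS}">
  <w:p>
    <w:r><w:t xml:space="preserve">Distance (</w:t></w:r>
    <w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="chmetcnv">
      <w:r><w:t>0.5m</w:t></w:r>
    </w:smartTag>
    <w:r><w:t>).</w:t></w:r>
  </w:p>
</w:tc>
"""


def runs_only_text(tc):
    """Mimic run-only extraction: read w:t only from runs that are direct children of w:p."""
    return "\n".join(
        "".join(t.text or "" for r in p.findall(W_R) for t in r.findall(W_T))
        for p in tc.findall(W_P)
    )


def full_cell_text(tc):
    """Collect every w:t descendant of each top-level paragraph, including nested wrappers."""
    return "\n".join(
        "".join(t.text or "" for t in p.iter(W_T)) for p in tc.findall(W_P)
    )


tc = ET.fromstring(SAMPLE_TC.strip())
print(repr(runs_only_text(tc)))  # 'Distance ().'
print(repr(full_cell_text(tc)))  # 'Distance (0.5m).'
```

If this is the cause, the same helper could in principle be applied to a real python-docx cell via its underlying lxml element, e.g. `full_cell_text(cell._tc)`, since lxml elements also support `findall` and `iter` with namespaced tags. Note that `_tc` is a private attribute and may change between python-docx versions.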
I think this is a bug. The file I used to reproduce it is attached to this issue.
I am using Python 3.9.19. These are the versions of the modules installed in my environment:
