I was iterating over the tables in a document and extracting the text of each cell. Most of the time the following code works perfectly:
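from docx import Document
from docx.table import Table

document = Document("test.docx")  # the file attached to this issue
for block in document.iter_inner_content():
    # iter_inner_content() yields both paragraphs and tables; keep only tables
    if isinstance(block, Table):
        for row in block.rows:
            for cell in row.cells:
                print(cell.text)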
But in some cases, especially when the cell text contains numbers, cell.text fails to return the entire text. For example, the original cell contains this text:
【24#梁南侧腹板近1#台处,多处箍筋锈蚀、混凝土剥落,0.2×0.5m。】
but cell.text returns this string:
【24#梁南侧腹板近1#台处,多处箍筋锈蚀、混凝土剥落,0.2×。】
As you can see, "0.5m" is missing from the extracted text. The full stop and the closing bracket that follow it were extracted, though, so I don't think the string is simply being truncated.
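One possible explanation, which I have not verified against the attached file: as far as I know, cell.text only joins the text of runs that sit directly inside each paragraph, so a run that Word has wrapped in another element (for example w:smartTag, which Word sometimes puts around measurements) would be skipped. The sketch below illustrates that hypothesis on a hand-written XML fragment using only the standard library. The fragment, the smartTag wrapper, and the helper names direct_run_text and all_text are assumptions made for illustration, not part of python-docx.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def direct_run_text(paragraph):
    """Join w:t text only from runs that are direct children of the paragraph.

    This mimics the behaviour I suspect cell.text has (hypothesis).
    """
    return "".join(
        t.text or ""
        for run in paragraph.findall(f"{{{W_NS}}}r")
        for t in run.findall(f"{{{W_NS}}}t")
    )


def all_text(element):
    """Join every w:t descendant in document order, whatever wraps it."""
    return "".join(t.text or "" for t in element.iter(f"{{{W_NS}}}t"))


# Hypothetical cell XML: the "0.5m" run is nested inside a w:smartTag element.
SAMPLE = (
    f'<w:tc xmlns:w="{W_NS}"><w:p>'
    "<w:r><w:t>0.2×</w:t></w:r>"
    "<w:smartTag><w:r><w:t>0.5m</w:t></w:r></w:smartTag>"
    "<w:r><w:t>。】</w:t></w:r>"
    "</w:p></w:tc>"
)

cell_xml = ET.fromstring(SAMPLE)
paragraph = cell_xml.find(f"{{{W_NS}}}p")

print(direct_run_text(paragraph))  # 0.2×。】
print(all_text(cell_xml))          # 0.2×0.5m。】
```

If this is what is happening, calling all_text(cell._tc) on a real cell might serve as a workaround, since cell._tc is the underlying lxml element. Note that _tc is a private attribute, and that this joins the text of all paragraphs in the cell without separators.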
I think this is a bug. The file I used to reproduce it is attached to this issue.
[Uploading test.docx…]()
I am using Python 3.9.19. These are the versions of the packages installed in my environment:
aiobotocore 2.12.3 aiohttp 3.9.3 aioitertools 0.7.1 aiosignal 1.2.0 alabaster 0.7.12 altair 5.0.1 anaconda-anon-usage 0.4.4 anaconda-client 1.12.3 anaconda-navigator 2.3.1 anaconda-project 0.11.1 anyio 3.5.0 appdirs 1.4.4 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 arrow 1.2.3 astroid 2.11.7 astropy 5.3.4 asttokens 2.2.1 async-timeout 4.0.3 atomicwrites 1.4.0 attrs 23.1.0 Automat 20.2.0 autopep8 1.6.0 Babel 2.11.0 backcall 0.2.0 backports.functools-lru-cache 1.6.4 backports.tempfile 1.0 backports.weakref 1.0.post1 bcrypt 3.2.0 beautifulsoup4 4.12.2 binaryornot 0.4.4 bitarray 2.5.1 bkcharts 0.2 black 24.3.0 bleach 4.1.0 blinker 1.6.2 bokeh 3.4.0 boto3 1.24.28 botocore 1.34.69 Bottleneck 1.3.7 Brotli 1.0.9 brotlipy 0.7.0 cachetools 5.3.3 certifi 2024.2.2 cffi 1.16.0 chardet 4.0.0 charset-normalizer 2.0.4 click 8.1.7 cloudpickle 2.2.1 clyent 1.2.2 colorama 0.4.6 colorcet 3.1.0 comm 0.1.2 comtypes 1.1.10 conda 22.9.0 conda-build 3.22.0 conda-content-trust 0.2.0 conda-pack 0.6.0 conda-package-handling 2.2.0 conda_package_streaming 0.9.0 conda-repo-cli 1.0.88 conda-token 0.4.0 conda-verify 3.4.2 constantly 23.10.4 contourpy 1.2.0 cookiecutter 2.6.0 cryptography 42.0.5 cssselect 1.2.0 cycler 0.11.0 Cython 0.29.32 cytoolz 0.12.2 daal4py 2021.6.0 dask 2023.11.0 datashader 0.16.0 datashape 0.5.4 debugpy 1.6.6 decorator 5.1.1 defusedxml 0.7.1 dgl 1.1.0 dglgo 0.0.2 diff-match-patch 20200713 dill 0.3.7 distributed 2023.11.0 docutils 0.18.1 entrypoints 0.4 et-xmlfile 1.1.0 exceptiongroup 1.2.0 executing 1.2.0 fastjsonschema 2.16.2 filelock 3.13.1 flake8 4.0.1 Flask 1.1.2 fonttools 4.51.0 frozenlist 1.4.0 fsspec 2024.3.1 future 0.18.3 gensim 4.3.0 gitdb 4.0.7 GitPython 3.1.37 glob2 0.7 gmpy2 2.1.2 greenlet 3.0.1 h5py 3.9.0 HeapDict 1.0.1 holoviews 1.18.3 hvplot 0.9.2 hyperlink 21.0.0 idna 3.7 imagecodecs 2023.1.23 imageio 2.33.1 imagesize 1.4.1 imbalanced-learn 0.11.0 importlib-metadata 6.1.0 importlib-resources 6.1.1 incremental 22.10.0 inflection 0.5.1 iniconfig 1.1.1 intake 
0.6.8 intervaltree 3.1.0 ipykernel 6.22.0 ipython 8.11.0 ipython-genutils 0.2.0 ipywidgets 7.8.1 isort 5.12.0 itemadapter 0.3.0 itemloaders 1.1.0 itsdangerous 2.0.1 jaraco.classes 3.2.1 jdcal 1.4.1 jedi 0.18.2 jellyfish 1.0.1 jieba 0.42.1 Jinja2 2.11.3 jinja2-time 0.2.0 jmespath 1.0.1 joblib 1.4.0 json5 0.9.6 jsonschema 4.19.2 jsonschema-specifications 2023.7.1 jupyter 1.0.0 jupyter_client 8.1.0 jupyter-console 6.6.3 jupyter_core 5.3.0 jupyter-server 1.18.1 jupyterlab 3.4.4 jupyterlab-pygments 0.2.2 jupyterlab-server 2.10.3 jupyterlab-widgets 1.0.0 keyring 24.3.1 kiwisolver 1.4.4 lazy_loader 0.3 lazy-object-proxy 1.10.0 lckr_jupyterlab_variableinspector 3.1.0 libarchive-c 2.9 linkify-it-py 2.0.0 littleutils 0.2.2 llvmlite 0.42.0 lmdb 1.4.1 locket 1.0.0 lxml 4.9.3 lz4 4.3.2 Markdown 3.4.1 markdown-it-py 2.2.0 MarkupSafe 2.0.1 matplotlib 3.8.4 matplotlib-inline 0.1.6 mccabe 0.6.1 mdit-py-plugins 0.3.0 mdurl 0.1.0 menuinst 1.4.19 mistune 0.8.4 mkl-fft 1.3.8 mkl-random 1.2.4 mkl-service 2.4.0 mock 4.0.3 more-itertools 10.1.0 mpmath 1.3.0 msgpack 1.0.3 multidict 6.0.4 multipledispatch 0.6.0 munkres 1.1.4 mypy 1.8.0 mypy-extensions 1.0.0 navigator-updater 0.3.0 nbclassic 1.0.0 nbclient 0.5.13 nbconvert 6.4.4 nbformat 5.9.2 nest-asyncio 1.5.6 networkx 3.1 nltk 3.8.1 nose 1.3.7 notebook 6.4.12 notebook_shim 0.2.3 numba 0.59.1 numexpr 2.8.7 numpy 1.26.4 numpydoc 1.5.0 ogb 1.3.6 olefile 0.46 openpyxl 3.1.2 outdated 0.2.2 packaging 23.0 pandas 1.4.4 pandocfilters 1.5.0 panel 1.4.2 param 2.1.0 paramiko 2.8.1 parsel 1.8.1 parso 0.8.3 partd 1.4.1 pathlib 1.0.1 pathspec 0.10.3 patsy 0.5.3 pep8 1.7.1 pexpect 4.8.0 pickleshare 0.7.5 pillow 10.3.0 pip 23.3.1 pkginfo 1.9.6 platformdirs 3.11.0 plotly 5.19.0 pluggy 1.0.0 ply 3.11 poyo 0.5.0 prettytable 3.9.0 prometheus-client 0.14.1 prompt-toolkit 3.0.38 Protego 0.1.16 protobuf 3.20.3 psutil 5.9.4 ptyprocess 0.7.0 pure-eval 0.2.2 py 1.11.0 py-cpuinfo 9.0.0 pyarrow 14.0.2 pyasn1 0.4.8 pyasn1-modules 0.2.8 pycodestyle 2.8.0 pycosat 0.6.6 
pycparser 2.21 pyct 0.5.0 pycurl 7.45.2 pydantic 1.10.7 pydeck 0.8.0 PyDispatcher 2.0.5 pydocstyle 6.3.0 pyecharts 2.0.4 pyerfa 2.0.0 pyflakes 2.4.0 Pygments 2.14.0 PyHamcrest 2.0.2 PyJWT 2.4.0 pylint 2.14.5 pyls-spyder 0.4.0 PyNaCl 1.5.0 pyodbc 5.0.1 pyOpenSSL 24.0.0 pyparsing 3.0.9 PyQt5 5.15.10 PyQt5-sip 12.13.0 pyrsistent 0.18.0 PySocks 1.7.1 pytest 7.4.0 python-dateutil 2.8.2 python-docx 1.1.2 python-lsp-black 1.0.0 python-lsp-jsonrpc 1.1.2 python-lsp-server 1.3.3 python-slugify 5.0.2 python-snappy 0.6.1 pytoolconfig 1.2.6 pytz 2024.1 pyviz-comms 2.0.2 pywavelets 1.5.0 pywin32 306 pywin32-ctypes 0.2.2 pywinpty 2.0.10 PyYAML 6.0.1 pyzmq 25.0.2 QDarkStyle 3.0.2 qstylizer 0.2.2 QtAwesome 1.2.2 qtconsole 5.2.2 QtPy 2.4.1 queuelib 1.6.2 rdkit-pypi 2022.9.5 referencing 0.30.2 regex 2023.10.3 requests 2.31.0 requests-file 1.5.1 requests-toolbelt 1.0.0 rich 13.3.5 rope 1.12.0 rpds-py 0.10.6 Rtree 1.0.1 ruamel.yaml 0.17.26 ruamel.yaml.clib 0.2.7 ruamel-yaml-conda 0.15.100 s3fs 2024.3.1 s3transfer 0.6.0 scikit-image 0.22.0 scikit-learn 1.4.2 scikit-learn-intelex 2021.20221004.171935 scipy 1.13.0 Scrapy 2.11.1 seaborn 0.12.2 Send2Trash 1.8.2 service-identity 18.1.0 setuptools 69.5.1 simplejson 3.19.1 sip 6.7.12 six 1.16.0 smart-open 5.2.1 smmap 4.0.0 sniffio 1.3.0 snowballstemmer 2.2.0 sortedcollections 2.1.0 sortedcontainers 2.4.0 soupsieve 2.5 Sphinx 5.0.2 sphinxcontrib-applehelp 1.0.2 sphinxcontrib-devhelp 1.0.2 sphinxcontrib-htmlhelp 2.0.0 sphinxcontrib-jsmath 1.0.1 sphinxcontrib-qthelp 1.0.3 sphinxcontrib-serializinghtml 1.1.5 spyder 5.2.2 spyder-kernels 2.2.1 SQLAlchemy 2.0.25 stack-data 0.6.2 statsmodels 0.14.0 streamlit 1.32.0 sympy 1.12 tables 3.9.2 tabulate 0.9.0 TBB 0.2 tblib 1.7.0 tenacity 8.2.2 terminado 0.17.1 testpath 0.6.0 text-unidecode 1.3 textdistance 4.2.1 threadpoolctl 2.2.0 three-merge 0.1.1 tifffile 2023.4.12 tinycss 0.4 tinycss2 1.2.1 tldextract 3.2.0 toml 0.10.2 tomli 2.0.1 tomlkit 0.11.1 toolz 0.12.0 torch 2.0.1 torchaudio 2.0.2 torchvision 
0.15.2 tornado 6.2 tqdm 4.66.2 traitlets 5.9.0 Twisted 23.10.0 twisted-iocpsupport 1.0.2 typer 0.9.0 typing_extensions 4.9.0 uc-micro-py 1.0.1 ujson 5.4.0 unicodedata2 15.1.0 Unidecode 1.2.0 urllib3 1.26.18 w3lib 2.1.2 watchdog 2.1.6 wcwidth 0.2.6 webencodings 0.5.1 websocket-client 0.58.0 Werkzeug 2.0.3 wheel 0.43.0 widgetsnbextension 3.6.6 win-inet-pton 1.1.0 win-unicode-console 0.5 wincertstore 0.2 wordcloud 1.9.3 wrapt 1.14.1 xarray 2023.6.0 xlrd 2.0.1 XlsxWriter 3.0.3 xlwings 0.29.1 xyzservices 2022.9.0 yapf 0.40.2 yarl 1.9.3 zict 3.0.0 zipp 3.15.0 zope.interface 5.4.0 zstandard 0.22.0