the-vampiire opened this issue 7 years ago.
The inconsistency can be seen in another file that I used for testing the automation: in that file, all of the replaced text kept the correct font size and family.
Using the `Paragraph.text` property to replace text is convenient, but a bit of a brute-force method. All the text formatting is specified at the run level, and it all gets nuked when you assign to `Paragraph.text`, because that call removes all the existing runs before adding a single new one containing the assigned text.
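To make the run-level point concrete, here is a minimal sketch on raw WordprocessingML using only the standard library's `xml.etree.ElementTree` (python-docx is not involved). `replace_all_runs` is a hypothetical helper that only mimics the behavior described above (drop every `w:r`, append one bare run); it is not python-docx's actual code. Font size lives in each run's `w:rPr/w:sz`, stored in half-points, so once the runs are gone, so is the size.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical paragraph: two runs, each explicitly Times New Roman, 10 pt.
# Word stores font size in half-points, so w:sz w:val="20" means 10 pt.
PARA_XML = f"""<w:p xmlns:w="{W}">
  <w:r><w:rPr><w:rFonts w:ascii="Times New Roman"/><w:sz w:val="20"/></w:rPr><w:t>Dear </w:t></w:r>
  <w:r><w:rPr><w:rFonts w:ascii="Times New Roman"/><w:sz w:val="20"/></w:rPr><w:t>KEYWORD</w:t></w:r>
</w:p>"""


def run_sizes(p):
    """Explicit w:sz value of each run, or None if the run sets no size."""
    sizes = []
    for r in p.findall("w:r", NS):
        sz = r.find("w:rPr/w:sz", NS)
        sizes.append(None if sz is None else sz.get(f"{{{W}}}val"))
    return sizes


def replace_all_runs(p, text):
    """Mimic assigning to Paragraph.text: remove every run, append one bare run."""
    for r in p.findall("w:r", NS):
        p.remove(r)
    new_run = ET.SubElement(p, f"{{{W}}}r")  # note: no w:rPr child is created
    ET.SubElement(new_run, f"{{{W}}}t").text = text


p = ET.fromstring(PARA_XML)
before = run_sizes(p)  # ['20', '20']
replace_all_runs(p, "Dear Ada")
after = run_sizes(p)   # [None]: the size now comes from the styles, not the run
```

Because the new run carries no `w:rPr`, Word falls back to whatever the paragraph and document styles specify, which need not match the formatting the template text had.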
It's a pretty hard problem in the fully general case, but what I usually do, and what works well almost all the time, is to remove all the runs in the paragraph except the first one and set its text to what I want. In my experience, at least, the first run is generally formatted the way you want.
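```python
def set_cell_text_while_retaining_text_formatting(table_cell, text):
    # python-docx cells expose .paragraphs directly (python-pptx: table_cell.text_frame.paragraphs)
    paragraph = table_cell.paragraphs[0]
    runs = paragraph.runs
    if not runs:
        # empty cell: no run formatting to keep, so just add a plain run
        paragraph.add_run(text)
        return
    # ---replace text of first run with new cell value---
    runs[0].text = text
    # ---delete all remaining runs---
    for run in runs[1:]:
        r = run._element
        r.getparent().remove(r)
```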
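The same keep-the-first-run idea can be sketched on raw XML with the standard library, so it can be tried without a .docx file. `keep_first_run` is a hypothetical name. In python-docx, `run._element` is an lxml element, which is what provides `getparent()`; stdlib ElementTree elements have no parent pointer, so this sketch removes the extra runs from the paragraph element directly. Like python-docx's `Run.text` setter, it clears the first run's content but keeps its `w:rPr`.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_NS = "http://www.w3.org/XML/1998/namespace"
NS = {"w": W}


def keep_first_run(p, text):
    """Put `text` in the first w:r of paragraph `p` and delete the other runs.

    Hypothetical helper: the first run's w:rPr (its formatting) is left as-is.
    """
    runs = p.findall("w:r", NS)
    if not runs:
        return  # no run whose formatting could be kept
    first = runs[0]
    # Drop the first run's content (text, tabs, breaks) but keep its w:rPr.
    for child in list(first):
        if child.tag != f"{{{W}}}rPr":
            first.remove(child)
    t = ET.SubElement(first, f"{{{W}}}t")
    t.text = text
    if text != text.strip():
        t.set(f"{{{XML_NS}}}space", "preserve")  # keep leading/trailing spaces
    # Stdlib elements have no getparent(), so remove runs via the paragraph itself.
    for r in runs[1:]:
        p.remove(r)


PARA_XML = f"""<w:p xmlns:w="{W}">
  <w:r><w:rPr><w:b/><w:sz w:val="20"/></w:rPr><w:t xml:space="preserve">Dear </w:t></w:r>
  <w:r><w:rPr><w:sz w:val="22"/></w:rPr><w:t>KEYWORD</w:t></w:r>
</w:p>"""

p = ET.fromstring(PARA_XML)
keep_first_run(p, "Dear Ada")
```

`findall("w:r", NS)` only sees runs that are direct children of `w:p`, so runs nested in, for example, a `w:hyperlink` are left alone. Whatever formatting the deleted runs had is lost (here, the second run's 11 pt size), which is why this works best when the first run carries the formatting you want.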
Thank you @scanny, I will have to try this function. I am a bit confused, though: I came across this repo, which says it is now merged with python-openxml. If that is the case, then are these features available? I don't see anywhere in the documentation how to use them.
https://github.com/mikemaccana/python-docx
> Editing documents
>
> Thanks to the awesomeness of the lxml module, we can:
>
> - Search and replace
> - Extract plain text of document
> - Add and delete items anywhere within the document
> - Change document properties
> - Run xpath queries against particular locations in the document - useful for retrieving data from user-completed templates.
That repo is the legacy version of python-docx, version 0.2. It was rewritten from the ground up for various reasons, a big one being to make it object-oriented. None of the original code survived, and the API is completely different. There are one or two things it tried to do that we haven't implemented yet in this version, search and replace being one of them. However, I think you'll find it didn't really work in that earlier version either, except perhaps for some very narrow use cases.
I had a similar issue when trying to substitute parts of text found via regex. My template document used font size 10, but somehow the replaced text was set to font size 11.
To preserve the original size, here is an adapted solution for paragraphs (rather than the table cells above). While iterating over `document.paragraphs`, replace the `paragraph.text = text` assignment with a call to this function.
```python
def set_text_preserving_text_formatting(paragraph, text):
    """Return None; put `text` in the first run and remove all other runs."""
    # Replace the text of the paragraph's first run
    runs = paragraph.runs
    if not runs:
        return
    runs[0].text = text
    # Delete all remaining runs
    for run in runs[1:]:
        r = run._element
        r.getparent().remove(r)
```
Note: I have not tested this extensively.

@scanny Thank you. Your code saved the day.
This function can reliably replace a certain word or phrase with another in a paragraph, retaining the formatting of the original word: https://github.com/python-openxml/python-docx/issues/30#issuecomment-879593691
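The linked comment provides its own function. As a separate, minimal sketch of the general technique (not a copy of that function), the helper below works on a plain list of run texts, so it can be tried without a .docx file. `replace_across_runs` is a hypothetical name. Each regex match, even one split across several runs, is replaced in the run where the match starts, so the replacement takes that run's formatting; the remaining matched characters are deleted from the later runs, and the number of runs never changes.

```python
import re
from typing import List


def replace_across_runs(run_texts: List[str], pattern: str, replacement: str) -> List[str]:
    """Regex-replace over a paragraph's run texts without changing the run count.

    Hypothetical helper. Each match's replacement goes into the run where the
    match starts; matched characters in later runs are deleted, which can
    leave some runs empty. `replacement` uses re template syntax (backrefs).
    """
    texts = list(run_texts)
    # Offset at which each run starts in the joined paragraph text.
    starts = []
    offset = 0
    for t in run_texts:
        starts.append(offset)
        offset += len(t)
    joined = "".join(run_texts)

    # Right-to-left, so edits for later matches never shift earlier offsets.
    for m in reversed(list(re.finditer(pattern, joined))):
        m_start, m_end = m.span()
        if m_start == m_end:
            continue  # zero-width match: nothing to replace
        new_text = m.expand(replacement)
        placed = False
        for i, run_start in enumerate(starts):
            run_end = run_start + len(run_texts[i])  # original run length
            if run_end <= m_start or run_start >= m_end:
                continue  # this run does not overlap the match
            a = max(m_start, run_start) - run_start
            b = min(m_end, run_end) - run_start
            texts[i] = texts[i][:a] + ("" if placed else new_text) + texts[i][b:]
            placed = True
    return texts


print(replace_across_runs(["Hello {{na", "me}}", ", welcome"], r"\{\{name\}\}", "Ada"))
# ['Hello Ada', '', ', welcome']
```

With python-docx, one way to use it would be to pass `[run.text for run in paragraph.runs]` and assign each result back with `run.text = ...`, which keeps every run's `w:rPr` in place. A match that starts in a differently formatted run than the surrounding text will take that run's formatting.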
I am using python-docx and regex to find and substitute certain keywords in a Word document. The finding and substituting works fine, but sometimes the replaced word is set in a much smaller font than before. This behavior is inconsistent: in some sections, the replacement text keeps the correct font size (matching the previous text).
I have not been able to find anything on this topic. Can anyone explain whether this is expected behavior and, if so, how I can correct it?
I have tried inspecting the font size of the text before and after the substitution. According to the logs, the sizes are the same, but in the actual saved document they are not.
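On the logs: in python-docx, `run.font.size` returns `None` when a run does not set its own size and inherits it from the style hierarchy, so depending on what is logged (a style's size, or only one run's size), the logged value and what Word renders can disagree. The sketch below (standard library only; `explicit_sizes_pt` is a hypothetical helper) shows the underlying XML check: a run either carries `w:sz` in its `w:rPr`, in half-points, or it has none and inherits.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def explicit_sizes_pt(paragraph_xml):
    """Point size set directly on each run, or None when the size is inherited.

    WordprocessingML stores w:sz in half-points, so w:val="20" is 10 pt.
    """
    p = ET.fromstring(paragraph_xml)
    sizes = []
    for r in p.findall("w:r", NS):
        sz = r.find("w:rPr/w:sz", NS)
        sizes.append(None if sz is None else int(sz.get(f"{{{W}}}val")) / 2)
    return sizes


# Hypothetical paragraph after a replacement: the first run kept its size,
# the second (newly created) run has no w:rPr at all.
SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:rPr><w:sz w:val="20"/></w:rPr><w:t xml:space="preserve">Dear </w:t></w:r>
  <w:r><w:t>Ada</w:t></w:r>
</w:p>"""

print(explicit_sizes_pt(SAMPLE))  # [10.0, None]
```

Logging a value like this for every run, before and after the substitution, would show whether a replaced run has simply lost its explicit size.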
All fonts: Times New Roman
Original [docx] File:
Saved [docx] File:
Logs from Python:
And the code itself (log points at lines 25 and 27):