python-openxml / python-docx

Create and modify Word documents with Python
MIT License

when hiding a full paragraph, paragraph mark (line break) remains visible ... Bug? #955

Closed abubelinha closed 1 year ago

abubelinha commented 3 years ago

Let's say I have three paragraphs in my document:

Visible paragraph.
I want to hide this paragraph.
Visible paragraph.

and I want to hide the middle one, to get a result like this:

Visible paragraph.
Visible paragraph.

With MS-Word, I achieve this by simply selecting the whole middle paragraph (including the end-of-paragraph mark), then going to font settings and selecting "hidden". This hides the full paragraph text, including the paragraph mark.

With python-docx I was able to hide only the text, not the end-of-paragraph mark, so there is an unwanted line break between the two remaining visible paragraphs. This is what my result looks like (I have also attached the resulting .docx file):

Visible paragraph.

Visible paragraph.

This is my code:

def hiddenParagraphTest():
    from docx import Document
    d = Document()
    p = d.add_paragraph("Visible paragraph.")
    p = d.add_paragraph()
    r = p.add_run()
    r.text = "I want to hide this paragraph."
    r.font.hidden = True
    p = d.add_paragraph("Visible paragraph.")
    d.save("hiddenparagraphtest.docx")

hiddenParagraphTest()

I have the impression that this happens because the paragraph mark is not part of any run. So how should I achieve this using python-docx? Is there a .hidden attribute for paragraphs somewhere? I couldn't find it in the documentation.

Thanks a lot in advance for your help

EDIT: I opened an issue because I felt this was a bug or a gap in the documentation, but just in case I asked on Stack Overflow as well.

My sample code intends to generate a new document using python-docx ... but I think the same question applies if I try to open a pre-existing .docx file in order to completely hide a certain paragraph.

hiddenparagraphtest.docx

abubelinha commented 3 years ago

Perhaps nobody answers because I forgot to mention my operating system? Windows 7, Python 3.8

It looks like a simple issue to me (compared to others which are getting answers). Please, @scanny, could you give me a hint on this? I edited the question to add a sample copy of the resulting .docx file. Thanks

scanny commented 3 years ago

The "paragraph mark" of a paragraph is not part of any run. So the behavior you're seeing is what I would expect.

Compare the XML generated by Word when you hide the paragraph "by hand" using the Word UI to the XML generated by python-docx. I expect you'll find something like w:p/w:pPr/w:vanish{val=1} in the XML generated by Word.

If you can confirm that's what's giving the behavior you're looking for then all you have to do is figure out how to set that value.
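
One way to do that comparison without unzipping anything by hand is sketched below. It is only an illustration using the standard library, not part of python-docx; the function name hidden_mark_report is made up for this example. It reads word/document.xml from a .docx and reports, for each top-level body paragraph, whether a w:vanish element sits under w:pPr/w:rPr (the place where Word stores the formatting of the paragraph mark). It assumes the usual package layout, skips paragraphs nested in tables, and treats any w:vanish as "hidden" without checking a w:val attribute.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix seen in the XML).
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def hidden_mark_report(docx_file):
    """Return one (text, mark_is_hidden) tuple per top-level body paragraph.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    report = []
    for p in root.iterfind("w:body/w:p", NS):
        # Concatenate the text of the paragraph's direct runs.
        text = "".join(t.text or "" for t in p.iterfind("w:r/w:t", NS))
        # The paragraph mark's formatting lives in w:pPr/w:rPr, not in any run.
        mark_hidden = p.find("w:pPr/w:rPr/w:vanish", NS) is not None
        report.append((text, mark_hidden))
    return report
```

Running it on a file saved by Word and on one saved by python-docx, for example hidden_mark_report("hiddenparagraphtest.docx"), should make the difference between the two visible at a glance.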

abubelinha commented 3 years ago

Thanks a lot for the detailed explanations, @scanny
If I understood correctly, the python-docx library alone is not enough for what I am trying to do, so my only option is to reopen the XML generated by python-docx and modify it using other Python methods? Finding and resetting values in that XML sounds well beyond my programming skills. I have gotten lost the few times I tried to dive into zipped .docx files.

Perhaps you can advise me in another way? What I am trying to do is generate hidden marks after certain paragraphs of my document, like this (all {{}} items would be hidden):

Paragraph with many lines.
{{mark_5021}}
Paragraph with many lines.
{{mark_0287}}
Paragraph with many lines.
{{mark_7034}}
... and so on.

Basically, these paragraphs describe items in a museum collection, and the marks are item identifiers. For some items, I have one or more images in a local folder, named "item_0311_img1.jpg", "item_0311_img2.jpg", "item_0834_img1.jpg", ... and so on.

So my plan was to check this folder contents, parse all filenames to locate item ids inside, and then parse my .docx to find the places where those images should be inserted.

Only when the corresponding mark is located in the document, the image should be inserted and that paragraph should become "visible".

But for all the remaining marks, the paragraph should stay "invisible" (in the sense that no additional paragraph mark should exist there, because there are no images in that run).
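
The matching step of this plan could be sketched roughly as follows. This is only an illustration under assumptions drawn from the examples in this post: image names look like item_NNNN_imgK.jpg, marks look like {{mark_NNNN}}, and the digits are the shared item id. The helper names images_by_item and marks_in are invented for the sketch.

```python
import re
from collections import defaultdict

# Assumed naming patterns, based on the examples given in the post.
IMAGE_NAME = re.compile(r"^item_(\d+)_img\d+\.jpg$", re.IGNORECASE)
MARK = re.compile(r"\{\{mark_(\d+)\}\}")


def images_by_item(filenames):
    """Group image file names by the item id embedded in each name."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = IMAGE_NAME.match(name)
        if match:  # names that do not fit the pattern are ignored
            groups[match.group(1)].append(name)
    return dict(groups)


def marks_in(text):
    """Return the item ids of all {{mark_NNNN}} placeholders found in text."""
    return MARK.findall(text)
```

With these, images_by_item(os.listdir(folder)) gives the ids that have images, and marks_in(paragraph.text) tells which ids a given paragraph refers to, so a mark paragraph can be filled when its id has images and hidden or removed otherwise.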

Can you suggest another way to achieve this using a python-docx-only approach? Perhaps I could somehow delete those paragraphs instead of trying to hide them?
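
For reference, deleting is possible, although as far as I know python-docx has no public method for it. A commonly used workaround is to detach the paragraph's underlying XML element from its parent, which removes the text and the paragraph mark together. The sketch below assumes the object passed in is a python-docx Paragraph and relies on its private _element attribute, so it may break in future versions; the name delete_paragraph is just for illustration.

```python
def delete_paragraph(paragraph):
    """Remove a paragraph, including its paragraph mark, from the document.

    `paragraph` is expected to be a python-docx Paragraph; `_element` is a
    private attribute giving access to the underlying lxml element.
    """
    element = paragraph._element
    parent = element.getparent()
    if parent is not None:  # a paragraph that is already detached is left alone
        parent.remove(element)
```

After the call, the Paragraph object still exists in Python but is no longer part of the document, so it should not be used again.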

abubelinha commented 1 year ago

There is a solution in StackOverflow: how-to-hide-a-full-paragraph-in-word-docx-file-using-python-docx

I provide here the code posted by skyway (aka @dothinking). Thanks!:

You are almost there. Hide the paragraph as you did in MS-Word and check the source XML; it looks like:

<w:p>
    <w:pPr>
        <w:rPr>
            <w:vanish/>
        </w:rPr>
    </w:pPr>
    <w:r>
        <w:rPr>
            <w:vanish/>
        </w:rPr>
        <w:t>Hidden</w:t>
    </w:r>
</w:p>

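Compared with the docx your code generates, the difference is the additional w:pPr node (paragraph properties). Its w:rPr child holds the formatting of the paragraph mark itself, and the w:vanish inside it is what does the trick.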

So, let's add this property manually.

from docx.oxml import OxmlElement

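def setHiddenProperty(p):
    # Reuse the paragraph's w:pPr if it has one, otherwise create it;
    # get_or_add_pPr() also keeps w:pPr as the first child of w:p.
    pPr = p._p.get_or_add_pPr() # paragraph property
    rPr = OxmlElement('w:rPr') # run property of the paragraph mark
    v = OxmlElement('w:vanish') # hidden
    rPr.append(v)
    pPr.append(rPr)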

Finally, apply it to your code.

def hiddenParagraphTest():
    from docx import Document
    d = Document()
    p = d.add_paragraph("Visible")

    p = d.add_paragraph()
    setHiddenProperty(p)  # set paragraph hidden property
    r = p.add_run()
    r.text = "Hidden"
    r.font.hidden = True

    p = d.add_paragraph("Visible")
    d.save("hiddenparagraphtest.docx")

hiddenParagraphTest()
