python-openxml / python-docx

Create and modify Word documents with Python
MIT License
4.6k stars 1.13k forks source link

Hyperlinks #610

Open avvRobertoAlma opened 5 years ago

avvRobertoAlma commented 5 years ago

Hello! I would like to know if it is possible to add an hyperlink to an existing portion of text. I have read the helper functions provided in past issues, but I did not find something helpful for me.

That's the question: I have some text, for example, My name is Roberto and I am a lawyer (doc. 1) I would like to add an hyperlink to the string "doc. 1" (in the same way i may set an hyperlink in a word processor, just underlining text).

I have read the docs and it seems that hyperlink may be managed at Runs level. The problem is that i have not understood if I can set a Run inside an existing paragraph (without adding a new one).

# coding=utf-8

import docx
import hyperlink
import helpers

doc = docx.Document('atto.docx')

paragraphs = doc.paragraphs

txt = 'doc. 1'

for p in paragraphs:
    runs = p.runs

    for run in runs:
        print run.text
        if txt in run.text:
            hyperlink.add(p, run, 'https://github.com')
            run.font.color.rgb = docx.shared.RGBColor(0, 0, 255)

doc.save('new-atto.docx')

I tried this code (using the hyperlink function created by someone here, which adds the link to an existing run) but it does not work very well. I tried to set in the word processor "doc. 1" in bold (in order to assume that "doc.1" will be considered as a distinct run) but the implementation is quite buggy.

Could you please give me some advice ? (I am a lawyer and still inexperienced as developer :-) )

@EDIT: if i change some styling in the string "doc. 1" in the original document, the above-mentioned will become a separate Run and the above code may work.

brasky commented 5 years ago

This might not be the best solution, but if you loop through paragraphs/runs and find "(doc. #)" using a regular expression then you could take that string, make it a new run and insert it into the existing runs, then pass it to hyperlink.add.

Also good on you for being a lawyer and developer, putting me to shame.

avvRobertoAlma commented 5 years ago

Thank you @brasky. So there is the possibility to make a new run given a specific text ? For example, given a paragraph "Hello My name is Roberto" and given that "Roberto" is found through a regular expression, how can i transform Roberto into a run?

Also good on you for being a lawyer and developer, putting me to shame.

Hehe, I am a particular lawyer :-)

brasky commented 5 years ago

Here's some code I threw together which will find some pattern in a run and replace it with a new run with the pattern text and then add the hyperlink to it. As far as I know there's no way through the docx api to add a run after or before a given run, but through this post by @scanny I found the solution through lxml.etree._Element.

By the way, in the future I think the recommended forum for help is Stack Overflow using the "python-docx" tag.

import re
import docx
from docx.text.run import Run

# --- this document contains one paragraph and one run with the following text:
# --- "Please refer to document 1 (doc. 1). More testing sentences. "
doc = docx.Document('testing runs.docx')

# --- This is the pattern to match the string '(doc. #)'
pattern = "\(doc. [0-9]\)."

def add_hyperlink_into_run(paragraph, run, url):
    runs = paragraph.runs
    for i in range(len(runs)):
        if runs[i].text == run.text:
            break
    # --- This gets access to the document.xml.rels file and gets a new relation id value ---
    part = paragraph.part
    r_id = part.relate_to(
        url, docx.opc.constants.RELATIONSHIP_TYPE.HYPERLINK, is_external=True
    )
    # --- Create the w:hyperlink tag and add needed values ---
    hyperlink = docx.oxml.shared.OxmlElement('w:hyperlink')
    hyperlink.set(docx.oxml.shared.qn('r:id'), r_id, )
    hyperlink.append(run._r)
    paragraph._p.insert(i,hyperlink)
    run.font.color.rgb = docx.shared.RGBColor(0, 0, 255)

for paragraph in doc.paragraphs:
    for run in paragraph.runs:
        matches = re.findall(pattern, run.text)
        if matches:
            print("Found a match!")
            # --- this replaces the pattern we wrote above with nothing.
            run.text = re.sub(pattern, "", run.text) 
            # --- if there are more than one instances of '(doc. #)' in the run,
            # --- we want to replace all of them. This logic might not work perfectly
            # --- but it's a start.
            for match in matches: 
                new_run_element = paragraph._element._new_r()
                run._element.addnext(new_run_element)
                new_run = Run(new_run_element, run._parent)
                new_run.text = match + " "
                add_hyperlink_into_run(paragraph, new_run, "http://google.com")

doc.save('testing-runs-complete.docx')

Edit: There's some goofiness with some periods being in different runs, so you might have to play with the pattern and how you edit the original text of the run. Sorry this isn't more comprehensive, something I just threw together quickly. If you need more help I'd recommend making a stack overflow post and posting a link to it here.

cparmet commented 4 years ago

@brasky The add_hyperlink_into_run function was extremely helpful to me for another use case. Thanks very much!

brasky commented 4 years ago

Thank @scanny - I think he did some editing magic on it to improve it 😄

scanny commented 4 years ago

Just added the "```python" hint at the top for syntax highlighting and shortened comments to fit without horizontal scrolling. Nice job @brasky :)

cparmet commented 4 years ago

Well @scanny either way thanks to you (and your co-contributors) for this phenomenal package. It’s super cool and much appreciated.