python-openxml / python-docx

Create and modify Word documents with Python
MIT License

feature: Paragraph.add_hyperlink() #74

Open scanny opened 10 years ago

scanny commented 10 years ago

Protocol might be something like this:

>>> hyperlink = paragraph.add_hyperlink(text='foobar', url='http://github.com')
>>> hyperlink
<docx.text.Hyperlink instance at 0xdeadbeef1>
>>> hyperlink.url
'http://github.com'
>>> hyperlink.text
'foobar'

XML specimen:

<w:p>
  <w:r>
    <w:t xml:space="preserve">This sentence has a </w:t>
  </w:r>
  <w:hyperlink r:id="rId5" w:history="1">
    <w:r>
      <w:rPr>
        <w:rStyle w:val="Hyperlink"/>
      </w:rPr>
      <w:t>hyperlink</w:t>
    </w:r>
  </w:hyperlink>
  <w:r>
    <w:t xml:space="preserve"> in it.</w:t>
  </w:r>
</w:p>

robertdodd commented 10 years ago

I'm trying to get this working; maybe you can help:

https://github.com/robertdodd/python-docx/commit/a616e8134fc5731d4f0e1806fdae64e2b9ee6989#diff-d41d8cd98f00b204e9800998ecf8427e

The XML works just like the specimen above, but I'm not sure how to add a URL to the document's relationships and get the relationship id (rId).

Could someone point me in the right direction?

scanny commented 10 years ago

This method in the test code might be a help: https://github.com/python-openxml/python-docx/blob/master/tests/opc/test_package.py#L328

The critical call will be something like:

rId = part.relate_to(url, reltype, is_external=True)

Then the value of rId will replace your 'rId5' above.

For the main document, you can get a reference to the document part using: document._document_part.

reltype can vary, but a regular hyperlink uses docx.opc.constants.RELATIONSHIP_TYPE.HYPERLINK. I usually use this to get those references:

from docx.opc.constants import RELATIONSHIP_TYPE as RT

foo = RT.HYPERLINK

You can check an example document using opc-diag to see what relationship type URI it uses in your particular case if you think it might be different.
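To make the rId indirection concrete, here is a minimal stdlib-only sketch (it does not use python-docx). It resolves the r:id on a w:hyperlink element against a relationships part. The RELS_XML string is a hypothetical example of what word/_rels/document.xml.rels might contain for the specimen above, and resolve_hyperlink is a name made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by WordprocessingML and by OPC relationship parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_RELTYPE = R_NS + "/hyperlink"

# Hypothetical relationships part for the specimen (an assumption, not real output).
RELS_XML = f"""<Relationships xmlns="{PKG_RELS_NS}">
  <Relationship Id="rId5" Type="{HYPERLINK_RELTYPE}"
                Target="http://github.com" TargetMode="External"/>
</Relationships>"""

# The w:hyperlink element from the specimen, reduced to what matters here.
HYPERLINK_XML = f"""<w:hyperlink xmlns:w="{W_NS}" xmlns:r="{R_NS}" r:id="rId5" w:history="1">
  <w:r><w:t>hyperlink</w:t></w:r>
</w:hyperlink>"""


def resolve_hyperlink(hyperlink_xml, rels_xml):
    """Return (link text, target URL) by looking up the element's r:id in the rels part."""
    hyperlink = ET.fromstring(hyperlink_xml)
    r_id = hyperlink.get(f"{{{R_NS}}}id")
    text = "".join(t.text or "" for t in hyperlink.iter(f"{{{W_NS}}}t"))
    for rel in ET.fromstring(rels_xml).iter(f"{{{PKG_RELS_NS}}}Relationship"):
        if rel.get("Id") == r_id and rel.get("Type") == HYPERLINK_RELTYPE:
            return text, rel.get("Target")
    return text, None  # no matching relationship found


print(resolve_hyperlink(HYPERLINK_XML, RELS_XML))
```

The URL never appears in document.xml itself; the paragraph only stores the rId, which is why relate_to() has to be called on the part before the w:hyperlink element can be written.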

robertdodd commented 10 years ago

Thank you for the fast reply -- I'll try it out and get back to you soon...

robertdodd commented 10 years ago

I got it working, thanks so much! Could you please give me some quick feedback?

https://github.com/robertdodd/python-docx/commit/1aa3860fcbbe8675b23b06ec9144136dc8b88e24

I passed the document to paragraph.add_hyperlink so the relationship could be created. Is there a way to use the document without passing it around?

hyperlink = p.add_hyperlink(document, text='Google', url='http://google.com')

I also want to update the URL manually -- but you need a reference to the document to do that and I'm not sure where to put it.

hyperlink.text = 'Google'
hyperlink.url = 'http://google.com'

taliastocks commented 10 years ago

@robertdodd Any progress on this feature? I see you went back and forth and got some feedback from @scanny.

robertdodd commented 10 years ago

Hey @collinstocks -- If you want to use this now I got it working roughly over here. There are also internal hyperlinks over here thanks to Anton.

I got great feedback from @scanny and Anton but I've been a bit caught up recently and haven't finished implementing it yet. I should get some time soon -- and hopefully we'll see it merged!

AKimZ commented 9 years ago

@robertdodd: Same question as @collinstocks and a note. I'm using your implementation and it works quite well. However, the hyperlinks are generated without any styling associated. (Looks like plain text) Maybe address that before releasing this feature?

tanyunshi commented 9 years ago

@robertdodd, @AKimZ: I continued @robertdodd's work; adding styles and multiple runs for the same link is possible now. See https://github.com/tanyunshi/python-docx/commit/6b9d40b8edf5959f7891f019a248514c691ae07e

johnzupancic commented 9 years ago

@tanyunshi Thank you! This is awesome. I wonder if you'll be able to merge what you have with the most recent version of python_docx (since we can adjust the color of text to be blue).

tanyunshi commented 9 years ago

@johnzupancic, hihi, merge done here https://github.com/tanyunshi/python-docx/commit/90237e81a61810c7272c73fbef8edeb6b8be63bd

gordeychuk commented 9 years ago

Hi guys,

It looks like adding hyperlinks is now possible, but what about reading them from a paragraph? My problem is that when I read the text of a paragraph, the hyperlinks are missing. Can I do this using the changes made in this thread?

MuhammetDilmac commented 9 years ago

Hi @scanny, when will this feature be implemented in the main repo?

htahir1 commented 9 years ago

Is adding a hyperlink now supported?

BlackArbsCEO commented 9 years ago

I'm curious if this will be implemented in the main repo as well. Otherwise great work on the project and the documentation is actually really useful.

Substr commented 8 years ago

@scanny Any chance of this being merged into the main repo?

tanyunshi commented 8 years ago

Hi @Courthold, I don't think this will be merged into the main repo as it lacks tests and the API has not been vetted. The discussion is here: https://github.com/python-openxml/python-docx/pull/162.

I think there were some problems in the implementation (see also @gordeychuk).

ryan-rushton commented 8 years ago

For anyone needing a workaround, you can use this function. Note that it only lets you write a hyperlink; you won't be able to modify the link without going back down to the lxml level.

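from docx.enum.dml import MSO_THEME_COLOR_INDEX
from docx.opc.constants import RELATIONSHIP_TYPE as RT
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

def add_hyperlink(paragraph, url, text):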
    """
    A function that places a hyperlink within a paragraph object.

    :param paragraph: The paragraph we are adding the hyperlink to.
    :param url: A string containing the required url
    :param text: The text displayed for the url
    :return: A Run object containing the hyperlink
    """

    # This gets access to the document.xml.rels file and gets a new relation id value
    part = paragraph.part
    r_id = part.relate_to(url, RT.HYPERLINK, is_external=True)

    # Create the w:hyperlink tag and add needed values
    hyperlink = OxmlElement('w:hyperlink')
    hyperlink.set(qn('r:id'), r_id, )
    hyperlink.set(qn('w:history'), '1')

    # Create a w:r element
    new_run = OxmlElement('w:r')

    # Create a new w:rPr element
    rPr = OxmlElement('w:rPr')

    # Create a w:rStyle element, note this currently does not add the hyperlink style as its not in
    # the default template, I have left it here in case someone uses one that has the style in it
    rStyle = OxmlElement('w:rStyle')
    rStyle.set(qn('w:val'), 'Hyperlink')

    # Join all the xml elements together and add the required text to the w:r element
    rPr.append(rStyle)
    new_run.append(rPr)
    new_run.text = text
    hyperlink.append(new_run)

    # Create a new Run object and add the hyperlink into it
    r = paragraph.add_run()
    r._r.append(hyperlink)

    # A workaround for the lack of a hyperlink style (doesn't go purple after using the link)
    # Delete this if using a template that has the hyperlink style in it
    r.font.color.theme_color = MSO_THEME_COLOR_INDEX.HYPERLINK
    r.font.underline = True

    return r

Adviser-ua commented 8 years ago

Great job! How I can make hyperlink inside file to other paragraph ?

ryan-rushton commented 8 years ago

How I can make hyperlink inside file to other paragraph ?

It would be best to unzip a Word document and figure out what's needed. Personally, to figure the above out I made documents with only the required feature in them, unzipped them and determined the code that differed. What made it easier was putting things in a table so you get logical containers for certain parts of code.

I would assume that you would use the above code, with the exception that the line

r_id = part.relate_to(url, RT.HYPERLINK, is_external=True)

would change to something like

r_id = part.relate_to(internal_tag, RT.HYPERLINK, is_external=False)

Then you would need to make an internal_tag for some other part of the document.

johanvandegriff commented 8 years ago

The workaround didn't work for me. I had to modify it to insert the hyperlink directly into the paragraph:

import docx

def add_hyperlink(paragraph, url, text):
    """
    A function that places a hyperlink within a paragraph object.

    :param paragraph: The paragraph we are adding the hyperlink to.
    :param url: A string containing the required url
    :param text: The text displayed for the url
    :return: The hyperlink object
    """

    # This gets access to the document.xml.rels file and gets a new relation id value
    part = paragraph.part
    r_id = part.relate_to(url, docx.opc.constants.RELATIONSHIP_TYPE.HYPERLINK, is_external=True)

    # Create the w:hyperlink tag and add needed values
    hyperlink = docx.oxml.shared.OxmlElement('w:hyperlink')
    hyperlink.set(docx.oxml.shared.qn('r:id'), r_id, )

    # Create a w:r element
    new_run = docx.oxml.shared.OxmlElement('w:r')

    # Create a new w:rPr element
    rPr = docx.oxml.shared.OxmlElement('w:rPr')

    # Join all the xml elements together and add the required text to the w:r element
    new_run.append(rPr)
    new_run.text = text
    hyperlink.append(new_run)

    paragraph._p.append(hyperlink)

    return hyperlink

document = docx.Document()
p = document.add_paragraph()
add_hyperlink(p, 'http://www.google.com', 'Google')
document.save('demo.docx')

krnlyng commented 8 years ago

@rushton3179 can you elaborate more? eg. how can we create the internal_tag?

posterberg commented 7 years ago

@johanvandegriff Your solution works fine for me. I just haven't mastered the skills needed to change color, font etc on the returned hyperlink. Can I get the function to return a 'run' instead so I can use run.style or run.underline?

johanvandegriff commented 7 years ago

@posterberg I don't know how to make a workaround that returns a run, but I have improved the current one to take the color and underline as arguments.

Here are the steps I took to change the text color, in case you need to add other properties:

    # Add color if it is given
    if color is not None:
        c = docx.oxml.shared.OxmlElement('w:color')
        c.set(docx.oxml.shared.qn('w:val'), color)
        rPr.append(c)

Here is the updated workaround with control of color and underlining:

import docx

def add_hyperlink(paragraph, url, text, color, underline):
    """
    A function that places a hyperlink within a paragraph object.

    :param paragraph: The paragraph we are adding the hyperlink to.
    :param url: A string containing the required url
    :param text: The text displayed for the url
    :param color: Hex RGB string such as 'FF8822', or None to leave the color unset
    :param underline: If False, underlining is explicitly turned off
    :return: The hyperlink object
    """

    # This gets access to the document.xml.rels file and gets a new relation id value
    part = paragraph.part
    r_id = part.relate_to(url, docx.opc.constants.RELATIONSHIP_TYPE.HYPERLINK, is_external=True)

    # Create the w:hyperlink tag and add needed values
    hyperlink = docx.oxml.shared.OxmlElement('w:hyperlink')
    hyperlink.set(docx.oxml.shared.qn('r:id'), r_id, )

    # Create a w:r element
    new_run = docx.oxml.shared.OxmlElement('w:r')

    # Create a new w:rPr element
    rPr = docx.oxml.shared.OxmlElement('w:rPr')

    # Add color if it is given
    if color is not None:
        c = docx.oxml.shared.OxmlElement('w:color')
        c.set(docx.oxml.shared.qn('w:val'), color)
        rPr.append(c)

    # Remove underlining if it is requested
    if not underline:
        u = docx.oxml.shared.OxmlElement('w:u')
        u.set(docx.oxml.shared.qn('w:val'), 'none')
        rPr.append(u)

    # Join all the xml elements together and add the required text to the w:r element
    new_run.append(rPr)
    new_run.text = text
    hyperlink.append(new_run)

    paragraph._p.append(hyperlink)

    return hyperlink

document = docx.Document()
p = document.add_paragraph()

#add a hyperlink with the normal formatting (blue underline)
hyperlink = add_hyperlink(p, 'http://www.google.com', 'Google', None, True)

#add a hyperlink with a custom color and no underline
hyperlink = add_hyperlink(p, 'http://www.google.com', 'Google', 'FF8822', False)

document.save('demo.docx')

This function is the hyperlink equivalent of duct tape: it gets the job done, but becomes harder to use as the complexity of the task increases.

scanny commented 7 years ago

Nice job @johanvandegriff :)

Just a note for anyone who doesn't know about it, opc-diag can be very handy for poking around inside .docx packages as an alternative to unzipping and reformatting the XML yourself. Also works for .xlsx and .pptx files.

posterberg commented 7 years ago

@johanvandegriff Thank you so much!

Sanjeetkumar163 commented 7 years ago

How can I make the "inline_shape" as the hyperlink? Basically, I want an image as a hyperlink.

ryan-rushton commented 7 years ago

@scanny is anyone working on this? I was potentially going to pick it up this weekend and have a look at implementing it.

scanny commented 7 years ago

There was this pull request a while back but it stalled pretty early on: https://github.com/python-openxml/python-docx/pull/278

The comments on that PR should be good guidance. Best to start with the enhancement proposal (analysis document) so we can be sure we have the API sorted out. Can't change our mind about that later so it's best to get it right up-front. And implementing the wrong API isn't terrifically productive :)

scanny commented 7 years ago

I should add that most folks get stuck on the tests. If you're already a TDD guy these shouldn't be too surprising, but in any case you can usually find and adapt an existing example for both the acceptance tests and the unit tests. There's a lot of "repeating theme" going on in this particular application domain :)

dobbyth33lf commented 7 years ago

If I create hyperlinks using the above workaround, what would the code to extract the URLs look like?

dobbyth33lf commented 7 years ago

And indeed, having created a link like this

hyperlink = add_hyperlink(p, 'http://www.google.com', 'Google', '0000FF', True)
print(p.text)

You can no longer access the paragraph text. Is there a workaround?
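Until reading is supported, one option is to go around python-docx and read the package XML directly. The following is a minimal stdlib-only sketch, assuming the standard part names word/document.xml and word/_rels/document.xml.rels. read_paragraphs and build_sample_docx are hypothetical helper names, and the sample package holds only the two parts this code reads, so it is not a file Word could open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Clark-notation namespace prefixes for the parts involved.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"
PKG = "{http://schemas.openxmlformats.org/package/2006/relationships}"

DOCUMENT_PART = "word/document.xml"          # assumed standard part name
RELS_PART = "word/_rels/document.xml.rels"   # assumed standard part name


def read_paragraphs(docx_file):
    """Return a list of (paragraph_text, [(link_text, target), ...]) tuples."""
    with zipfile.ZipFile(docx_file) as package:
        body = ET.fromstring(package.read(DOCUMENT_PART))
        targets = {}
        if RELS_PART in package.namelist():
            rels = ET.fromstring(package.read(RELS_PART))
            targets = {rel.get("Id"): rel.get("Target")
                       for rel in rels.iter(PKG + "Relationship")}

    paragraphs = []
    for p in body.iter(W + "p"):
        # iter() descends into w:hyperlink, so link text is part of the paragraph text.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        links = []
        for link in p.iter(W + "hyperlink"):
            link_text = "".join(t.text or "" for t in link.iter(W + "t"))
            anchor = link.get(W + "anchor")
            # External links carry r:id; internal ones carry w:anchor (a bookmark name).
            target = "#" + anchor if anchor is not None else targets.get(link.get(R + "id"))
            links.append((link_text, target))
        paragraphs.append((text, links))
    return paragraphs


def build_sample_docx():
    """Build a tiny in-memory package holding only the two parts read above."""
    document_xml = (
        f'<w:document xmlns:w="{W[1:-1]}" xmlns:r="{R[1:-1]}"><w:body><w:p>'
        '<w:r><w:t xml:space="preserve">Search with </w:t></w:r>'
        '<w:hyperlink r:id="rId5"><w:r><w:t>Google</w:t></w:r></w:hyperlink>'
        "</w:p></w:body></w:document>"
    )
    rels_xml = (
        f'<Relationships xmlns="{PKG[1:-1]}">'
        f'<Relationship Id="rId5" Type="{R[1:-1]}/hyperlink" '
        'Target="http://www.google.com" TargetMode="External"/>'
        "</Relationships>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(DOCUMENT_PART, document_xml)
        package.writestr(RELS_PART, rels_xml)
    buffer.seek(0)
    return buffer


print(read_paragraphs(build_sample_docx()))
```

For a real file, pass its path instead, e.g. read_paragraphs('demo.docx'). Paragraphs nested inside other paragraphs (text boxes, for instance) would be visited twice by this simple traversal.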

scanny commented 7 years ago

see #85

ryan-rushton commented 7 years ago

@scanny would you be able to have a look over testing so far so I know I am on the right track? https://github.com/rushton3179/python-docx/tree/feature/hyperlink-tdd

Also, I have been thinking: do we need methods to collect hyperlinks from a paragraph? My instinct would be to keep the class simpler, since collecting hyperlinks begins to raise doubts about how collecting runs from a paragraph should work. I feel it may be better to leave them out; if the original Hyperlink objects are not held onto by the project, they are considered lost and have to be created again. The core functionality of the Paragraph class seems to be to create paragraphs, not to parse and process them.

scanny commented 7 years ago

@rushton3179 It's probably best to continue this as a pull request (PR). That way there's a segregated "space" where all the proposed changes are, along with any review conversation we might have. As you rebase and re-push the PR branch on your repo, it updates things in the PR. If you haven't done it before you might want to read up.

Anyway, it's pretty flexible, so it's not too early to get one going.

I've left you some comments on your branch, but let's continue from here in a PR :)

dsynkov commented 7 years ago

@johanvandegriff I'm able to get the color (using '0000EE' as the default blue hyperlink color) using your workaround but not the underlining. Interestingly enough, when I open my document in WordPad I get the color and the underlining, but in Word 2016 I only get the former. Have you come across this at all? (I'm currently using a Word macro as an alternative.)

johanvandegriff commented 7 years ago

@dmitriy5 I have been using LibreOffice, so I don't know if it works in Word. You might want to add the underlining in Word, save it, and see how the xml has changed.

Abolfazl commented 6 years ago

Underline was not working for me in Word either using @johanvandegriff's code. To have it underline by default, you need to add:

u = docx.oxml.shared.OxmlElement('w:u')
u.set(docx.oxml.shared.qn('w:val'), 'single')
rPr.append(u)

before you run new_run.append(rPr). You can also set 'single' to 'double' to double underline.
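To see what XML that advice produces, here is a minimal stdlib-only sketch that builds the same w:r structure with ElementTree instead of python-docx. build_link_run is a hypothetical helper, its default color and underline values are illustrative only, and the element order (color, then underline) simply mirrors the workaround above.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # serialize with the usual w: prefix


def w(tag):
    """Return the Clark-notation name for a w:-prefixed tag or attribute."""
    return f"{{{W_NS}}}{tag}"


def build_link_run(text, color="0000FF", underline="single"):
    """Build a w:r element with explicit color and underline run properties."""
    run = ET.Element(w("r"))
    rpr = ET.SubElement(run, w("rPr"))
    if color is not None:
        ET.SubElement(rpr, w("color"), {w("val"): color})
    if underline is not None:
        # 'single' and 'double' are the values mentioned above; 'none' disables it.
        ET.SubElement(rpr, w("u"), {w("val"): underline})
    ET.SubElement(run, w("t")).text = text
    return run


print(ET.tostring(build_link_run("Google"), encoding="unicode"))
```

Comparing this output with the run inside a hyperlink that Word itself saved (for example via opc-diag) is a quick way to spot which run property is missing.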

Naff16 commented 6 years ago

Hello, @rushton3179's solution works like a charm for a new paragraph. But is it possible to use it to insert a hyperlink into a table in a docx document?

Something like row.cells[1].paragraphs[0].text = add_hyperlink(...), because I'm getting a lot of errors when I do this in a table.

Best regards

Audry21 commented 6 years ago

Hello, I looked all over Google to find a way to add hyperlinks to my .docx files using python-docx. The only working solution I found was the code sample in this topic, but as @Naff16 said, it does not work in tables:

hdr_cells = table.rows[1].cells
p = hdr_cells[1].paragraphs[0]
add_hyperlink(p, ....)

The resulting .docx file is corrupted and can be restored, but without the hyperlink... Any idea/help? My need is to add hyperlinks to other files on the PC (the .docx would be too heavy if I added all the pictures directly, so I prefer to store them in another folder and just add hyperlinks to them). I know it works to add hyperlinks in the document, but I absolutely need to insert those hyperlinks in a table... Thanks

Audry21 commented 6 years ago

Hello! After several tests, it appears that the "does not work in tables" problem was, in my case, because the table I tried to insert the hyperlink into had been copied from another .docx file using deepcopy(), and somehow this is a problem. So the solution I found is:

1) Insert the table in the document:

# cartouche is a table deepcopied from another .docx
p = self.document.add_paragraph()
p._p.addnext(cartouche._tbl)

2) Then insert the link in the table, which is the last table inserted in the document:

for f in element.files:
    p_table = self.document.tables[-1].rows[2].cells[1].add_paragraph()
    file_name = f.split('/')[-1]
    file_path = 'EVIDENCES/{}'.format(file_name)
    # add the link
    add_hyperlink(p_table, file_name, file_path)

This worked just fine for me.

neilbilly commented 5 years ago

Following on from @Adviser-ua's comment:

How I can make hyperlink inside file to other paragraph ?

For anyone in this situation, i.e. wanting to link to an internal bookmark, this function, based on a stripped-down version of @johanvandegriff's code above, worked for me (in Word 2010):

import docx

def add_hyperlink(paragraph, link_to, text, is_external):
    ''' Adds a hyperlink within a paragraph to an internal bookmark 
    or an external url '''

    part = paragraph.part

    hyperlink = docx.oxml.shared.OxmlElement('w:hyperlink')
    if is_external:
        r_id = part.relate_to(link_to,
            docx.opc.constants.RELATIONSHIP_TYPE.HYPERLINK,
            is_external=is_external)

        hyperlink.set(docx.oxml.shared.qn('r:id'), r_id)
    else:
        hyperlink.set(docx.oxml.shared.qn('w:anchor'), link_to)

    new_run = docx.oxml.shared.OxmlElement('w:r')
    rPr = docx.oxml.shared.OxmlElement('w:rPr')

    new_run.append(rPr)
    new_run.text = text
    hyperlink.append(new_run)

    paragraph._p.append(hyperlink)

Set is_external to False and pass a bookmark to link_to.

If you need to make a bookmark:

def add_bookmark(run, bookmark_name):
    ''' Adds a word bookmark to a run '''
    tag = run._r
    start = docx.oxml.shared.OxmlElement('w:bookmarkStart')
    start.set(docx.oxml.ns.qn('w:id'), '0')
    start.set(docx.oxml.ns.qn('w:name'), bookmark_name)
    tag.append(start)

    text = docx.oxml.OxmlElement('w:r')
    tag.append(text)

    end = docx.oxml.shared.OxmlElement('w:bookmarkEnd')
    end.set(docx.oxml.ns.qn('w:id'), '0')
    end.set(docx.oxml.ns.qn('w:name'), bookmark_name)
    tag.append(end)

    return run

One thing to note is that if the bookmark contains a space it causes a problem if the .docx is exported to PDF, i.e. it won't link in the exported PDF.
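Two details in the bookmark code can be handled with small helpers: names containing spaces, and the hardcoded id of '0', which would repeat if more than one bookmark is added. The sketch below is pure Python. Both function names are hypothetical, and the naming rules (letters, digits and underscores only, starting with a letter, at most 40 characters) are an assumption based on what Word's bookmark dialog accepts.

```python
import itertools
import re

# Hypothetical counter; in practice one per document would be needed.
_bookmark_ids = itertools.count(1)


def make_bookmark_name(label, max_length=40):
    """Turn arbitrary text into a bookmark name without spaces.

    Assumed rules: word characters only, first character a letter,
    and no longer than max_length characters.
    """
    # Collapse every run of non-word characters (spaces, punctuation) into "_".
    name = re.sub(r"\W+", "_", label.strip()).strip("_")
    if not name or not name[0].isalpha():
        name = "bm_" + name
    return name[:max_length]


def next_bookmark_id():
    """Return a fresh id string so bookmarkStart/bookmarkEnd pairs do not collide."""
    return str(next(_bookmark_ids))


print(make_bookmark_name("Section 1: Intro"))
```

The same id would then be set as w:id on both the w:bookmarkStart and the matching w:bookmarkEnd, and the sanitized name passed both to add_bookmark and as link_to in add_hyperlink.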

michaelu123 commented 5 years ago

It bothered me that the text is not written into a normal run, but into an Element, so that font size and color are not preserved. I finally came up with this solution, which just adds a hyperlink to a normal run. The run parameter must of course be one of the runs of the paragraph. I confess that I have only a vague idea how lxml and docx work together. At the moment when hyperlink.append(run._r) is called, the run disappears from the runs, but the hyperlink is then inserted into the paragraph where the run originally was.

import docx

def add_hyperlink_into_run(paragraph, run, url):
    runs = paragraph.runs
    for i in range(len(runs)):
        # Compare the underlying elements so two runs with the same text are not confused
        if runs[i]._r is run._r:
            break

    # This gets access to the document.xml.rels file and gets a new relation id value
    part = paragraph.part
    r_id = part.relate_to(url, docx.opc.constants.RELATIONSHIP_TYPE.HYPERLINK, is_external=True)

    # Create the w:hyperlink tag and add needed values
    hyperlink = docx.oxml.shared.OxmlElement('w:hyperlink')
    hyperlink.set(docx.oxml.shared.qn('r:id'), r_id, )
    hyperlink.append(run._r)
    paragraph._p.insert(i+1,hyperlink)
wandebandera commented 5 years ago

Hi! How to add a hyperlink to an internal heading paragraph?

fdeh75 commented 5 years ago

It works for me (LibreOffice), with a few changes. Thanks @neilbilly!

def add_bookmark(run, bookmark_name):
    ''' Adds a word bookmark to a run '''
    tag = run._r
    start = docx.oxml.shared.OxmlElement('w:bookmarkStart')
    start.set(docx.oxml.ns.qn('w:id'), '0')
    start.set(docx.oxml.ns.qn('w:name'), bookmark_name)
    tag.addprevious(start)

    text = docx.oxml.OxmlElement('w:r')
    tag.append(text)

    end = docx.oxml.shared.OxmlElement('w:bookmarkEnd')
    end.set(docx.oxml.ns.qn('w:id'), '0')
    tag.addnext(end)

    return run

zwupup commented 5 years ago

As a common case, I have text like """I am trying to add an hyperlink in a MS Word document using docx module for <a href="python.org">Python</a>. Just do it.""", with "Python" as the keyword and "python.org" as the link. I just added a function based on @johanvandegriff's:

import re

def is_text_link(text):
    # True if the text contains any marker that suggests a URL
    markers = ['http', '://', 'www.', '.com', '.org', '.cn', '.xyz', '.htm']
    return any(marker in text for marker in markers)

def add_text_link(document, text):
    paragraph = document.add_paragraph()
    text = re.split(r'<a href="|">|</a>',text)
    keyword = None
    for i in range(len(text)):
        if not is_text_link(text[i]):
            if text[i] != keyword:
                paragraph.add_run(text[i])
        elif i + 1<len(text):
            url=text[i]
            keyword=text[i + 1]
            add_hyperlink(paragraph, url, keyword, None, True)

document.save('test.docx')
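Guessing which pieces are URLs from a list of markers can misfire on ordinary text that happens to contain ".com" or "http". As an alternative, here is a minimal stdlib sketch that uses html.parser to split the text into (text, url) segments based on the anchor tags themselves. LinkSplitter and split_links are hypothetical names, and nested or malformed tags are not handled.

```python
from html.parser import HTMLParser


class LinkSplitter(HTMLParser):
    """Split text containing <a href="..."> tags into (text, url) segments."""

    def __init__(self):
        super().__init__()
        self.segments = []  # list of (text, url or None)
        self._href = None   # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None

    def handle_data(self, data):
        # Text inside an open <a> is tagged with its href; other text gets None.
        self.segments.append((data, self._href))


def split_links(text):
    """Return the segments of text in document order."""
    parser = LinkSplitter()
    parser.feed(text)
    parser.close()
    return parser.segments


for segment in split_links('Use docx for <a href="python.org">Python</a>. Just do it.'):
    print(segment)
```

Each segment whose url is None would go to paragraph.add_run(text), and the others to add_hyperlink(paragraph, url, text, None, True) from the workaround above.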
jkstill commented 5 years ago

        p_table = self.document.tables[-1].rows[2].cells[1].add_paragraph()

Thank you for this!

This is the bit I needed to properly reference the paragraph so I could insert a hyperlink in a cell. All working now.

chescales commented 5 years ago

We also got it running with @johanvandegriff 's solution, thanks! Once the feature is shipped then we'll move to the official solution :) thanks guys

abubelinha commented 3 years ago

Thank you all for the work on improving this wonderful project. Sorry, I don't have a real understanding of how all these implementations work, but please take this comment into account before merging code into an official solution.

I tried many of these code samples trying to add links to a document, using both examples given here and in StackOverflow.

Although many of them worked, in the sense that hyperlinks do appear when I open the docx file in Word, ... there is still something which must be different to the standard .docx way of hyperlinking.

I say this because when I upload these docx files to Google Drive in order to share them ... the hyperlinks get lost after conversion to Google Docs format (which I do because this format does not consume my Drive quota). This does NOT happen to hyperlinks in a "standard" .docx file created with MS Word (they still remain when you convert them to Google Docs format).

This might seem irrelevant to many of you, but I think it reveals some error in the way hyperlinks are being created. It could affect future conversions/compatibility of your files (I just tried Google Docs but there might be other conversions which are already failing).

Fortunately, I found one implementation in this thread (thanks @michaelu123) where hyperlinks are not being lost. There is a similar implementation by @brasky in #610 too.

So @johanvandegriff @scanny @tanyunshi @robertdodd @ryan-rushton ... please take a look at @michaelu123's code before making a final version.

Thanks a lot again to all of you!!

caelohm commented 3 years ago

I'm trying to add a hyperlink to my table inside one of the cells, but when I use this method it messes up the spacing of the column. The hyperlink isn't wrapped within the cell like I want it to be.

Edit: Oh wait, never mind, my table had "automatically resize to fit contents" enabled. Weirdly enough, I wasn't having this issue until I added the hyperlink. To fix it, add table.autofit = False.

GP720 commented 3 years ago

How can I add a file logo in place of the hyperlink?