ruby-docx / docx

a ruby library/gem for interacting with .docx files
MIT License

Exception thrown when calling to_html on file with internal hyperlinks #142

Open ycp3 opened 11 months ago

ycp3 commented 11 months ago

Describe the bug

An "undefined method `value' for nil:NilClass" error is raised when calling to_html on a file with internal hyperlinks (hyperlinks to a bookmark or a heading within the same file).

Backtrace:

docx (0.8.0) lib/docx/containers/text_run.rb:106:in `hyperlink_id'
docx (0.8.0) lib/docx/containers/text_run.rb:102:in `href'
docx (0.8.0) lib/docx/containers/text_run.rb:81:in `to_html'
docx (0.8.0) lib/docx/containers/paragraph.rb:48:in `block in to_html'
docx (0.8.0) lib/docx/containers/paragraph.rb:47:in `each'
docx (0.8.0) lib/docx/containers/paragraph.rb:47:in `to_html'
docx (0.8.0) lib/docx/document.rb:119:in `map'
docx (0.8.0) lib/docx/document.rb:119:in `to_html' 

According to the linked reference, internal hyperlinks use the anchor attribute instead of the id attribute, which breaks line 106 in text_run.rb: there is no id attribute to read a value from.
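
One way to handle both kinds of hyperlink is sketched below. This is not the gem's code: `resolve_href` is a hypothetical helper, and plain Hashes stand in for the hyperlink node's attributes and for the table that maps relationship ids to external URLs. The `rId4` id, the URL, and the bookmark name are made-up sample values.

```ruby
# Hypothetical helper, not part of the docx gem.
# attributes:    stand-in for the attributes of a hyperlink node
# relationships: stand-in for the id => URL table of external links
def resolve_href(attributes, relationships)
  id = attributes['id']
  return relationships[id] if id   # external link: look the id up

  anchor = attributes['anchor']
  return "##{anchor}" if anchor    # internal link: point at the bookmark

  nil                              # neither attribute present
end

relationships = { 'rId4' => 'https://example.com/' }

puts resolve_href({ 'id' => 'rId4' }, relationships)            # external link
puts resolve_href({ 'anchor' => '_Heading_1' }, relationships)  # internal link
p    resolve_href({}, relationships)                            # nil, no exception
```

Returning nil rather than raising leaves the decision about what to render for an unresolved link to the caller.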

To Reproduce

Open a docx file with a hyperlink to either a heading or a bookmark in the same file and call to_html.

Example:

require 'docx'

doc = Docx::Document.new('/path/to/your/docx/file_with_internal_hyperlink.docx')

doc.to_html

Sample docx file

https://docs.google.com/document/d/1H01zgmdC2LHAAwXAhmm6RyEz-lwbZm6R/edit?usp=sharing&ouid=103282161859668866778&rtpof=true&sd=true

Expected behavior

No exception thrown; html gets returned as normal.

Environment

mateusg commented 11 months ago

Hi @satoryu. Any idea what could be happening here? I seem to be having a similar problem on any docx version newer than 0.5.0. Versions 0.5.0 and older just sanitize the hyperlinks and print the plain text.

I'm on Ruby 3.1.4, Ubuntu 20.04.

Backtrace:

undefined method `[]' for nil:NilClass
 @document_properties[:hyperlinks][hyperlink_id]
 ^^^^^^^^^^^^^^
docx-0.6.0/lib/docx/containers/text_run.rb:100:in `href'
docx-0.6.0/lib/docx/containers/text_run.rb:79:in `to_html'
docx-0.6.0/lib/docx/containers/paragraph.rb:48:in `block in to_html'
docx-0.6.0/lib/docx/containers/paragraph.rb:47:in `each'
docx-0.6.0/lib/docx/containers/paragraph.rb:47:in `to_html'
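
The "undefined method `[]' for nil:NilClass" in this backtrace indicates that `@document_properties[:hyperlinks]` was nil when the lookup ran. A minimal sketch of a nil-tolerant lookup follows, assuming a plain Hash as a stand-in for the gem's document properties; the `lookup_href` name and the sample values are hypothetical.

```ruby
# Hypothetical stand-in for the lookup shown in the backtrace above.
# Hash#dig returns nil as soon as an intermediate key is missing,
# instead of calling [] on nil.
def lookup_href(document_properties, hyperlink_id)
  document_properties.dig(:hyperlinks, hyperlink_id)
end

with_links    = { hyperlinks: { 'rId4' => 'https://example.com/' } }
without_links = {}

puts lookup_href(with_links, 'rId4')
p    lookup_href(without_links, 'rId4')  # nil, no exception
```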

satoryu commented 10 months ago

@ycp3 @mateusg Thank you for your reports.

I've just found out the root cause: this gem does not support internal links. I would like to fix this issue but need time.

> I seem to be having a similar problem on any docx version newer than 0.5.0. Versions 0.5.0 and older just sanitize the hyperlinks and print the plain text.

Yes, right. Do you think that printing external links as sanitized text makes sense?
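
To make the two behaviours under discussion concrete, here is a rough sketch of a renderer that emits a link when an href is available and otherwise falls back to escaped plain text, as versions up to 0.5.0 are described as doing. `render_run` is a hypothetical function, not the gem's `to_html`, and the sample text and href are invented.

```ruby
require 'cgi'

# Hypothetical renderer, not the gem's implementation.
# With an href the text becomes a link; without one it degrades to
# escaped plain text instead of raising.
def render_run(text, href)
  escaped = CGI.escapeHTML(text)
  return escaped if href.nil?

  %(<a href="#{CGI.escapeHTML(href)}">#{escaped}</a>)
end

puts render_run('See section 2', '#section_2')
puts render_run('See section 2', nil)
```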