karnov / htmltoword

Ruby html to word gem
MIT License

Remove extra whitespace from <a href> tags in DOCX #70

Closed: vyruss closed this 6 years ago

vyruss commented 6 years ago

The extra whitespace caused extra spaces to appear within the link text of <a href> tags, so links displayed like this: __________Link_Text________

Fixes DMPRoadmap/roadmap#1185

lukelex commented 6 years ago

Hi @vyruss, what's the downside of having these extra spaces? If there are any, would you be so kind as to add a test case that proves this?

vyruss commented 6 years ago

Hi @lukelex, for us it resulted in DOCX files which exhibited this formatting for hyperlinks:

image

whereas removing the whitespace results in the following proper display in the exported file:

image

The whitespace (10 spaces before the text, and a newline plus 8 spaces after it) was simply included as part of the link text. Please let me know if I can explain this better.
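To make the whitespace counts above concrete, here is a minimal Ruby sketch. The string is a made-up stand-in for the link text, not output from htmltoword, and `strip` is shown only to illustrate the intended result, not as the change made in this pull request.

```ruby
# Hypothetical link text: 10 spaces before the visible text,
# then a newline and 8 spaces after it.
raw_link_text = "#{' ' * 10}Link Text\n#{' ' * 8}"

# Removing the surrounding whitespace leaves only the visible text.
clean_link_text = raw_link_text.strip

puts raw_link_text.inspect
puts clean_link_text.inspect
```

If the padded string is emitted verbatim as the link text, the padding becomes part of the rendered hyperlink.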

lukelex commented 6 years ago

I get your point. Can you please add a test case that covers this to avoid regressions?

vyruss commented 6 years ago

@lukelex OK, I believe I found what's happening here: the test case strips all whitespace from both the test HTML and the generated WordML with the remove_whitespace function, which is why they match. In the real-world scenario this whitespace is not removed, so it ends up in the WordML and therefore in the exported file.

https://github.com/karnov/htmltoword/blob/900892b5c48ed33ee94b819096f0a5d416034b9d/spec/spec_helper.rb#L19
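The effect described here can be sketched in plain Ruby. Everything below is an assumption for illustration: `strip_all_whitespace` is a hypothetical stand-in rather than the gem's actual remove_whitespace helper, and the WordML fragments are simplified and made up.

```ruby
# Hypothetical stand-in for a spec helper that removes all whitespace.
# This is NOT the gem's actual remove_whitespace implementation.
def strip_all_whitespace(text)
  text.gsub(/\s+/, "")
end

# Simplified, made-up WordML fragments for illustration only.
expected  = "<w:r><w:t>Link Text</w:t></w:r>"
generated = "<w:r><w:t>          Link Text\n        </w:t></w:r>"

# Comparing after stripping whitespace hides the padding entirely.
normalized_match = strip_all_whitespace(expected) == strip_all_whitespace(generated)

# An exact comparison exposes it.
exact_match = expected == generated

puts "normalized comparison passes: #{normalized_match}"
puts "exact comparison passes: #{exact_match}"
```

A comparison that normalizes whitespace on both sides cannot detect padding inside the link text, which is consistent with the existing test passing.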

lukelex commented 6 years ago

You're welcome to add a simpler test case that doesn't remove whitespace on comparison 😉
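One possible shape for such a whitespace-sensitive check is sketched below in plain Ruby. The helper names `text_runs` and `padded_runs` and the WordML fragments are hypothetical, and the regular expression is a rough illustration rather than a real XML parser. It is not the test added in this pull request.

```ruby
# Hypothetical helper: collect the contents of <w:t> elements verbatim,
# without normalizing whitespace. A regex is used only to keep the sketch
# dependency-free; a real spec would more likely use an XML parser.
def text_runs(wordml)
  wordml.scan(%r{<w:t(?:\s[^>]*)?>(.*?)</w:t>}m).flatten
end

# Hypothetical helper: return the runs that carry leading or trailing whitespace.
def padded_runs(wordml)
  text_runs(wordml).reject { |text| text == text.strip }
end

# Simplified, made-up WordML fragments for illustration only.
clean_wordml  = "<w:r><w:t>Link Text</w:t></w:r>"
padded_wordml = "<w:r><w:t>          Link Text\n        </w:t></w:r>"

puts padded_runs(clean_wordml).inspect
puts padded_runs(padded_wordml).inspect
```

Because the text is never normalized, a check like this fails as soon as padding reappears inside the link text.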

vyruss commented 6 years ago

@lukelex sorry for the delay, I just found the time to add one (see commit above)

vyruss commented 6 years ago

@lukelex is this OK to merge? We'd prefer to point our Gemfile at the main repo instead of my fork. Thanks!

lukelex commented 6 years ago

I’ll follow up with an official release in the coming days.

vyruss commented 6 years ago

Many thanks.