karnov / htmltoword

Ruby html to word gem
MIT License

Extra Vertical Whitespace #23

Open conspiracyhypo opened 9 years ago

conspiracyhypo commented 9 years ago

The following HTML tags generate unwanted paragraph breaks ("[P]"):

1) "some<br>text" becomes "some [P][P] text" (2 line breaks, where I only want one)
2) "bold text" inside a bold tag becomes "[P]bold text[P]" (2 line breaks, where I want zero)
3) "italic text" inside an italic tag becomes "[P]italic text[P]" (2 line breaks, where I want zero)

Possible Cause: Examining an output docx shows that these tags (line break, bold, italic) are treated as distinct paragraphs, i.e. the current paragraph terminates, the tagged text gets wrapped in WordML tags, then another paragraph is started for whatever text comes immediately after the closing tag.

Use case: Including bold or italic text in an AMA-style journal citation.

Environment info: Htmltoword 0.4.2, Rails 4.1.5, Ruby 2.1.5. Issue replicated on both OSX Yosemite and Ubuntu 14.04.

 [update] - Newlines in the input affect the output too.
 This is probably a consequence of XSLT treating all input as an XML document,
 where whitespace inside text nodes is significant.

 e.g. "some text" shows in the output docx as "some text", but 
 "some 
 text"
 becomes "some[CR]text" in the docx.
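
A possible workaround for the newline case only, sketched under assumptions: collapse newlines in the HTML string before handing it to the converter. The helper name `collapse_newlines` is hypothetical and not part of htmltoword. This does nothing for the extra paragraphs around the inline tags, and it would also flatten newlines inside preformatted blocks.

```ruby
# Hypothetical pre-processing helper (not part of htmltoword).
# Replaces each newline, plus any spaces or tabs around it, with a single
# space so that "some \n text" reaches the converter as "some text".
# Caveat: newlines inside <pre> blocks are flattened as well.
def collapse_newlines(html)
  html.gsub(/[ \t]*\r?\n[ \t]*/, ' ')
end

input = "some \n text"
cleaned = collapse_newlines(input)
puts cleaned
```

The cleaned string would then be passed to the gem in place of the raw HTML. Whether this removes the [CR] in the docx has not been verified here.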