karnov / htmltoword

Ruby html to word gem
MIT License
177 stars, 70 forks

Issues with nested lists #40

Closed senny closed 8 years ago

senny commented 8 years ago

I'm looking for a solution to convert HTML to WordprocessingML. I found this gem, which looks like an awesome foundation for what I'm building (eventually I need to insert many HTML fragments into a template).

While running some tests to verify the output I noticed that there are some issues with nested lists. I put my test data in this Gist. The output looks like this:

[Screenshot: rendered output, 2015-12-16]

Looks like there are two distinct issues:

  1. Additional indent
  2. Mixed lists (bullets in numbers or numbers in bullets) are not recognized and get displayed like the outer list.

The additional indent seems to be related to whitespace in the input HTML. It happens with HTML that looks like this:

<ul>
  <li>lorem ipsum</li>
  <li>
    consectetur adipiscing elit
    <ul>

But doesn't happen with HTML that looks like this:

<ul>
  <li>lorem ipsum</li>
  <li>consectetur adipiscing elit
  <ul>

Are these use-cases that you are looking to support? I'm more than happy to provide additional information if something is missing.

anitsirc commented 8 years ago

Hi,

The current implementation preserves whitespace, which is why it looks like that in your example. I might make an update to preserve it only within inline elements rather than in all of them. In the meantime, perhaps you could use something like xmllint --noblanks, or a similar tool, to clean up the blank spaces in your document.
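
For anyone who wants to do this cleanup in Ruby before handing the HTML to the gem, here is a minimal sketch. The helper name `strip_tag_whitespace` is hypothetical and not part of htmltoword. It is a naive regex pass that assumes the fragment contains no `<pre>` blocks and no inline markup (such as `<b>` or `<a>`) where a space next to a tag matters, since it removes that space too. A real HTML parser would be the more robust choice.

```ruby
# Hypothetical helper, not part of htmltoword: removes whitespace that
# sits directly next to a tag so that list items carry no leading or
# trailing blanks.
# Assumption: no <pre> blocks and no inline elements whose neighbouring
# spaces are significant ("foo <b>bar</b>" would become "foo<b>bar</b>").
def strip_tag_whitespace(html)
  html
    .gsub(/>\s+/, ">") # drop whitespace right after a tag
    .gsub(/\s+</, "<") # drop whitespace right before a tag
    .strip
end

html = <<~HTML
  <ul>
    <li>lorem ipsum</li>
    <li>
      consectetur adipiscing elit
      <ul>
        <li>nested</li>
      </ul>
    </li>
  </ul>
HTML

cleaned = strip_tag_whitespace(html)
puts cleaned
```

The cleaned string would then be passed to the converter in place of the original input. Whitespace inside the text itself ("lorem ipsum") is left untouched.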

senny commented 8 years ago

@anitsirc that seems reasonable. Any ideas about the nested lists?

anitsirc commented 8 years ago

@senny Those I need to take a look at. I thought I had fixed that, but it seems not.

senny commented 8 years ago

@anitsirc thanks for the fast response!

senny commented 8 years ago

I've started to work on a custom solution to convert HTML to WordML. I'm closing the issue as I am no longer using this gem. Feel free to reopen if you want to keep track of the issue. Thanks for your time :yellow_heart: