ruby-docx / docx

a ruby library/gem for interacting with .docx files
MIT License

How to get XML from docx file #67

Closed (theasteve closed this issue 5 years ago)

theasteve commented 5 years ago

I'm trying to convert a docx file into a PDF. My plan was to convert the docx file to HTML and then convert the HTML to PDF. However, the result wasn't what I expected; see the attached testing.pdf.

That is what the output looks like after the process described above. Here is a link to the original docx file: https://www.dropbox.com/s/f1klwguv4r9iyje/testing.docx?dl=0

I think Word documents use XML, so the output might look better if I converted the docx file to XML and then to PDF. (You might have a better direction on this.)

So far I have doc = Docx::Document.open('testing.docx'). When I try to get the XML from the document, I get nil.

[61] pry(#<PDFProducer>)> doc.xml
=> nil

Is it possible to get the XML from the Word document? Or am I wrong in my assumption that Word documents use XML?

https://stackoverflow.com/questions/56450113/font-size-convert-docx-into-pdf-in-ruby-using-wickedpdf-and-docx
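On the XML question: a .docx file is a ZIP archive, and the main body text is conventionally stored as XML in an entry named word/document.xml. The following is only a minimal sketch of pulling that entry out with Ruby's standard library. The helper names read_zip_entry and docx_document_xml are hypothetical and not part of the docx gem, and the sketch assumes an ordinary (non-ZIP64, unencrypted) archive whose body lives at word/document.xml.

```ruby
require 'zlib'

# Hypothetical helper (not part of the docx gem): returns the raw bytes of one
# entry inside a ZIP archive, or nil if the entry is not present.
def read_zip_entry(path, entry_name)
  data = File.binread(path)

  # The "end of central directory" record sits near the end of the archive.
  eocd = data.rindex("PK\x05\x06".b)
  raise ArgumentError, "#{path} does not look like a ZIP archive" unless eocd

  # Total entry count, directory size, and directory offset.
  entry_count, _dir_size, pos = data[eocd + 10, 10].unpack('vVV')

  entry_count.times do
    # Each central directory header has 46 fixed bytes followed by the name.
    fields = data[pos, 46].unpack('Vv6V3v5V2')
    method = fields[4]
    packed_size = fields[8]
    name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
    local_offset = fields[16]

    if data[pos + 46, name_len] == entry_name.b
      # The local header repeats the name/extra lengths; the payload follows.
      local_name_len, local_extra_len = data[local_offset + 26, 4].unpack('vv')
      start = local_offset + 30 + local_name_len + local_extra_len
      body = data[start, packed_size]
      return body if method.zero? # stored without compression
      return Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(body) if method == 8

      raise NotImplementedError, "unsupported compression method #{method}"
    end

    pos += 46 + name_len + extra_len + comment_len
  end
  nil
end

# Assumption: the document body is stored at word/document.xml.
def docx_document_xml(path)
  xml = read_zip_entry(path, 'word/document.xml')
  xml && xml.force_encoding(Encoding::UTF_8)
end

# Example: puts docx_document_xml('testing.docx')
```

In a real project a ZIP library would likely be a better fit than hand-parsing the archive; the point here is only that the XML is there to be read. Note that this XML is WordprocessingML, not something a PDF renderer understands directly, so it still has to be converted (for example to HTML) before producing a PDF.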

unixmonkey commented 5 years ago

require 'docx'

# Open the Word document and write its HTML rendering to testing.html.
doc = Docx::Document.open('testing.docx')
File.open("testing.html", 'wb') do |f|
  f << doc.to_html
end

theasteve commented 5 years ago

@unixmonkey I just saw your answer, and I have just updated my post. Should I close this issue and open a new one, and restore the old question so that it matches your answer? Yes, your answer is correct; I came across it earlier.