Closed fran-worley closed 6 years ago
Until I can find a way to save source files in '/word/media/[image-filename]'
I can't add proper support for images.
External images have the obvious drawback that they require an internet connection to load etc.
I will keep trying but if anyone has any bright ideas...
I have now got this working with internal images.
To work correctly, your image source must be the full URL, not just a relative path, and you must provide the image size in pixels or ems.
Size
Size can be provided either via the style attribute:
<p><img src="http://placehold.it/250x100.png" style="width: 250px; height: 100px"></p>
or via data attributes
<p><img src="http://placehold.it/250x100.png" data-width="250px" data-height="100px"></p>
Should you provide both, data attributes take precedence.
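The precedence rule above can be sketched in plain Ruby. This is only an illustration: `resolve_image_size` is a hypothetical helper, not part of the gem's API, and it assumes the `<img>` attributes have already been read into a Hash.

```ruby
# Hypothetical helper, not part of htmltoword's API.
# attrs is a Hash of the <img> tag's attributes, as read from an HTML parser.
def resolve_image_size(attrs)
  # Parse "width: 250px; height: 100px" into {"width" => "250px", "height" => "100px"}.
  style = attrs.fetch("style", "").split(";").each_with_object({}) do |decl, out|
    prop, value = decl.split(":", 2)
    out[prop.strip.downcase] = value.strip if prop && value
  end

  # Data attributes win over the style attribute when both are present.
  {
    width:  attrs["data-width"]  || style["width"],
    height: attrs["data-height"] || style["height"]
  }
end

both = resolve_image_size(
  "style" => "width: 250px; height: 100px",
  "data-width" => "300px",
  "data-height" => "120px"
)
puts both.inspect
```

A missing size comes back as nil here, so the caller can decide whether to raise or skip the image.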
Filename
The original filename of your image can either be inferred from the source:
<p><img src="http://placehold.it/250x100.png" data-width="250px" data-height="100px"></p>
would give a filename of: 250x100.png
or you can provide a value via data attributes:
<p><img src="http://placehold.it/250x100.png" data-filename="what-a-lovely-image.png" data-width="250px" data-height="100px"></p>
would give a filename of: what-a-lovely-image.png
This is useful when the source url doesn't include the file extension or contains special characters.
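As a rough sketch of the two filename rules, assuming the name is taken from the last path segment of the URL (the gem's real logic is not shown in this thread, and `image_filename` is a hypothetical name):

```ruby
require "uri"

# Hypothetical helper, not part of htmltoword's API.
def image_filename(src, data_filename = nil)
  # An explicit data-filename always wins.
  return data_filename if data_filename && !data_filename.empty?

  # Otherwise fall back to the last path segment of the URL.
  File.basename(URI.parse(src).path)
end

puts image_filename("http://placehold.it/250x100.png")
puts image_filename("http://placehold.it/250x100.png", "what-a-lovely-image.png")
```

Because only the URL path is used, a query string on the source does not end up in the inferred name.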
Accessibility
For accessibility, you are encouraged to provide titles for images via the alt attribute:
<p><img src="http://placehold.it/250x100.png" alt="Fancy image description" style="height:100px; width:250px"></p>
To do:
Great stuff @fran-worley! I'll review it during the week. Will you work on the to-do items, or was it more of an informative list?
I'm planning to address them with further PRs. However, I would like to discuss points 2 & 4 as they potentially impact the core of the gem. Also, I couldn't see any testing for document.rb, which seems fairly critical given that the processing happens here.
@fran-worley I think there should be default dimensions when they are not defined on the image, or the image should be ignored and not added if they are not present. We shouldn't punish users who have images by generating empty or corrupted files just because they haven't defined a width and height.
Thanks for the feedback. I have a couple of questions...
Default image size
I agree that raising an error when no size is given is not ideal. In my opinion (happy to be wrong here...) a default image size isn't going to work, as rendering every image at one size will make your document look awful. Either we don't show images at all without a size, or we include a library like FastImage to calculate the size from the source if the user doesn't specify one.
Supporting images in links
This is more complicated as you don't appear to be able to nest relationships. I've pulled the XML that Word generates when including images in links, and it doesn't use relationships or hyperlink tags at all:
<w:p w14:paraId="0533D16D" w14:textId="77777777" w:rsidR="00E55D2E" w:rsidRPr="000B537C" w:rsidRDefault="000B537C">
<w:pPr>
<w:rPr>
<w:rStyle w:val="Hyperlink"/>
</w:rPr>
</w:pPr>
<w:r>
<w:rPr>
<w:noProof/>
<w:lang w:eastAsia="en-US"/>
</w:rPr>
<w:drawing><xsl:comment>Some lovely image xml</xsl:comment></w:drawing>
</w:r>
<w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText xml:space="preserve"> HYPERLINK "http://www.example.com/" </w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r w:rsidRPr="000B537C">
<w:rPr>
<w:rStyle w:val="Hyperlink"/>
</w:rPr>
<w:t>Link Text</w:t>
</w:r>
</w:p>
I'm not sure why there is a difference in the markup used by Word, and I can't find any documentation of this markup in the openoffice docs. If we want to support images in links then I'll need to rewrite the XML for links.
Content types
This was on my list as I didn't want to have to replace the entire file, but the files are corrupted if they don't contain the correct MIME types.
We could open the file and inject the relevant Default tags at the end of the Types tag, as it doesn't appear to matter what order they are in.
Not tested yet but thinking something like this...
# Replace the current content_type code in #generate with this...
if entry.name == Document.content_types_xml_file
  out.write(inject_image_content_types(entry)) if @image_files.size > 0
end

# Add a method to document.rb to inject the required content types into the file...
# Assumes `file` is the rubyzip entry for [Content_Types].xml.
def inject_image_content_types(file)
  doc = Nokogiri::XML(file.get_input_stream.read)
  # Get a list of all extensions currently in the content types file.
  # The file declares a default namespace, so Default is matched via the xmlns prefix.
  existing_exts = doc.xpath("//xmlns:Default").map { |node| node["Extension"] }.compact
  # Get a list of extensions we need for our images
  required_exts = @image_files.map { |i| i[:ext] }.uniq
  # Work out which required extensions are missing from the content types file
  missing_exts = required_exts - existing_exts
  # Inject missing extensions into the document
  missing_exts.each do |ext|
    doc.at_css("Types").add_child("<Default Extension='#{ext}' ContentType='image/#{ext}'/>")
  end
  # Return a string so the caller can write it straight into the archive
  doc.to_xml
end
@anitsirc Thoughts??
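One detail worth considering in the snippet above: building the content type as image/#{ext} yields image/jpg for .jpg files, whereas the registered MIME type is image/jpeg. A minimal, dependency-free sketch of generating the missing Default tags with a lookup table follows. The table, the helper name, and the choice to compare extensions case-insensitively are all assumptions made for illustration, not the gem's behaviour.

```ruby
# Hypothetical mapping and helper; only a handful of common image types are covered.
IMAGE_MIME_TYPES = {
  "png" => "image/png", "gif" => "image/gif", "bmp" => "image/bmp",
  "jpg" => "image/jpeg", "jpeg" => "image/jpeg"
}.freeze

# Returns the <Default> tags that still need to be added to [Content_Types].xml.
def missing_default_tags(existing_exts, required_exts)
  known = existing_exts.map(&:downcase)
  required_exts.map(&:downcase).uniq.reject { |ext| known.include?(ext) }.map do |ext|
    # Fall back to image/<ext> for extensions the table does not know about.
    mime = IMAGE_MIME_TYPES.fetch(ext) { "image/#{ext}" }
    %(<Default Extension="#{ext}" ContentType="#{mime}"/>)
  end
end

puts missing_default_tags(%w[rels xml png], %w[png jpg JPG gif])
```

When no image extensions are missing the helper returns an empty array, so the original file content can be written back unchanged.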
@anitsirc Any chance you can have a look at where I've got to. I'd love to get this merged soon...
Unrecognized unit of measure: .?
Currently you must provide a width and height in pixels or ems. You can do so either via the style or data attributes (data takes precedence should both be found).
If you don't provide a size, or the size is in another unit (e.g. a percentage), you'll get an error.
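To make the unit restriction concrete, here is a hedged sketch of parsing a size string and converting it to EMUs, the unit DrawingML uses for image extents (914,400 EMU per inch). Whether the branch does this conversion in Ruby or in the XSLT is not shown in this thread; `size_to_emu` is a hypothetical name, and the 16 pixels per em figure is an assumption based on a common browser default.

```ruby
EMU_PER_PIXEL = 9525 # 914,400 EMU per inch divided by 96 pixels per inch
PIXELS_PER_EM = 16   # assumption: a common browser default font size

# Hypothetical helper, not part of htmltoword's API.
def size_to_emu(value)
  # Accept only a number followed by px or em; anything else is rejected.
  match = /\A\s*(\d+(?:\.\d+)?)\s*(px|em)\s*\z/i.match(value.to_s)
  raise ArgumentError, "Unrecognized unit of measure: #{value.inspect}" unless match

  number = match[1].to_f
  pixels = match[2].downcase == "em" ? number * PIXELS_PER_EM : number
  (pixels * EMU_PER_PIXEL).round
end

puts size_to_emu("250px")
puts size_to_emu("2em")

begin
  size_to_emu("50%")
rescue ArgumentError => e
  puts e.message
end
```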
@anitsirc any chance this can be merged? Would be lovely to have!
@anitsirc any problems to merge this? It would be awesome
@karnov @nickfrandsen ping ... anyone please
Hi, jumping into this conversation a little late.
On Windows 7 64-bit with Ruby 2.3.1p112, make a simple project:
Gemfile
source 'https://rubygems.org'
gem 'htmltoword', git: 'https://github.com/fran-worley/htmltoword', branch: 'images-external'
testhtmltoword.rb
require 'htmltoword'
my_html = '<html><head></head><body><p>Hello</p></body></html>'
document = Htmltoword::Document.create(my_html)
file = Htmltoword::Document.create_and_save(my_html, 'test.docx')
Running bundle exec ruby testhtmltoword.rb results in a test.docx file that cannot be opened in Word 2013.
The error is:
We're sorry. We can't open test.docx because we found a problem with its contents.
Details: The file is corrupt and cannot be opened.
Using the default htmltoword gem (0.5.1) the same test file produces a Word document that can be opened.
I used WinMerge to look for differences, and the only one I found was that the [Content_Types].xml file contains no data.
@stats can you attach your word document? I've just tried this and I can't open the document created from your html on either the base branch or the images branch.
test-0.6.0.docx is your branch. test-0.5.1.docx is from the karnov/htmltoword master
For some reason your branch is not including the contents for [Content_Types].xml.
After some additional testing, I think there may be a problem if no image file is included in the document. In that case you will not get any content in the [Content_Types].xml file.
Good spot @stats that should now be fixed via https://github.com/karnov/htmltoword/pull/44/commits/a12c1e45d6c0c04ea084e5af2ff0b08ba3e33ec4
Awesome, tested and works. Thank you very much.
Do note the caveats in this branch.
1) Images inside links do not render.
2) All images must have their width and height in pixels or ems (not %).
3) Because images have to be downloaded and saved into the document, it's not the quickest, and you'll probably want to generate your docs in a background process.
@karnov are you planning to merge this?
There is a small issue with the current image naming functions in the XSLT: if the source file was original.name.png and no data-filename was provided, the file will be stored as word/media/image1.png, but in the relations file it will be referred to as:
<Relationship Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="media/image1.name.png"
Id="rId8"/>
In other words, the transformation takes name.png as the extension and not just png.
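The mismatch is consistent with splitting the filename at the first dot instead of the last. The XSLT naming functions themselves are not shown in this thread, so the following Ruby is only an illustration of the two behaviours, with hypothetical helper names:

```ruby
# Illustration only: the XSLT naming functions are not shown in this thread.

# Everything after the first dot, which reproduces the reported mismatch.
def extension_after_first_dot(filename)
  filename.split(".", 2).last
end

# Only the final suffix, which is what the relationship target needs.
def extension_after_last_dot(filename)
  File.extname(filename).delete_prefix(".")
end

name = "original.name.png"
puts extension_after_first_dot(name)
puts extension_after_last_dot(name)
```

Using the same last-dot rule for both the stored file and the relationship target would keep the two names in agreement.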
Hi all,
Development on this project has been slow from our side, mostly because it already fits all of our use cases. We're willing to reignite the work here and onboard anyone who has already contributed to this project.
@fran-worley Sorry for the super late follow up on this.
@filipkis Good spot, I'll take a look at this and add to this PR.
@lukelex Good to hear that you're looking at taking this on further. If I can do anything to help get this branch merged, let me know.
@fran-worley I'm not an expert in XSL so for now just let me know when you feel confident about these changes. I'll then test it with our own stuff and gladly merge it.
Hello, I'm trying to get images to work in the doc.
My tag looks like this:
<img alt=\"\" src=\"/ckeditor_assets/pictures/1/content_equi.jpg\" style=\"height:334px; width:375px\">
It seems to be in the proper syntax for the gem to work, but it's ignored.
@fran-worley @anitsirc Please see https://github.com/karnov/htmltoword/issues/71#issuecomment-398437050. The images won't show up in the Word files I create. Both the data-external images and the internal images simply won't show up. I can generate the Word document (I don't get any errors), but the images aren't there.
Your sample code:
<p><img src="http://placehold.it/250x100.png" alt="Fancy image description" style="height:100px; width:250px"></p>
doesn't work either.
Could you help me with this?
Same here. I've tested back to version 0.7.0, and it's the same issue across each version. I'm not sure what would have changed to break this functionality, or when...
Adds basic support for images
Limitations:
Other changes: In order for relationship referencing to work I have appended links with Href and images with Image. Couldn’t find a better way to reference the numbers correctly.
Beginning of fix for #27