privateOmega / html-to-docx

HTML to DOCX converter
MIT License
389 stars 143 forks

Images in downloaded docx document all become the same as the last image #190

Open taolabz opened 1 year ago

taolabz commented 1 year ago

Issue: When the content contains two or more images, every image in the downloaded docx file is replaced by the last image.

Code snippet:

const images = [
    '<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAACOElEQVR4Ae2UA7AcQRCG/9i2bdu2beumN7ZtWzMT27Zt27ZtXU20ddm97DO/qjGa1fDHH78D12pCsOPQN0kSnga3FYWg6b/aLnv7Dkn94emMbxYJnK5D0DosqhIIkNpSteHYBJ3FqPrh4cg0llJ3z+jdJnxHABigBEr7uaCnmNo8JhRCK6HcYdQ4DYAjnC1XZ84at1WBEZNZOt29R5BaHigE7TD57DUmNI2ui2M2SPr2XwUkXUDv/IHNQ9C0AATds7dzUIimOZ18OA6/EbRV7Vlpk6gJ9Kiw6EIj2CQI+grdxmrjz9hHTNQSQmiFjc7NG7uNkVVC4DecGAQtBrflxhRWRoVA0FH8QWpplUaGLtVmQ9AhE2FfnORQO52BbSHok+78BSZRLkAPp7nq0CVNUg8IOmVy/hhzWoTFb6Y2jAnJakA0qQ5ui2xULBLptLTQ2AN7koZWLjW/0wcugtNEywpwW0tdgu41ufcKYxtEcUnJjAFBby0ocB29qwTFb0STfKZ3J7PRLvXCIAvW14MjgtaZ3H8PbosLy0y3l2BBz5zE9Qx69w4IR6ZQBgj2zVhhNg0uQrAuTqwvDzOktsD4Db3E9t6BXZILISHonsFn++EM0SQpOH3+9502EC6G25oZWF/AQg5Jh3fPVFhdispySVd1H22CFWTj2BD0Tuf+znA1gur8SrxvmMwywyqShv9S4J4Kp2tR2c7ptL0tcWH4Iqt6L1hTuBnJSoLbksOlTGZ1VRj98cc78wMthIJSt9aMegAAAABJRU5ErkJggg==" />',
    '<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAACDklEQVR4Ae1VA6wcURR9tW23+95s7cZVUNuKaiOsu3dqB7UV1EFt27btjup2el6y+ObMFv+f5OwObuacqzyWilQ4CYs8tQwSJw3ilcIu/n6qJ5dJ4gNow8SdD1Q6Nws3DFVpBwO/QBvcYa9n6VwVlALItkIkE8TH+w1ITmVuAuLzIfLFUpW+QVPE0prEt/oN/EJMR1fEP08sUwoCLyJku9qmIlmD8+DjN/3PLZPKVGFuwKTShSFwMIKJKzp5vAzQyVsO9x/9Q3kf9/ndmQOqn172OsLwaRBszwBLFS1Cz/keR4bSJKWrRUpz2WsWARbx1oE19IvOthfVzGD5xLgIFZrtxK6/95f1rukTw+Qz5odGXODdhZCgcuSTTxQ3SNmMeAMm+yR36iuglIcDAn4jukVibqD3NpXOjJgVEWJewWjDjxMEZ04BgjX8Il8iCP0Cd5gqbyLbg5geuP8MPpFVYG7AmCQKoKxjIPIUtCPwluHjg3TVW8exc8EiT1NTFd2M8Ur5qNMshw1CndGOE1GMjHBw58W+CH03wWOy92APa7yoLk0ETkTErEGLnr+ZVjaHkwZ2ghpox8Kv4DkYWwwOkBvBnIYcLjnthso7IcNpENyN/3cxGZItS5bY8jXrbEmWAHwiT0ld5W2Queo/iJ5oVCFvxO+4ZCBuOGpAljUx/I8MJAP/vgEn+K8ZCD9S8Rsz4IeXsWZCowAAAABJRU5ErkJggg==" />',
];
const htmlContent = images.join("");

logger.info(htmlContent);

htmlToDocx(htmlContent, "", {
  footer: false,
}).then((blob) => {
  saveAs(
    new Blob([blob], {
      type: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    }),
    "test.docx"
  );
});

Results: image

Environment: Chromium engine version 111.0.5563.110, macOS 13.0.1

dexter-stpierre commented 10 months ago

For anyone coming across this same issue: I fought with it for most of the afternoon. My fix was to include the crypto package in my build.

html-to-docx uses a package called nanoid to generate unique names for the image files in the resulting docx (zip) file, and nanoid was using the Node crypto package to generate each unique id. If crypto isn't included, it hits an error, and an empty string is returned for the id every time. That would explain why every image ends up as the last one: they all get the same file name. The error is thrown because the crypto package isn't available, but it is being swallowed (not quite sure where).

I use Vite, so I used the vite-plugin-node-polyfills plugin. If you use Webpack you likely won't run into this bug, as Webpack includes the necessary Node packages.
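
For reference, a Vite setup along the lines of the fix described above might look roughly like this. It is a minimal sketch that assumes the plugin's named export nodePolyfills with its default options. The thread does not show the actual config, so check the plugin's README for the version you install.

```javascript
// vite.config.js
// Sketch only: assumes vite and vite-plugin-node-polyfills are installed
// as dev dependencies. Default plugin options are used here.
import { defineConfig } from "vite";
import { nodePolyfills } from "vite-plugin-node-polyfills";

export default defineConfig({
  plugins: [
    // Makes Node built-ins such as crypto available in the browser bundle.
    nodePolyfills(),
  ],
});
```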

With the rise of build systems that don't include these packages, html-to-docx could avoid this error in one of two ways: check for the crypto package and log a warning to the console if it isn't there, or generate unique file names with a different package that doesn't depend on crypto. Each image is already assigned an incrementing id, so it would be pretty trivial to use that as the unique part of the file name. I'd be happy to submit a PR if you think that would be a good solution, @privateOmega.
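
The two suggestions above could be sketched roughly as follows. All function names here are hypothetical and are not part of html-to-docx. The randomBytes probe is only an illustrative way to detect a missing crypto module, since the thread does not say which crypto function the library actually needs.

```javascript
// Hypothetical helpers for illustration; not html-to-docx's real code.

// Returns true when the given module object exposes a callable randomBytes.
// (Assumption: probing randomBytes is enough to tell a real crypto module
// from the empty stub a bundler may substitute.)
function isCryptoUsable(cryptoModule) {
  return Boolean(cryptoModule) && typeof cryptoModule.randomBytes === "function";
}

// Option 1: warn loudly instead of failing silently.
// Returns true if a warning was issued.
function warnIfCryptoMissing(cryptoModule, warn = console.warn) {
  if (isCryptoUsable(cryptoModule)) return false;
  warn("crypto package is unavailable; generated image file names may collide.");
  return true;
}

// Option 2: build the file name from the incrementing image id,
// so no random id (and no crypto dependency) is needed.
function imageFileName(imageId, extension) {
  return `image-${imageId}.${extension}`;
}

const names = [1, 2].map((id) => imageFileName(id, "png"));
console.log(names); // [ 'image-1.png', 'image-2.png' ]
```

With the second option, two images in the same document can never share a file name as long as their ids differ, regardless of how the bundle was built.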