onizet / html2openxml

Html2OpenXml is a small .NET library that converts simple or advanced HTML to plain OpenXml components. The project started in 2009, initially to convert users' comments from SharePoint to Word.
MIT License

Invalid document when html contains images and existing headers/footers contain images #113

Open toneb opened 2 years ago

toneb commented 2 years ago

Word reports "unreadable content" when the document already contains a header or footer with images and the inserted HTML also contains images.

The cause is duplicated DocProperties.Id values: https://github.com/onizet/html2openxml/blob/09d064c3b8824562f23c54881bdba1b144216de8/src/Html2OpenXml/HtmlConverter.cs#L523

The calculation of drawingObjId in the method linked above considers only elements in the document body, not those in other parts of the document such as headers and footers.

My workaround is to renumber the generated DocProperties ids, taking elements in the other document parts into account:

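using System.Linq;
using DocumentFormat.OpenXml.Drawing.Wordprocessing; // DocProperties
using HtmlToOpenXml;                                 // HtmlConverter

// `doc` is an already opened WordprocessingDocument, `html` is the HTML string to insert

// get existing max docProp id across body, headers and footers (0 if there are none)
var maxDocPropId = new[] {
    doc.MainDocumentPart!.Document.Body!.Descendants<DocProperties>().Select(p => p.Id?.Value ?? 0),
    doc.MainDocumentPart!.HeaderParts.SelectMany(part => part.Header.Descendants<DocProperties>().Select(p => p.Id?.Value ?? 0)),
    doc.MainDocumentPart!.FooterParts.SelectMany(part => part.Footer.Descendants<DocProperties>().Select(p => p.Id?.Value ?? 0)),
}.SelectMany(ids => ids).DefaultIfEmpty(0u).Max();

// convert html to openxml
var converter = new HtmlConverter(doc.MainDocumentPart);
var parsed = converter.Parse(html);

// renumber docProp ids, continuing from the existing maximum
var docPropCopy = maxDocPropId;
parsed.SelectMany(x => x.Descendants<DocProperties>()).ToList().ForEach(x => x.Id = ++docPropCopy);

// ... use generated elements ...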