unidoc / unioffice

Pure Go library for creating and processing Office Word (.docx), Excel (.xlsx) and PowerPoint (.pptx) documents
https://unidoc.io/unioffice/

Add subDocument TargetMode="external" relationship support #277

Open melignus opened 5 years ago

melignus commented 5 years ago

I see that the subDoc anchor element is supported, but the http://schemas.openxmlformats.org/officeDocument/2006/relationships/subDocument relationship is not. The schema files in the repo are all tagged as generated ("do not edit"). Are there plans to include this relationship?

Support for a subDoc anchor backed by a Relationship with TargetMode="External" is really the only barrier to completing a project that I'm working on. I'm attempting to generate a document from many artifacts with automated page references. Importing external documents as sections into the primary document hasn't worked very well because I don't have much control over the artifacts; there always seems to be some formatting issue when using real artifacts from previous documents. From my reading of the spec, the subDoc element with the external target relationship would solve my issues.

Thanks!

gunnsth commented 5 years ago

Please provide a full self-contained code snippet and an example document so this can be investigated.

melignus commented 5 years ago

_, err := document.Open("Master.docx") returns the following error:

2019/05/09 13:40:01 unsupported relationship type: http://schemas.openxmlformats.org/officeDocument/2006/relationships/subDocument tgt: word/SubDoc.docx

The following documents were created in Word and work as expected when building master documents composed of subdocuments, as described in the OOXML spec. The subDoc anchor element appears to be supported, but I'm unable to add the relationship required to make these composed documents work.

Master.docx SubDoc.docx

melignus commented 5 years ago

Here are the important embedded XML parts illustrating the relationship I'm hoping to be able to compose with unioffice. The subDoc anchor in the document body references the relationship with Id="rId6" in the relationship file. It's the subDocument relationship type that doesn't appear to be supported.

Master.docx -> word/document.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid wp14">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Master document.</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:pPr>
        <w:sectPr>
          <w:pgSz w:w="12240" w:h="15840"/>
          <w:pgMar w:top="1134" w:right="1134" w:bottom="1134" w:left="1134" w:header="0" w:footer="0" w:gutter="0"/>
          <w:cols w:space="720"/>
          <w:formProt w:val="0"/>
        </w:sectPr>
      </w:pPr>
      <w:r>
        <w:t xml:space="preserve">Subdocument Page Reference: </w:t>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="begin"/>
      </w:r>
      <w:r>
        <w:instrText xml:space="preserve"> PAGEREF SubdocumentBookmark \h </w:instrText>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="separate"/>
      </w:r>
      <w:r>
        <w:rPr>
          <w:noProof/>
        </w:rPr>
        <w:t>2</w:t>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="end"/>
      </w:r>
    </w:p>
    <w:p>
      <w:pPr>
        <w:numPr>
          <w:ilvl w:val="0"/>
          <w:numId w:val="2"/>
        </w:numPr>
      </w:pPr>
      <w:subDoc r:id="rId6"/>
    </w:p>
    <w:sectPr>
      <w:pgSz w:w="12240" w:h="15840"/>
      <w:pgMar w:top="1134" w:right="1134" w:bottom="1134" w:left="1134" w:header="0" w:footer="0" w:gutter="0"/>
      <w:cols w:space="720"/>
      <w:formProt w:val="0"/>
    </w:sectPr>
  </w:body>
</w:document>

Master.docx -> word/_rels/document.xml.rels

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
  <Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable" Target="fontTable.xml"/>
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/numbering" Target="numbering.xml"/>
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml" Target="../customXml/item1.xml"/>
  <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/subDocument" Target="SubDoc.docx" TargetMode="External"/>
  <Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings" Target="webSettings.xml"/>
  <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml"/>
</Relationships>

gunnsth commented 4 years ago

It seems like writing/generating documents with this subDocument relationship would be fairly easy if the subDocument is treated as just a path/filename, with no knowledge of its contents.

Reading/parsing is trickier: for example, one would expect text extraction to fetch the subdocument's contents. In that case, it would also not be appropriate to access any files outside the folder containing the master docx file. In a way, it would be preferable if the subdocuments were inside the docx bundle rather than stored externally on disk.