ruby-docx / docx

a ruby library/gem for interacting with .docx files
MIT License

Allow access to other XML docs in docx file like the header and footer #73

Open yjukaku opened 5 years ago

yjukaku commented 5 years ago

This adds support for retrieving all of the header and footer documents embedded in the docx file, as well as the numbering docs.

This is based on the work in #22 and #42.

It also closes #49 and #32.

fercreek commented 4 years ago

@chrahunt we need this solution from @yjukaku

yjukaku commented 4 years ago

👋 Is there anything holding up this PR from merging? Anything we can do to help?

nathanvda commented 3 years ago

This PR would solve a problem I am currently encountering (namely, setting a bookmark in a header). I am willing to help get this PR merged. What is holding it back?

satoryu commented 3 years ago

There is a conflicting file now.

@yjukaku do you have time to resolve the conflict?

nathanvda commented 3 years ago

I tried to get this working. The main difference I see now is that for Office365 files we have to try document.xml first and, if that does not exist, fall back to document2.xml.

So I created a local version that is more in line with the current code: instead of iterating over DOCUMENT_PATHS, I added explicit methods to load_headers and footers and numbering, as we already have a load_styles too.

But when adapting the update method accordingly, I noticed that we only update word/document.xml regardless of the source (leaving document2.xml as is?), and I am not sure whether that is a problem. Can I ignore that for now?
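
To make the fallback concrete, here is a minimal sketch of the lookup, assuming the zip's entry names are already available as an array of strings. `MAIN_DOCUMENT_CANDIDATES` and `resolve_main_document_path` are hypothetical names for illustration, not part of the gem.

```ruby
# Hypothetical helper, not part of the docx gem's API.
# Candidate paths for the main document part, in order of preference.
MAIN_DOCUMENT_CANDIDATES = ['word/document.xml', 'word/document2.xml'].freeze

# Returns the first candidate that is present among the zip's entry names,
# or nil when none of them exists.
def resolve_main_document_path(entry_names, candidates = MAIN_DOCUMENT_CANDIDATES)
  candidates.find { |path| entry_names.include?(path) }
end

# Entry names as they might appear in an Office365-produced file (made up).
office365_entries = ['word/document2.xml', 'word/styles.xml']
puts resolve_main_document_path(office365_entries) # prints "word/document2.xml"
```

If the update method reused the path resolved here instead of a fixed word/document.xml, the part that was read would also be the part that gets written.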

yjukaku commented 3 years ago

> I added explicit methods to load_headers and footers and numbering, as we already have a load_styles too.

I was trying to DRY up the code with the DOCUMENT_PATHS hash, but if that's not needed 🤷‍♂️.

> Can I ignore that for now?

I personally would expect the document file name to stay the same as the original when updated. It appears the better way to find the proper document name would be to check the file [Content_Types].xml in the zip, then look for an Override tag in that XML file whose ContentType attribute has the value application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml. That tells us exactly which file is the "main" one, and a similar method can be used for the headers, footers, numbering, styles, etc.

See http://officeopenxml.com/anatomyofOOXML.php under Content Types

Here's a sample [Content_Types].xml:

<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/_rels/document.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/settings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"/>
  <Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
  <Override PartName="/word/numbering.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml"/>
  <Override PartName="/word/footer1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"/>
  <Override PartName="/word/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
  <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>
  <Override PartName="/customXml/_rels/item1.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/customXml/itemProps1.xml" ContentType="application/vnd.openxmlformats-officedocument.customXmlProperties+xml"/>
  <Override PartName="/customXml/item1.xml" ContentType="application/xml"/>
  <Override PartName="/docProps/custom.xml" ContentType="application/vnd.openxmlformats-officedocument.custom-properties+xml"/>
  <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
  <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
</Types>
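
As a rough sketch of that lookup, assuming the contents of `[Content_Types].xml` have already been read from the zip into a string: the helper name `part_names_for` and the shortened sample below are made up for illustration, and REXML is used only because it ships with Ruby.

```ruby
require 'rexml/document'

MAIN_DOCUMENT_TYPE =
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'
FOOTER_TYPE =
  'application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml'

# Hypothetical helper: returns the part names of every Override element whose
# ContentType equals the requested one, with the leading "/" stripped so the
# result can be used as a path inside the zip.
def part_names_for(content_types_xml, content_type)
  doc = REXML::Document.new(content_types_xml)
  doc.root.elements
     .select { |el| el.name == 'Override' && el.attributes['ContentType'] == content_type }
     .map { |el| el.attributes['PartName'].delete_prefix('/') }
end

# Shortened, made-up sample in which the main part is document2.xml.
SAMPLE_CONTENT_TYPES = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
    <Override PartName="/word/document2.xml" ContentType="#{MAIN_DOCUMENT_TYPE}"/>
    <Override PartName="/word/footer1.xml" ContentType="#{FOOTER_TYPE}"/>
    <Override PartName="/word/footer2.xml" ContentType="#{FOOTER_TYPE}"/>
  </Types>
XML

puts part_names_for(SAMPLE_CONTENT_TYPES, MAIN_DOCUMENT_TYPE).first # prints "word/document2.xml"
puts part_names_for(SAMPLE_CONTENT_TYPES, FOOTER_TYPE).inspect
```

Returning an array keeps the same helper usable for parts that can occur more than once, such as footers.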

aunghtain commented 2 years ago

Can we merge this PR as well? I need access to numbering and header/footer. Thanks.

panozzaj commented 2 years ago

Thanks, the proposed change seems good at a high level to me. (I'm not affiliated with the project, just someone who has started using the library.) This would be helpful for one case I saw today where the important text information we wanted was in the document footer. Right now that information is inaccessible.

I wouldn't want to delay this PR, but what do you think about adding the header or footer contents to methods like .text on documents? Maybe it could take the contents of any headers and put that at the top of the document text, and the contents of the footers at the end. That way document.text would truly give you all of the text of the document.
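
A rough sketch of the ordering I have in mind, assuming the header, body, and footer text have already been extracted as plain strings. `full_text` and its keyword arguments are hypothetical and not an existing method of the gem.

```ruby
# Hypothetical sketch: `headers`, `body` and `footers` stand in for text that
# was already extracted from each part; none of these names come from the gem.
# Headers go first, then the body, then footers; empty parts are skipped.
def full_text(headers:, body:, footers:)
  (headers + [body] + footers).reject(&:empty?).join("\n")
end

puts full_text(
  headers: ['Quarterly report'],
  body: 'Revenue grew this quarter.',
  footers: ['Confidential']
)
```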

aunghtain commented 11 months ago

Any update on this? I've been waiting for it for more than a year now.