Status: Open. yjukaku opened this pull request 5 years ago.
@chrahunt, we need this solution from @yjukaku.
👋 Is there anything holding up this PR from merging? Anything we can do to help?
This PR would solve a problem I am currently encountering (namely, setting a bookmark in a header). I am willing to help get this PR merged; what is holding it back?
There is a conflicting file now.
@yjukaku do you have time to resolve the conflict?
So I tried to get it working. The main difference I see now is that for Office 365 files we have to try `document.xml` first and, if that does not exist, use `document2.xml`.
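A minimal sketch of that fallback, written in Python purely for illustration. It assumes the only two candidate names are the ones mentioned here; `MAIN_PART_CANDIDATES`, `find_main_part` and `read_main_part` are hypothetical names, not part of the library.

```python
import zipfile
from typing import Iterable

# Candidate main-document parts, in the order described above. Treating these
# two names as the only candidates is an assumption made for this sketch.
MAIN_PART_CANDIDATES = ("word/document.xml", "word/document2.xml")


def find_main_part(names: Iterable[str]) -> str:
    """Return the first candidate main-document part present in `names`."""
    present = set(names)
    for candidate in MAIN_PART_CANDIDATES:
        if candidate in present:
            return candidate
    raise KeyError("neither word/document.xml nor word/document2.xml found")


def read_main_part(path: str) -> bytes:
    """Open a .docx archive and return the bytes of its main document part."""
    with zipfile.ZipFile(path) as docx:
        return docx.read(find_main_part(docx.namelist()))
```

Returning the part name (rather than just its contents) means the caller can remember which file it actually read.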
So I created a local version that is more in line with the current code: instead of iterating over `DOCUMENT_PATHS`, I added explicit methods, `load_headers` and equivalents for footers and numbering, since we already have a `load_styles` too.
But when trying to adapt the `update` method accordingly, I noticed that we only ever update `word/document.xml`, regardless of which file was read (leaving `document2.xml` as is?). I am not sure whether that is OK or a problem. Can I ignore that for now?
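For the `update` question, one way to keep the original part name is to rebuild the archive and swap in new XML only for the entry that was read. A minimal Python sketch, assuming the whole archive fits in memory; `replace_part` is a hypothetical helper, not the library's `update` method.

```python
import io
import zipfile


def replace_part(docx_bytes: bytes, part_name: str, new_xml: bytes) -> bytes:
    """Return a copy of the archive with only `part_name` replaced.

    Every other entry is copied unchanged, so XML read from
    word/document2.xml is written back to word/document2.xml instead of
    to a new word/document.xml next to the stale original.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src:
        if part_name not in src.namelist():
            raise KeyError(part_name)
        with zipfile.ZipFile(out, "w") as dst:
            for info in src.infolist():
                # Reusing the ZipInfo keeps each entry's name, order and
                # compression method; only the payload changes for part_name.
                data = new_xml if info.filename == part_name else src.read(info)
                dst.writestr(info, data)
    return out.getvalue()
```

Python's `zipfile` cannot overwrite an entry in place, which is why the sketch copies every entry into a fresh archive.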
> I added explicit methods to `load_headers` and footers and numbering, as we already have a `load_styles` too.
I was trying to DRY up the code with the `DOCUMENT_PATHS` hash, but if that's not needed 🤷‍♂️.
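For comparison, a table-driven version of the same idea: one hash from part kind to filename pattern and one generic loop, instead of one method per kind. The patterns are assumptions based on typical Word output, not the actual contents of `DOCUMENT_PATHS`; `PART_PATTERNS` and `group_parts` are hypothetical names.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical table in the spirit of a DOCUMENT_PATHS hash: adding a new kind
# of part means adding one entry here rather than writing a new load_* method.
PART_PATTERNS: Dict[str, re.Pattern] = {
    "document": re.compile(r"word/document2?\.xml"),
    "headers": re.compile(r"word/header\d*\.xml"),
    "footers": re.compile(r"word/footer\d*\.xml"),
    "numbering": re.compile(r"word/numbering\.xml"),
    "styles": re.compile(r"word/styles\.xml"),
}


def group_parts(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group archive entry names by kind using a single generic loop."""
    groups: Dict[str, List[str]] = {kind: [] for kind in PART_PATTERNS}
    for name in sorted(names):
        for kind, pattern in PART_PATTERNS.items():
            if pattern.fullmatch(name):
                groups[kind].append(name)
    return groups
```

The trade-off is readability: explicit methods are easier to follow, while the table keeps every part kind on the same code path.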
> Can I ignore that for now?
I personally would expect the document file name to stay the same as the original when the document is updated. A better way to find the proper document name appears to be to read the `[Content_Types].xml` file in the zip and look for an `Override` element whose `ContentType` attribute has the value `application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml`. That tells us exactly which file is the "main" one, and the same method can be used for the headers, footers, numbering, styles, etc.
See http://officeopenxml.com/anatomyofOOXML.php under "Content Types".
Here's a sample `[Content_Types].xml`:
<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Override PartName="/_rels/.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Override PartName="/word/_rels/document.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Override PartName="/word/settings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"/>
<Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/>
<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
<Override PartName="/word/numbering.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml"/>
<Override PartName="/word/footer1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"/>
<Override PartName="/word/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
<Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>
<Override PartName="/customXml/_rels/item1.xml.rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Override PartName="/customXml/itemProps1.xml" ContentType="application/vnd.openxmlformats-officedocument.customXmlProperties+xml"/>
<Override PartName="/customXml/item1.xml" ContentType="application/xml"/>
<Override PartName="/docProps/custom.xml" ContentType="application/vnd.openxmlformats-officedocument.custom-properties+xml"/>
<Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
<Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
</Types>
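A minimal sketch of that lookup in Python, assuming all of the wanted parts are declared with `Override` elements (entries covered only by `Default` extension mappings are ignored here). The main-document content type is the one quoted above; the footer, numbering and styles types appear in the sample, and `header+xml` is the standard type for header parts. `PART_KINDS` and `parts_by_kind` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
WML = "application/vnd.openxmlformats-officedocument.wordprocessingml."

# Content type -> kind of part this sketch cares about.
PART_KINDS = {
    WML + "document.main+xml": "document",
    WML + "header+xml": "headers",
    WML + "footer+xml": "footers",
    WML + "numbering+xml": "numbering",
    WML + "styles+xml": "styles",
}


def parts_by_kind(content_types_xml: Union[str, bytes]) -> Dict[str, List[str]]:
    """Map each part kind to its zip entry names, using Override elements."""
    root = ET.fromstring(content_types_xml)
    found: Dict[str, List[str]] = {kind: [] for kind in PART_KINDS.values()}
    for override in root.iter(CT_NS + "Override"):
        kind = PART_KINDS.get(override.get("ContentType"))
        if kind is not None:
            # PartName is package-absolute ("/word/document.xml"), while
            # zip entry names have no leading slash.
            found[kind].append(override.get("PartName").lstrip("/"))
    return found
```

With a real file this would be called as `parts_by_kind(zf.read("[Content_Types].xml"))` on an open `zipfile.ZipFile`; for the sample above, `found["document"]` would be `["word/document.xml"]`.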
Can we merge this PR as well? I need access to numbering and header/footer. Thanks.
Thanks, the proposed change seems good at a high level to me. (I'm not affiliated with the project, just someone who has started using the library.) This would be helpful for one case I saw today where the important text information we wanted was in the document footer. Right now that information is inaccessible.
I wouldn't want to delay this PR, but what do you think about adding the header or footer contents to methods like `.text` on documents? Maybe it could take the contents of any headers and put them at the top of the document text, and the contents of the footers at the end. That way `document.text` would truly give you all of the text of the document.
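One possible shape for that, sketched in Python under simplifying assumptions: every header part's paragraphs come first, then the body's, then every footer part's, one paragraph per line. `paragraphs` and `full_text` are hypothetical helpers, not the library's `.text`; the naive walk would also repeat text from text boxes nested inside paragraphs.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Union

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
Xml = Union[str, bytes]


def paragraphs(part_xml: Xml) -> List[str]:
    """Return the text of each w:p paragraph in one WordprocessingML part."""
    root = ET.fromstring(part_xml)
    return ["".join(t.text or "" for t in p.iter(W + "t")) for p in root.iter(W + "p")]


def full_text(headers: Iterable[Xml], body: Xml, footers: Iterable[Xml]) -> str:
    """Header text first, then the body, then footer text, one paragraph per line."""
    lines: List[str] = []
    for part in headers:
        lines.extend(paragraphs(part))
    lines.extend(paragraphs(body))
    for part in footers:
        lines.extend(paragraphs(part))
    return "\n".join(lines)
```

Keeping header and footer parts as separate arguments leaves the ordering decision to the caller, so a plain body-only `.text` could still be offered alongside it.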
Any update on this? I've been waiting for it for more than a year now.
This adds support for retrieving all of the header and footer documents embedded in the docx file, as well as the numbering docs.
This is based on the work in #22 and #42.
It also closes #49 and #32