Open JosephHardy91 opened 8 years ago
Do you have an example of a .docm file you can share? (You can just drag and drop it in the comment box in response.)
I had understood that macros were stored in a binary format, so I'm not sure how you'd be able to generate them in Python, even if you could "import" them.
But I'll take a look at the file structure and see what we see.
I would like to see this implemented as well. I've attached a sample document containing a Hello World macro.
Since Github wouldn't let me upload the document as-is, I changed the extension to ".zip".
For reference, here is the macro I put in the document:
Sub AutoOpen()
MsgBox ("Hello, World!")
End Sub
Thanks @ethack, just the ticket :)
Here's what the innards look like:
$ unzip -l MacroTest.docm
Archive: MacroTest.docm
Length Date Time Name
-------- ---- ---- ----
2485 01-01-80 00:00 [Content_Types].xml
590 01-01-80 00:00 _rels/.rels
1976 01-01-80 00:00 word/_rels/document.xml.rels
1977 01-01-80 00:00 word/document.xml
1295 01-01-80 00:00 word/header3.xml
1295 01-01-80 00:00 word/footer2.xml
1295 01-01-80 00:00 word/footer1.xml
1295 01-01-80 00:00 word/header2.xml
1295 01-01-80 00:00 word/header1.xml
1675 01-01-80 00:00 word/endnotes.xml
1681 01-01-80 00:00 word/footnotes.xml
1295 01-01-80 00:00 word/footer3.xml
6795 01-01-80 00:00 word/theme/theme1.xml
277 01-01-80 00:00 word/_rels/vbaProject.bin.rels
9728 01-01-80 00:00 word/vbaProject.bin
2699 01-01-80 00:00 word/settings.xml
1367 01-01-80 00:00 word/vbaData.xml
497 01-01-80 00:00 word/webSettings.xml
29856 01-01-80 00:00 word/styles.xml
712 01-01-80 00:00 docProps/app.xml
727 01-01-80 00:00 docProps/core.xml
1261 01-01-80 00:00 word/fontTable.xml
-------- -------
72073 22 files
Note the item word/vbaProject.bin. I've had a quick look, and no surprise, but it's definitely not plain text.
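For anyone who wants to repeat this inspection without `unzip`, here is a minimal Python sketch using only the standard library. The helper name `find_macro_parts` is made up for illustration, and the magic-byte check assumes `vbaProject.bin` is an OLE compound file, which I have not verified against this particular sample.

```python
import zipfile

# Signature at the start of an OLE compound file (the assumed container
# format for vbaProject.bin).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_macro_parts(docm):
    """Return {part_name: starts_with_ole_magic} for VBA-related parts.

    `docm` may be a path or a binary file-like object.
    """
    found = {}
    with zipfile.ZipFile(docm) as package:
        for name in package.namelist():
            base = name.rsplit("/", 1)[-1]
            # Crude filter: every macro-related part in the listing above
            # has a file name starting with "vba".
            if base.lower().startswith("vba"):
                found[name] = package.read(name)[:8] == OLE_MAGIC
    return found
```

Calling `find_macro_parts("MacroTest.docm")` should report the three `vba*` parts from the listing, with only the binary one flagged as OLE if the assumption holds.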
The other two files of interest are vbaProject.bin.rels:
$ opc browse MacroTest.docm vbaProject.bin.rels
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="x" Type="http://schemas.microsoft.com/office/2006/relationships/wordVbaData" Target="vbaData.xml"/>
</Relationships>
... and vbaData.xml:
$ opc browse MacroTest.docm word/vbaData.xml
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<wne:vbaSuppData
xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
mc:Ignorable="w14 w15 wp14"
>
<wne:mcds>
<wne:mcd wne:macroName="PROJECT.THISDOCUMENT.AUTOOPEN" wne:name="Project.ThisDocument.AutoOpen" wne:bEncrypt="00" wne:cmg="56"/>
</wne:mcds>
</wne:vbaSuppData>
The relationships one would be straightforward, and the data one as well I suppose, assuming there's a schema or other documentation out there, or someone willing to do the reverse engineering to determine what goes in there.
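As a starting point for that reverse engineering, the macro entries in `vbaData.xml` can be read with the standard library. This is only a sketch: the function name `list_macro_entries` is hypothetical, and it reads just the two name attributes visible in the sample, without interpreting `bEncrypt` or `cmg`.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "wne" prefix in the sample above.
WNE_NS = "http://schemas.microsoft.com/office/word/2006/wordml"


def list_macro_entries(vba_data_xml):
    """Return (macroName, name) pairs from the <wne:mcd> elements."""
    root = ET.fromstring(vba_data_xml)
    entries = []
    for mcd in root.iter("{%s}mcd" % WNE_NS):
        entries.append(
            (mcd.get("{%s}macroName" % WNE_NS), mcd.get("{%s}name" % WNE_NS))
        )
    return entries
```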
But I don't know how we'd do the binary bit. I would expect it's proprietary and undocumented, but haven't really looked into it.
Do you have any idea of what functionality might be useful and how it might be implemented?
I'm still doing research into this, but thought I'd share what I have so far. It appears the format is documented (how well, I have yet to determine) and some Python tools exist to parse it already.
https://msdn.microsoft.com/en-us/library/cc313094%28v=office.12%29.aspx
http://www.decalage.info/vba_tools
The functionality I'd be looking for specifically would be to pass a string that contains the raw VBA source code and have a function that inserts it into a Word document. I've long been wanting a way to programmatically generate VBA source code and insert it into a document. Personally, this is useful to me if I want to generate a bunch of documents that are similar and only differ with say a single variable in the macro code.
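To make the "differ only by a single variable" idea concrete, here is a small sketch of generating the VBA source text itself. It covers only the plain-text half of the problem; getting that text into `vbaProject.bin` is still the open question. The names `MACRO_TEMPLATE` and `render_macro` are invented for this example.

```python
from string import Template

# Same shape as the Hello World macro above, with the message as a variable.
MACRO_TEMPLATE = Template(
    "Sub AutoOpen()\n"
    '    MsgBox ("$message")\n'
    "End Sub\n"
)


def render_macro(message):
    """Return VBA source text with `message` substituted in."""
    # VBA escapes a double quote inside a string literal by doubling it.
    return MACRO_TEMPLATE.substitute(message=message.replace('"', '""'))
```

Each generated document would then get its own `render_macro(...)` output as the raw source string.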
This is very interesting @ethack :)
From time to time the only way to discover how the MS API for Word (or PowerPoint for python-pptx) behaves is to experiment. And this is a bit of a pain given what it takes to do that when you're not running natively on Windows. We often need to do this while designing a new API element.
So it would be awesome to be able to take some plain-text code and be able to produce a test docx that runs that code once opened in Windows.
Based on what you sent, I'm thinking what we would need in python-docx (and could probably have in python-pptx as well), is a way to take an arbitrary "black-box" VBA binary and lodge it in the right place in the package, perhaps adding the right relationships and possibly the vbaData.xml. The VBA black-box could be written with one of the tools you pointed toward.
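A rough, untested-in-Word sketch of what "lodging the black box" might look like at the zip level, using only the standard library rather than python-docx. Everything here is an assumption to be checked: the function name `embed_vba_blob` is hypothetical, the VBA content type and relationship type strings are what I believe Word writes (the sample above only shows the `vbaProject.bin.rels` side), the source package is assumed to already contain `word/_rels/document.xml.rels`, and `vbaData.xml` is left out entirely.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
DOCM_MAIN = "application/vnd.ms-word.document.macroEnabled.main+xml"
# Assumed values, not confirmed by the sample document above.
VBA_CT = "application/vnd.ms-office.vbaProject"
VBA_REL = "http://schemas.microsoft.com/office/2006/relationships/vbaProject"


def _to_xml(root, default_ns):
    # Serialize with `default_ns` as the unprefixed default namespace.
    ET.register_namespace("", default_ns)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def _patch_content_types(xml_bytes):
    root = ET.fromstring(xml_bytes)
    # Mark the main document part as macro-enabled.
    for override in root.findall("{%s}Override" % CT_NS):
        if override.get("PartName") == "/word/document.xml":
            override.set("ContentType", DOCM_MAIN)
    # Declare the content type of the new binary part.
    ET.SubElement(root, "{%s}Override" % CT_NS,
                  PartName="/word/vbaProject.bin", ContentType=VBA_CT)
    return _to_xml(root, CT_NS)


def _patch_rels(xml_bytes):
    root = ET.fromstring(xml_bytes)
    used = {rel.get("Id") for rel in root}
    n = 1
    while "rId%d" % n in used:  # pick the first unused relationship id
        n += 1
    ET.SubElement(root, "{%s}Relationship" % REL_NS,
                  Id="rId%d" % n, Type=VBA_REL, Target="vbaProject.bin")
    return _to_xml(root, REL_NS)


def embed_vba_blob(docx_bytes, vba_blob):
    """Return new package bytes with `vba_blob` added as word/vbaProject.bin."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "[Content_Types].xml":
                data = _patch_content_types(data)
            elif item.filename == "word/_rels/document.xml.rels":
                data = _patch_rels(data)
            dst.writestr(item.filename, data)
        dst.writestr("word/vbaProject.bin", vba_blob)
    return out.getvalue()
```

Whether Word accepts the result without `vbaData.xml`, and whether a blob produced by the tools linked above is valid on its own, would both need to be tested on Windows.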
Let us know as you come up with more in your research. If we could get some kind of filthy hack or even hand-assembled file that proves the concept we'll definitely be interested in designing API support for this. We'd probably need a contributor to do the needful, so you might consider if you're willing to play that role for this one, let me know if you are :)
Hi Guys, just a request for some input if you will.
Adding support so you can open and save the .docm format (but not access or manipulate the macro-related parts) would be a relatively modest undertaking.
How much benefit do you see in that?
The likelihood that would be implemented in the "sooner" time frame is a lot higher I expect.
Personally, I don't have a need for that use case.
ok, thanks @ethack :)
Yes, this would be helpful to me. My macro is one-size-fits-all, so as long as I have a template file (which I do), opening a docm and saving it with the macro contents intact would work perfectly well in my case. There is also the requirement that it be editable like any docx file.
Thanks
okay, good to know, thanks @blindidiot91 :)
I'll look to scope this one out.
Hi! I'm interested in that too.
Thanks!
Hi,
I am wondering if there is any progress or decision on this topic. I really need this extension for the project I am working on. We have some template files with macros in them; if we save them as docx we lose the macros and they are no longer useful to us. It would be very helpful to be able to modify the document (without modifying the macros) as scanny describes. I found this link https://github.com/python-openxml/python-docx/issues/212 where it is hinted that it might be possible to add this feature to a local copy of python-docx, but I am not sure how to implement this. Any help will be greatly appreciated. Thanks in advance.
Where do you run into a problem? Be specific and include actual error messages with trace back if any.
Hi, was this implemented ?
Not yet. Generally each case is closed as it's implemented. No one has stepped up to sponsor this one or contribute it yet.
Hi scanny,
I am quite happy to volunteer and help you implement this feature because I need it for the project I am working on. But I am an engineer and not a professional programmer, so I have to sit down and learn docx api.
So far I tried this:
imp.find_module("docx")
(None, "/home/ozgur/miniconda3/envs/dlnd/lib/python3.6/site-packages/docx", ('', '', 5))
DOCM = ( 'application/vnd.ms-word.document.macroEnabled.12.main+xml' )
if document_part.content_type != CT.WML_DOCUMENT_MAIN:
with this one:
if document_part.content_type != CT.DOCM:
Then I went into Python, typed the following, and received this traceback:
>>> from docx import Document
>>> document = Document()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ozgur/miniconda3/envs/dlnd/lib/python3.6/site-packages/docx/api.py", line 28, in Document
    raise ValueError(tmpl % (docx, document_part.content_type))
ValueError: file '/home/ozgur/miniconda3/envs/dlnd/lib/python3.6/site-packages/docx/templates/default.docx' is not a Word file, content type is 'application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml'
At this point I realized that there is a 'default.docx' file that I would also have to replace with a default.docm, and then I gave up. The whole thing doesn't seem to be as simple as '3vocati' describes here: https://github.com/python-openxml/python-docx/issues/212. I realize that I would have to go deeper and really understand the docx API to implement this feature.
@ozgurpolat The place to start is where the error comes from: https://github.com/python-openxml/python-docx/blob/master/docx/api.py#L26
if document_part.content_type != CT.WML_DOCUMENT_MAIN:
    tmpl = "file '%s' is not a Word file, content type is '%s'"
    raise ValueError(tmpl % (docx, document_part.content_type))
If you change this to:
openable_content_types = (
    CT.WML_DOCUMENT_MAIN,
    CT.WML_DOCUMENT_MACRO_ENABLED_MAIN,
)
if document_part.content_type not in openable_content_types:
    tmpl = "file '%s' is not a Word file, content type is '%s'"
    raise ValueError(tmpl % (docx, document_part.content_type))
... you'll get past the first error. The next error is something like CT.WML_DOCUMENT_MACRO_ENABLED_MAIN undefined. So you go into constants.py and add a member to that enumeration that maps the content type URL.
Finally, you need to add a mapping here so a part of this content type gets instantiated using the right (DocumentPart) class.
https://github.com/python-openxml/python-docx/blob/master/docx/__init__.py#L29
@scanny
Thank you very much, your help is highly appreciated, it worked.
I made the changes to api.py described above and,
I added the following to constants.py:
WML_DOCUMENT_MACRO_ENABLED_MAIN = ( 'application/vnd.ms-word.document.macroEnabled.main+xml' )
and the following to __init__.py:
PartFactory.part_type_for[CT.WML_DOCUMENT_MACRO_ENABLED_MAIN] = DocumentPart
Glad you got it working @ozgurpolat :)
@ozgurpolat & @scanny Thank you very much 👍
Thanks both of you !
So now, we can work with .docm files ?
Thanks awesome work
Hi,
According to the comments, this issue seems to be resolved. But here is still an error, when I am trying to open .docm file:
from docx import Document
document = Document('macro_file.docm')
document.add_heading('Heading, level 1', level=1)
document.save('another_file.docm')
raises
ValueError: file 'macro_file.docm' is not a Word file, content type is 'application/vnd.ms-word.document.macroEnabled.main+xml'
Have I missed something? Or was .docm support not implemented?
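One way to narrow this down is to check which content type the file declares for its main part, and compare it with what the installed copy of python-docx accepts. Below is a standard-library sketch; the helper name `main_part_content_type` is hypothetical, and it assumes the main part lives at `/word/document.xml`, which is typical but not guaranteed.

```python
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def main_part_content_type(path, part_name="/word/document.xml"):
    """Return the content type declared for `part_name`, or None."""
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("[Content_Types].xml"))
    # Per-part declarations are <Override> elements keyed by PartName.
    for override in root.iter("{%s}Override" % CT_NS):
        if override.get("PartName") == part_name:
            return override.get("ContentType")
    return None
```

If this returns the `macroEnabled` type shown in the error, the file itself is fine and the installed library simply has not had the local changes described above applied.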
@scanny I would benefit greatly from being able to open, change and write .docm files without touching any macro code (as I understand it, writing or changing macros is the hard part...). You removed the shortlist label - are there any problems? As I understood it, @ozgurpolat already made opening possible? What is still missing to have this feature released?
Hi all. An update on this issue would be greatly appreciated. I would also be able to sponsor opening/editing/closing without touching the VBA part... if this is still a modest undertaking? @scanny @mustard007 @oezgan @ozgurpolat anyone?
It has been almost two years; I really had to scratch my head to remember this thread. Sorry, no update from me. I think I had converted the macro-enabled document into a non-macro Word document since I was not interested in the macros. Beyond that, I am afraid I can't help.
@scanny I'll try to prepare a PR, including @ozgurpolat's approach.
but if you got no time, I'll let it be, and not fill up the PR list even more :|
(as I'll find a solution this or that way ;-) thanks for the work, though, great library!
This module is great, but it doesn't currently work with docm files or macros.
Add in a way to both import macros into the VBProject and save docm files.
Currently, the only way to do this is pywin32, and if you both import a macro and save as docm for 5 files, it takes about 30 seconds to 1 minute, which is obviously unacceptable.
Your module is perfectly positioned to take care of this problem and I'd hope to see an improvement sometime in the future.
Thanks!