Open c-bik opened 4 years ago
@c-bik Thanks for reaching out. I don't expect we would want to add this into python-docx directly, but perhaps it is an opportunity for a companion package, like python-docx-templates, that builds on python-docx as a dependency.
@scanny Thanks for the response. It feels like this feature alone is too small to justify a standalone package of its own; it would sit better as part of an existing companion package. Do you think I should open a PR against python-docx-templates instead?
I don't think it fits with python-docx-templates, I was just using that as an example of a companion package. I'm not really sure what the use case for this would be, maybe you can explain that a bit more.
> I don't think it fits with python-docx-templates, I was just using that as an example of a companion package.
Ah. I got you now. So something like python-openxml/python-docx-package?
> I'm not really sure what the use case for this would be, maybe you can explain that a bit more.
I don't suppose there would be a very significant/noticeable use case coming out of this. Perhaps a small expansion of docx loading scope.
Currently docx.Document(docx=None) can load from any path or file-like object, which integrates seamlessly with any HTTP file-upload endpoint where the content is expected to be a binary docx payload (a zipped folder structure).
However, Microsoft's Office Add-ins API for Word can return an entire Word document as a single package-level XML (see the example in the original post, <pkg:package><pkg:part>...). At the moment there is no suitable way to plug such an XML (for example, from an HTTP POST body) directly into python-docx.
In such a case, what we can do now is parse the parts into an in-memory folder structure as expected by the docx (Open XML) format, zip it, and then pass that as a stream into python-docx. Since python-docx is going to parse and process it again, the PR I'm proposing tries to eliminate all of that by (also) accepting a byte stream of such an XML directly through the same docx.Document(docx=None) interface and then proceeding with the parsing a bit differently.
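For reference, the current workaround described above (unpack the parts, zip them, hand the stream over) might look roughly like the sketch below. It is not taken from the linked diff. It assumes the payload follows the Flat OPC layout, where each <pkg:part> carries pkg:name and pkg:contentType attributes and wraps its content in either <pkg:xmlData> or base64-encoded <pkg:binaryData>. The function name flat_opc_to_docx_stream is hypothetical, and the [Content_Types].xml part is synthesised from the per-part attributes because the single-XML form does not carry one.

```python
import base64
import io
import xml.etree.ElementTree as ET
import zipfile
from xml.sax.saxutils import quoteattr

# Namespace of <pkg:package> / <pkg:part> (assumed: the Flat OPC namespace).
PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"
# Namespace used by [Content_Types].xml in a zipped OPC package.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def flat_opc_to_docx_stream(flat_xml: bytes) -> io.BytesIO:
    """Repackage a <pkg:package> XML payload as an in-memory zip (hypothetical helper)."""
    package = ET.fromstring(flat_xml)
    overrides = []
    stream = io.BytesIO()
    with zipfile.ZipFile(stream, "w", zipfile.ZIP_DEFLATED) as zf:
        for part in package.findall(f"{{{PKG_NS}}}part"):
            name = part.get(f"{{{PKG_NS}}}name")  # e.g. "/word/document.xml"
            content_type = part.get(f"{{{PKG_NS}}}contentType")
            xml_data = part.find(f"{{{PKG_NS}}}xmlData")
            binary_data = part.find(f"{{{PKG_NS}}}binaryData")
            if xml_data is not None:
                # The single child of <pkg:xmlData> is the root of the part.
                blob = ET.tostring(xml_data[0], encoding="UTF-8", xml_declaration=True)
            elif binary_data is not None:
                blob = base64.b64decode(binary_data.text or "")
            else:
                continue  # nothing to write for an empty part
            # Zip member names have no leading slash; part names do.
            zf.writestr(name.lstrip("/"), blob)
            overrides.append(
                f"<Override PartName={quoteattr(name)} "
                f"ContentType={quoteattr(content_type)}/>"
            )
        content_types = (
            '<?xml version="1.0" encoding="UTF-8"?>'
            f'<Types xmlns="{CT_NS}">' + "".join(overrides) + "</Types>"
        )
        zf.writestr("[Content_Types].xml", content_types)
    stream.seek(0)
    return stream
```

The returned stream could then be passed as the file-like argument to docx.Document(). Note that ElementTree rewrites namespace prefixes when serialising, so a more careful version would probably use lxml to keep the original prefixes. Either way, every part is parsed and serialised here only to be parsed again on load, which is the redundancy the proposal aims to remove.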
I must also mention that, apart from the MS/TS API I mentioned above, I don't know of any other source of such an XML.
Hi,
First off thanks a lot for this awesome library. It does a great job with most of the normal cases.
Recently, however, I came across a case (Word.Body.getOoxml() from Microsoft's Office-JS) where I couldn't find a way to parse/load the result into a docx object directly.
I am receiving an XML (MIME application/xml) of the following structure:
It looked like a serialised form of the docx package, though I couldn't find any schema (XSD etc.) for <pkg:package>, so I attempted to implement support directly in python-docx and, as a result, got that working for me.
The work-in-progress diff of the implementation: https://github.com/python-openxml/python-docx/compare/master...c-bik:load-from-opc-ooxml-support
Usage:
Now I am wondering how this should really be done so that I can potentially submit such a feature as a pull request. If this feature is useful/interesting for a future python-docx release, I can invest some time to turn it into an acceptable PR.
Looking forward.
Best, Bikram