fbennett closed 7 years ago
@fbennett, yes, I didn't bother once I got mammoth.js to work, but it would be a lot cleaner to just unzip the files and loop over the relevant XML elements. (would be happy to take a PR 😄)
Looks like http://stuk.github.io/jszip/ or https://gildas-lormeau.github.io/zip.js/index.html might be decent libraries for reading the files (*.odt is also just a zip file? See https://gildas-lormeau.github.io/zip.js/core-api.html#zip-reading-example in particular). And just some XPath for the XML (https://developer.mozilla.org/en-US/docs/Introduction_to_using_XPath_in_JavaScript)?
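A rough sketch of the "loop over the relevant XML elements" step for the *.docx case, in case it helps. It assumes `word/document.xml` has already been read out of the zip as a string (the JSZip or zip.js step is left out), and it uses a plain regular expression rather than XPath so that it runs without a DOM. It also assumes that Zotero citations sit in `w:instrText` runs between `w:fldChar` "begin" and "separate" markers, with an instruction starting with `ADDIN ZOTERO_ITEM`. The function names are made up for illustration, and nested fields are not handled.

```javascript
// Decode the five predefined XML entities found in instruction text.
function decodeXmlEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;" and not "<"
}

// Hypothetical helper: return the instruction string of every field in the
// raw XML text of word/document.xml (assumed layout, see the note above).
function extractFieldCodes(documentXml) {
  const fields = [];
  let current = null; // null while outside a field instruction

  // Match field boundary markers and instruction-text runs in document order.
  const token =
    /<w:fldChar\b[^>]*w:fldCharType="(begin|separate|end)"[^>]*>|<w:instrText\b[^>]*>([\s\S]*?)<\/w:instrText>/g;

  let m;
  while ((m = token.exec(documentXml)) !== null) {
    if (m[1] === "begin") {
      current = ""; // start collecting a new instruction
    } else if (m[1] === "separate" || m[1] === "end") {
      // The instruction ends at "separate" (or at "end" if there is no result part).
      if (current !== null) fields.push(decodeXmlEntities(current).trim());
      current = null;
    } else if (current !== null) {
      current += m[2]; // an instruction can be split across several runs
    }
  }
  return fields;
}

// Keep only the fields that look like Zotero citations (assumed prefix).
function extractZoteroCitations(documentXml) {
  return extractFieldCodes(documentXml).filter((code) =>
    code.startsWith("ADDIN ZOTERO_ITEM")
  );
}
```

The *.odt side would need a different element lookup in `content.xml`, so this only covers half of the idea.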
Any particular reason you're interested in supporting *.odt documents?
No pressing need, was only thinking of completeness and convenience. (To my surprise, the cite-extraction code in Juris-M still works after the migration to 5.0, so that alternative is still open to an LO user.)
@simonster, can I bug you for a second? I wasn't sure if you still follow zotero-dev, and I have a question about how Zotero citation metadata is embedded in .odt files, which is probably your code. See https://groups.google.com/forum/#!topic/zotero-dev/vImXuhjsFw0
It seems like you could extract field codes directly from the exploded XML source of the document. That would save the mammoth.js dependency, and should work (with a bit of tweaking maybe?) for both *.docx and *.odt source files.