hbons / SparkleShare

Share and collaborate by syncing with any Git repository instantly. Linux, macOS, and Windows.
https://sparkleshare.org

Intelligent Office/Openoffice zipped XML file support #521

Closed lgordon closed 12 years ago

lgordon commented 12 years ago

MS Office and OpenOffice/LibreOffice files (docx, odt, etc.) are zipped directories that contain XML files describing the document, along with any embedded media (images, movies, etc.). Since they are recompressed each time they are edited, every revision takes up more and more space in the git history. It would be much more efficient to track changes to the extracted files and let git handle the compression and revision history. This would also provide much better history tracking: you could see when images are added to the document, which portion changed, etc.

I'm not sure how easy it would be to make this work seamlessly, but perhaps you could just unzip the docx each time it changes and commit the extracted files instead of the zipped binary archive.
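A minimal sketch of the unzip step, assuming Python's standard `zipfile` module is acceptable for illustration. The function name `extract_office_archive` and the idea of extracting into a sibling directory are hypothetical, not anything SparkleShare does:

```python
import zipfile
from pathlib import Path


def extract_office_archive(archive_path, target_dir):
    """Unpack a zipped office document (docx, odt, ...) into target_dir.

    Returns the sorted member names so a caller could stage exactly
    those files for a commit instead of the binary archive.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        # Each member becomes an ordinary file that git can diff and delta-compress.
        archive.extractall(target)
        return sorted(archive.namelist())
```

Something like `extract_office_archive("report.docx", "report.docx.d")` would leave files such as `word/document.xml` on disk. The sketch only covers extraction; rebuilding the document from the extracted files on checkout is a separate problem.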

Maybe there is also a gitattributes way to do it...
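One gitattributes route that does exist is a `textconv` diff driver: git runs a command with the file path as its argument and diffs whatever the command prints. Below is a rough sketch of such a converter. The driver name `zipdoc` and the script name `zipdoc_textconv.py` are made up for illustration. Note that this only changes how diffs are displayed; the repository still stores the zipped binary, so it does not address the space problem.

```python
# Hypothetical wiring (names are assumptions, not an existing setup):
#   .gitattributes:   *.docx diff=zipdoc
#   git config diff.zipdoc.textconv "python zipdoc_textconv.py"
import sys
import zipfile


def archive_to_text(archive_path):
    """Render a zipped office document as plain text for diffing."""
    chunks = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in sorted(archive.namelist()):
            data = archive.read(name)
            # One header line per member, so added or removed media show up in the diff.
            chunks.append(f"=== {name} ({len(data)} bytes)")
            # Only dump members that look like XML; media stay as a header line.
            if name.endswith((".xml", ".rels")):
                chunks.append(data.decode("utf-8", errors="replace"))
    return "\n".join(chunks) + "\n"


# When run as a script with exactly one path argument, print the rendering.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.stdout.write(archive_to_text(sys.argv[1]))
```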

more info:

http://monkeyonoracle.blogspot.com/2010/03/docx-part-i-how-to-extract-document.html

http://www-verimag.imag.fr/~moy/opendocument/

wimh commented 12 years ago

How about solving that in OpenOffice/LibreOffice by saving it as a "Flat XML" (.fodt, .fods, .fodp, etc.) file? See also:

And if you really want to store .docx/.odt/etc. more efficiently, it would make more sense to implement it in git itself. Trying to fix that in SparkleShare would be just a workaround.

Also think about the other side effects. For example, unzipping and rezipping a zipped office file might not reproduce exactly the same file, so the MD5 sum changes. If you want to prove that a document is authentic, you need a file that is binary-identical to the one that was stored.
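To illustrate the rezipping concern, here is a small sketch using Python's `zipfile` and `hashlib`. The helper names and the sample member are invented for the demo. The differing compression methods stand in for whatever differences a real unzip/rezip round trip might introduce (compression level, timestamps, member order):

```python
import hashlib
import io
import zipfile


def build_archive(members, compression):
    """Build an in-memory zip from {name: bytes}, with a fixed timestamp."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in members.items():
            info = zipfile.ZipInfo(name, date_time=(2012, 1, 1, 0, 0, 0))
            info.compress_type = compression
            archive.writestr(info, data)
    return buffer.getvalue()


def read_members(archive_bytes):
    """Return {name: uncompressed bytes} for every member of the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


members = {"word/document.xml": b"<doc>" + b"hello " * 50 + b"</doc>"}
original = build_archive(members, zipfile.ZIP_DEFLATED)
# "Unzip and rezip": same member contents, different compression settings.
rezipped = build_archive(read_members(original), zipfile.ZIP_STORED)

same_content = read_members(original) == read_members(rezipped)
same_bytes = md5_hex(original) == md5_hex(rezipped)
```

Here `same_content` is true while `same_bytes` is false: the document inside is unchanged, but a checksum taken over the archive no longer matches.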

hbons commented 12 years ago

I agree with @wimh: this would be a very complex operation and should probably be solved in git itself. We can also write in the documentation that it's best to use uncompressed office documents.