haesleinhuepf closed this 5 days ago.
This message was generated by git-bob (version: 0.14.1dev, model: claude-3-5-sonnet-20241022, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.
Hi! I'm git-bob, happy to help with document conversion. Here's a notebook showing how to convert between docx and markdown using python-docx and pypandoc:
import jupytext

# Notebook source in jupytext's "py:percent" format; each "# %%" line starts a new cell.
# The outer ''' quotes keep the inner """docstrings""" from ending the string early.
script = '''
# %% [markdown]
# # Document conversion functions

# %%
import pypandoc


def docx_to_markdown(docx_path, output_path):
    """Convert docx to markdown"""
    pypandoc.convert_file(docx_path, 'markdown', outputfile=output_path)


def markdown_to_docx(markdown_path, output_path):
    """Convert markdown to docx"""
    pypandoc.convert_file(markdown_path, 'docx', outputfile=output_path)

# %%
# Example usage:
# Convert docx to markdown
docx_to_markdown("example.docx", "output.md")
# Convert markdown to docx
markdown_to_docx("example.md", "output.docx")
'''

# jupytext.writes expects a notebook object, so parse the script first,
# then serialize the notebook as .ipynb JSON
notebook = jupytext.reads(script, fmt='py:percent')
print(jupytext.writes(notebook, fmt='ipynb'))
This requires installing:
pip install python-docx pypandoc
You'll also need to have pandoc installed on your system. On Ubuntu/Debian:
sudo apt-get install pandoc
Let me know if you need any adjustments!
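pypandoc hands the actual conversion to the pandoc executable, so a missing pandoc is the most likely failure. A quick standard-library check, as a sketch: pandoc_version is a hypothetical helper, and it only searches the PATH. pypandoc may also find a pandoc it downloaded itself, so None here is a hint rather than proof.

```python
import shutil
import subprocess
from typing import Optional


def pandoc_version() -> Optional[str]:
    """Hypothetical helper: first line of `pandoc --version`, or None if pandoc is not on PATH."""
    exe = shutil.which("pandoc")  # absolute path to the pandoc binary, or None
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    # The first line names the program and its version; the exact version depends on the install
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    version = pandoc_version()
    print(version or "pandoc not found on PATH; install it before calling pypandoc")
```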
git-bob implement this as an actual Jupyter notebook, without the jupytext stuff around it. Use the file "test.docx" as example input and save the result as "test.md".
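One way to get a plain notebook without jupytext is to write the .ipynb JSON directly with the json module. The sketch below assumes the nbformat 4 schema (minor version 4, where cells need no "id" field) and a hypothetical output name docx_to_markdown.ipynb; its cells call pypandoc.convert_file to turn "test.docx" into "test.md".

```python
import json

# Cells of the requested notebook: (cell type, source text)
CELLS = [
    ("markdown", "# Convert test.docx to markdown\n"
                 "Requires pypandoc (pip install pypandoc) and a pandoc installation."),
    ("code", "import pypandoc"),
    ("code", "def docx_to_markdown(docx_path, output_path):\n"
             '    """Convert a .docx file to markdown using pandoc."""\n'
             "    pypandoc.convert_file(docx_path, 'markdown', outputfile=output_path)"),
    ("code", "# Example input and output file names requested in the issue\n"
             "docx_to_markdown('test.docx', 'test.md')\n"
             "print(open('test.md', encoding='utf-8').read())"),
]


def make_cell(cell_type: str, source: str) -> dict:
    """Build one cell dict following the nbformat 4 JSON layout."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally carry an execution count and an outputs list
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def write_notebook(path: str) -> dict:
    """Write the notebook to `path` (hypothetical name) and return the dict that was written."""
    notebook = {
        "cells": [make_cell(t, s) for t, s in CELLS],
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
        },
        "nbformat": 4,
        "nbformat_minor": 4,  # assumption: minor version 4, so no per-cell "id" is needed
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)
    return notebook


if __name__ == "__main__":
    write_notebook("docx_to_markdown.ipynb")
```

Opening the resulting file in Jupyter and running all cells should then produce test.md next to test.docx.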
For the record: I implemented this without pandoc and without reusing pre-existing libraries, to have a permissively licensed, Python-only implementation for simple docx files: https://github.com/haesleinhuepf/docx2markdown/
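A pure-Python route is possible because a .docx file is a ZIP archive whose text lives in word/document.xml. Below is a minimal standard-library sketch of that idea, not the code from the docx2markdown repository. It returns the markdown as a string instead of writing a file, handles paragraphs, Heading1 to Heading6 style IDs (assuming English style IDs), and bold/italic runs, and skips tables, lists, images, and runs nested inside hyperlinks.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace of the elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _run_text(run: ET.Element) -> str:
    """Text of one <w:r> run, wrapped in * / ** when the run is italic / bold."""
    text = "".join(t.text or "" for t in run.iter(W + "t"))
    props = run.find(W + "rPr")
    # Simplification: a <w:b/> or <w:i/> element counts as "on"; w:val="0" is not checked
    if text and props is not None:
        if props.find(W + "i") is not None:
            text = f"*{text}*"
        if props.find(W + "b") is not None:
            text = f"**{text}**"
    return text


def docx_to_markdown(docx_path: str) -> str:
    """Hypothetical sketch: return markdown text for the paragraphs of a simple .docx file."""
    with zipfile.ZipFile(docx_path) as z:  # a .docx file is a ZIP archive
        root = ET.fromstring(z.read("word/document.xml"))
    blocks = []
    for para in root.iter(W + "p"):
        # Only direct runs are read; runs nested in <w:hyperlink> etc. are skipped
        text = "".join(_run_text(r) for r in para.findall(W + "r"))
        if not text.strip():
            continue  # drop empty paragraphs
        style = para.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val", "") if style is not None else ""
        # Assumption: English style IDs "Heading1" ... "Heading6"
        match = re.fullmatch(r"Heading([1-6])", style_id)
        if match:
            text = "#" * int(match.group(1)) + " " + text
        blocks.append(text)
    # Markdown separates paragraphs with a blank line
    return "\n\n".join(blocks) + "\n"
```

Usage would be something like writing docx_to_markdown("test.docx") into "test.md".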
I need a Python function that converts a docx document to markdown and a Python function that converts a markdown file to docx.
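For the markdown-to-docx direction, a .docx can also be assembled from three XML parts with only the standard library. A minimal sketch under simple assumptions: this hypothetical markdown_to_docx variant takes markdown text rather than a file path, treats every non-empty line as its own paragraph, maps # to ###### lines to Heading1 to Heading6 style IDs, and turns **bold** spans into bold runs. Because it writes no styles.xml part, Word may display the headings as normal text.

```python
import re
import zipfile
from xml.sax.saxutils import escape

# The three parts a minimal .docx package needs
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType='
    '"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Target="word/document.xml" Type='
    '"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>'
    '</Relationships>'
)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _runs(text: str) -> str:
    """Split text on **bold** markers; captured (odd-index) pieces become bold runs."""
    runs = []
    for i, part in enumerate(re.split(r"\*\*(.+?)\*\*", text)):
        if not part:
            continue
        rpr = "<w:rPr><w:b/></w:rPr>" if i % 2 == 1 else ""
        runs.append(f'<w:r>{rpr}<w:t xml:space="preserve">{escape(part)}</w:t></w:r>')
    return "".join(runs)


def markdown_to_docx(markdown_text: str, docx_path: str) -> None:
    """Hypothetical sketch: write markdown text to docx_path, one paragraph per non-empty line."""
    paragraphs = []
    for line in markdown_text.splitlines():
        if not line.strip():
            continue
        ppr = ""
        heading = re.match(r"(#{1,6})\s+(.*)", line)
        if heading:
            # Assumption: headings map to style IDs Heading1..Heading6 (no styles.xml is written)
            ppr = f'<w:pPr><w:pStyle w:val="Heading{len(heading.group(1))}"/></w:pPr>'
            line = heading.group(2)
        paragraphs.append(f"<w:p>{ppr}{_runs(line)}</w:p>")
    body = "".join(paragraphs)
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)
        z.writestr("_rels/.rels", RELS)
        z.writestr("word/document.xml", document)
```

Usage: markdown_to_docx(open("test.md", encoding="utf-8").read(), "test.docx").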
git-bob think about this