python-openxml / python-docx

Create and modify Word documents with Python
MIT License
4.57k stars 1.12k forks source link

Save the document in text format #397

Open robo3945 opened 7 years ago

robo3945 commented 7 years ago

HI everybody,

How to convert the DOCX file in TXT? Some export functions will be useful?

I use that git project but it's not reliable...

Thanks

martinmde commented 5 years ago

Load into word. "save as" close issue

jonasao commented 5 years ago

@robo3945 This library is for handling docx documents/files. If you need to create documents in text format (.txt) there are better ways.

robo3945 commented 5 years ago

I do not want to create text file, I'd like to extract the plain text from a docx. And I'd like to do via API not using Word ;)

jonasao commented 5 years ago

@robo3945 I see, you can make a loop that traverses the Document's paragraph and extracts their text-value.

Code in its simplest form below:

  document = Document('test.docx')
  txt_arr = []
  for p in document.paragraphs:
    txt_arr.append(p.text)

  # TODO: save the contents of txt_arr to a .txt file
robo3945 commented 5 years ago

it's passed a lot of time from this answer and I do not remember well, but your code works also for lists in the DOCS?

robo3945 commented 5 years ago

Does it work for bullet list too?

didmar commented 8 months ago

@robo3945 I see, you can make a loop that traverses the Document's paragraph and extracts their text-value.

Code in its simplest form below:

  document = Document('test.docx')
  txt_arr = []
  for p in document.paragraphs:
    txt_arr.append(p.text)

  # TODO: save the contents of txt_arr to a .txt file

Note that this won't work with bullets: the numbering won't be exported! This made me switching over to pypandoc.