python-openxml / python-docx

Create and modify Word documents with Python
MIT License

Read information from select box / combo box? #1197

Open Rapid1898-code opened 1 year ago

Rapid1898-code commented 1 year ago

Hello, I would like to read the information from the select box / combo box in the attached Word document. (In the attached file, "Lisa" is selected under JOB DETAILS at the top.)

When I try to read this information by looping through all paragraphs of the tables, e.g. with this code:

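import docx

doc = docx.Document('inp.docx')
# Walk every paragraph of every cell in every table.
for idxT, table in enumerate(doc.tables):
    for idxR, row in enumerate(table.rows):
        for idxC, cell in enumerate(row.cells):
            for idxP, para in enumerate(cell.paragraphs):
                print(para.text)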

I don't get this information. Is there any way to read it?

inp.docx

Austin1990 commented 1 year ago

You need to follow the method in this post:

python-docx get info from dropdownlist (in table) - Stack Overflow

The word/document.xml file inside your document contains this structure, from which you can extract the information you need:

<w:sdt>
  <w:sdtPr>
    <w:alias w:val="Entered by"/>
    <w:id w:val="1433320977"/>
    <w:dropDownList>
      <w:listItem w:displayText="Job entered by" w:value="Job entered by"/>
      <w:listItem w:displayText="Mark" w:value="Mark"/>
      <w:listItem w:displayText="Chooky" w:value="Chooky"/>
      <w:listItem w:displayText="Steve" w:value="Steve"/>
      <w:listItem w:displayText="Lisa" w:value="Lisa"/>
    </w:dropDownList>
  </w:sdtPr>
  <w:sdtContent>
    <w:r w:rsidR="004B3CEC">
      <w:t>Lisa</w:t>
    </w:r>
  </w:sdtContent>
</w:sdt>
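
As a rough illustration of that structure, the sketch below parses the same kind of w:sdt fragment with the standard library's xml.etree.ElementTree. The helper name read_dropdown and the trimmed SAMPLE fragment are made up for this example and are not part of python-docx; the namespace URI is the standard WordprocessingML one, declared here so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Trimmed copy of the fragment above, with the namespace declared.
SAMPLE = f"""
<w:sdt xmlns:w="{W}">
  <w:sdtPr>
    <w:alias w:val="Entered by"/>
    <w:dropDownList>
      <w:listItem w:displayText="Mark" w:value="Mark"/>
      <w:listItem w:displayText="Lisa" w:value="Lisa"/>
    </w:dropDownList>
  </w:sdtPr>
  <w:sdtContent>
    <w:r><w:t>Lisa</w:t></w:r>
  </w:sdtContent>
</w:sdt>
"""


def read_dropdown(sdt):
    """Hypothetical helper: return (alias, choices, selected) for one <w:sdt>."""
    alias = sdt.find("w:sdtPr/w:alias", NS)
    items = sdt.findall("w:sdtPr/w:dropDownList/w:listItem", NS)
    texts = sdt.findall("w:sdtContent//w:t", NS)
    return (
        # Attributes are namespaced too, so they need the {uri}name form.
        alias.get(f"{{{W}}}val") if alias is not None else None,
        [item.get(f"{{{W}}}displayText") for item in items],
        # The currently shown choice is the text inside <w:sdtContent>.
        "".join(t.text or "" for t in texts),
    )


alias, choices, selected = read_dropdown(ET.fromstring(SAMPLE))
print(alias, choices, selected)
```

The list of choices lives under w:sdtPr, while the value currently displayed sits in the w:t run inside w:sdtContent, which is why both parts have to be read.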

Austin1990 commented 1 year ago

I managed to produce some sample code:

import zipfile

import docx
from bs4 import BeautifulSoup as bs
from bs4.element import Tag

file = 'inp.docx'

# Print the paragraph text of every table cell, as in the question.
doc = docx.Document(file)
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            for para in cell.paragraphs:
                print(f'para {para.text}')

# A .docx file is a zip archive; read the raw XML to get the dropdown lists.
with zipfile.ZipFile(file) as document:
    xml_data = document.read('word/document.xml')

soup = bs(xml_data, 'xml')
names = []
selected = None
for ddl in soup.find_all('dropDownList'):
    ddl_parent = ddl.parent  # the enclosing <w:sdtPr>
    for ddl_parent_entry in ddl_parent.children:
        if not isinstance(ddl_parent_entry, Tag):
            continue  # skip text nodes, which have no attributes
        val = ddl_parent_entry.get('w:val')
        # Only handle the dropdown whose alias contains "Entered by".
        if val and 'Entered by' in val:
            names = [entry.get('w:displayText')
                     for entry in ddl.children if isinstance(entry, Tag)]
            # The selected value is the first <w:t> after <w:sdtPr>,
            # i.e. the run inside <w:sdtContent>.
            select = ddl_parent.find_next('w:t')
            if select is not None:
                selected = select.get_text()

print(f'List of names: {names}')
print(f'Name selected: {selected}')
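
If BeautifulSoup is not available, roughly the same lookup can be sketched with only the standard library. This is an assumption-laden variant, not something python-docx provides: the function name selected_dropdown_values is hypothetical, it returns every dropdown keyed by its alias rather than filtering on "Entered by", and it assumes combo boxes use a w:comboBox element in place of w:dropDownList. The demo builds an in-memory archive containing only word/document.xml (not a complete Word file) so it runs without inp.docx.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def selected_dropdown_values(docx_file):
    """Hypothetical helper: map each dropdown's alias to its displayed text."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    result = {}
    for sdt in root.iter(f"{{{W}}}sdt"):
        props = sdt.find("w:sdtPr", NS)
        if props is None:
            continue
        # Assumption: dropdowns use w:dropDownList, combo boxes w:comboBox.
        if (props.find("w:dropDownList", NS) is None
                and props.find("w:comboBox", NS) is None):
            continue  # some other kind of content control
        alias = props.find("w:alias", NS)
        key = alias.get(f"{{{W}}}val") if alias is not None else None
        texts = sdt.findall("w:sdtContent//w:t", NS)
        result[key] = "".join(t.text or "" for t in texts)
    return result


# Demo input: a minimal stand-in for inp.docx, built in memory.
demo_xml = f"""<w:document xmlns:w="{W}"><w:body><w:p>
<w:sdt>
  <w:sdtPr>
    <w:alias w:val="Entered by"/>
    <w:dropDownList>
      <w:listItem w:displayText="Mark" w:value="Mark"/>
      <w:listItem w:displayText="Lisa" w:value="Lisa"/>
    </w:dropDownList>
  </w:sdtPr>
  <w:sdtContent><w:r><w:t>Lisa</w:t></w:r></w:sdtContent>
</w:sdt>
</w:p></w:body></w:document>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", demo_xml)

values = selected_dropdown_values(buffer)
print(values)
```

With a real file, the call would be selected_dropdown_values('inp.docx'). Dropdowns without an alias would all share the key None in this sketch, so a real script would probably want a different key, such as the w:id value.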