Open Rapid1898-code opened 1 year ago
You need to follow the method in this post:
python-docx get info from dropdownlist (in table) - Stack Overflow
The word/document.xml part inside your .docx file contains the following structure, from which you can extract the information you need:
<w:sdt>
  <w:sdtPr>
    <w:alias w:val="Entered by"/>
    <w:id w:val="1433320977"/>
    <w:dropDownList>
      <w:listItem w:displayText="Job entered by" w:value="Job entered by"/>
      <w:listItem w:displayText="Mark" w:value="Mark"/>
      <w:listItem w:displayText="Chooky" w:value="Chooky"/>
      <w:listItem w:displayText="Steve" w:value="Steve"/>
      <w:listItem w:displayText="Lisa" w:value="Lisa"/>
    </w:dropDownList>
  </w:sdtPr>
  <w:sdtContent>
    <w:r w:rsidR="004B3CEC">
      <w:t>Lisa</w:t>
    </w:r>
  </w:sdtContent>
</w:sdt>
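To make the mapping concrete, here is a minimal sketch that parses a shortened copy of that fragment with the standard library's xml.etree.ElementTree. The helper name w and the constants W_NS, NS and FRAGMENT are made up for this illustration, and the xmlns:w declaration is added by hand so the fragment parses on its own (in the real file it is declared on the root element). It assumes the selected value is simply the text inside w:sdtContent, as in the fragment.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Shortened copy of the fragment, with the namespace declared so it parses standalone
FRAGMENT = f"""
<w:sdt xmlns:w="{W_NS}">
  <w:sdtPr>
    <w:alias w:val="Entered by"/>
    <w:dropDownList>
      <w:listItem w:displayText="Job entered by" w:value="Job entered by"/>
      <w:listItem w:displayText="Mark" w:value="Mark"/>
      <w:listItem w:displayText="Lisa" w:value="Lisa"/>
    </w:dropDownList>
  </w:sdtPr>
  <w:sdtContent>
    <w:r><w:t>Lisa</w:t></w:r>
  </w:sdtContent>
</w:sdt>
"""


def w(attr):
    """Return the namespace-qualified name ElementTree uses for a w: attribute."""
    return f"{{{W_NS}}}{attr}"


sdt = ET.fromstring(FRAGMENT)

# Label of the control, taken from <w:alias w:val="...">
alias = sdt.find("w:sdtPr/w:alias", NS).get(w("val"))

# Every option the dropdown offers
options = [item.get(w("displayText"))
           for item in sdt.findall("w:sdtPr/w:dropDownList/w:listItem", NS)]

# Currently selected value: the text runs inside <w:sdtContent>
selected = "".join(t.text or "" for t in sdt.findall("w:sdtContent//w:t", NS))

print(alias, options, selected)
```

The options live under w:sdtPr, while the value currently shown in the document lives under w:sdtContent, so both parts of the w:sdt element have to be read.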
I managed to produce some sample code:
import zipfile

import docx
from bs4 import BeautifulSoup as bs
from bs4.element import Tag

file = 'inp.docx'

try:
    # Print the plain paragraph text of every table cell
    # (per the question, the dropdown selection does not show up here)
    doc = docx.Document(file)
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                for para in cell.paragraphs:
                    print(f'para {para.text}')

    # Get data from the dropdown lists by reading the raw XML
    with zipfile.ZipFile(file) as document:
        xml_data = document.read('word/document.xml')
    soup = bs(xml_data, 'xml')

    for ddl in soup.find_all('dropDownList'):
        ddl_parent = ddl.parent  # the enclosing <w:sdtPr> element
        for ddl_parent_entry in ddl_parent.children:
            # Skip text nodes (e.g. whitespace), which have no attributes
            if not isinstance(ddl_parent_entry, Tag):
                continue
            val = ddl_parent_entry.get('w:val')
            if val and 'Entered by' in val:
                # All options offered by this dropdown
                names = [entry.get('w:displayText')
                         for entry in ddl.children if isinstance(entry, Tag)]
                # The selected value is the first <w:t> after <w:sdtPr>
                sibling = ddl_parent.next_sibling
                select = sibling.find_next('w:t')
                selected = select.get_text()
                print(f'List of names: {names}')
                print(f'Name selected: {selected}')
except Exception as exc:
    # Report what went wrong instead of hiding it behind a bare "Error"
    print(f'Error: {exc}')
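For comparison, the same idea can be sketched without BeautifulSoup, using only zipfile and xml.etree.ElementTree from the standard library. This is an alternative sketch, not the method from the linked post: the function name read_dropdowns and the shape of its return value are invented for illustration, and it assumes the selected value is the text inside w:sdtContent, as in the XML fragment.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def read_dropdowns(docx_file):
    """Return one dict per dropdown content control found in the document.

    docx_file may be a path or a binary file-like object (anything
    zipfile.ZipFile accepts). Hypothetical helper, for illustration only.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    controls = []
    for sdt in root.iter(f"{{{W_NS}}}sdt"):
        ddl = sdt.find("w:sdtPr/w:dropDownList", NS)
        if ddl is None:
            continue  # some other kind of content control, not a dropdown

        alias = sdt.find("w:sdtPr/w:alias", NS)
        controls.append({
            # Label of the control, or None if it has no <w:alias>
            "alias": alias.get(f"{{{W_NS}}}val") if alias is not None else None,
            # Every option the dropdown offers
            "options": [item.get(f"{{{W_NS}}}displayText")
                        for item in ddl.findall("w:listItem", NS)],
            # Text currently shown in the control, i.e. the selection
            "selected": "".join(t.text or ""
                                for t in sdt.findall("w:sdtContent//w:t", NS)),
        })
    return controls
```

Calling read_dropdowns('inp.docx') and filtering the result on alias == 'Entered by' should then give the list of names and the selected one, provided the file matches the structure shown in the fragment.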
Hello, I would like to read the information from the select box / combo box in the attached Word document ("Lisa" is selected under JOB DETAILS at the top of the attached file).
When I try to read this information by looping through the table for all paragraphs, e.g. with this code,
I don't get this information. Is there any way to get it?
inp.docx