martinblech / xmltodict

Python module that makes working with XML feel like you are working with JSON
MIT License

ExpatError: no element found: line 5399, column 292 #227

Open tsj83 opened 4 years ago

tsj83 commented 4 years ago

Hello,

I've been able to parse well over 2,346 patent entries in a bulk XML file using xmltodict.parse(), so I am a big fan, but so far I've found a few that are impossible to parse (sample attached).

Previously, I fixed the problem of skipped patents by running unidecode(str(this_parsed_entry)), but this time it's something else. The error I get is: ExpatError: no element found: line 5399, column 292

I checked for unclosed tags, but that doesn't seem to be the cause.
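For anyone hitting the same message, a small reproduction may help narrow it down. Expat, the parser xmltodict is built on, reports "no element found" when the input ends before the root element has been closed, and the line and column it gives are the position where the input ran out. That makes the error consistent with an empty or truncated entry rather than a bad tag somewhere in the middle, though this is only one likely explanation. The sketch below uses only the standard-library `xml.parsers.expat` module; the helper name `expat_error` is hypothetical.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError, errors


def expat_error(text):
    """Parse `text` with a fresh Expat parser.

    Returns None if the text is well-formed, otherwise a tuple of
    (message, line, column) taken from the ExpatError.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)  # True marks this as the final chunk of input
    except ExpatError as e:
        # e.code is the numeric error code; errors.messages maps it to text.
        return errors.messages[e.code], e.lineno, e.offset
    return None


# Empty input: Expat reaches the end without seeing any element at all.
print(expat_error(""))
# Truncated input: <a> is never closed, so the document ends too early.
print(expat_error("<a>\n<b>x</b>"))
# Well-formed input parses cleanly and returns None.
print(expat_error("<a>\n<b>x</b>\n</a>"))
```

Both failing cases produce the same "no element found" message, with the position pointing at the very end of the input, which is why checking the saved entry for an abrupt ending (or for being empty) is a reasonable first step.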

Core pieces of code (be warned: I'm not a computer scientist, and I have a sense of humor):

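    import xmltodict
    from unidecode import unidecode  # third-party package: pip install Unidecode

    lets = ""
    this_entry = lets.join(patent_entry)  # patent_entry is a list
    this_parsed_entry = xmltodict.parse(this_entry, dict_constructor=dict)
    this_stringfied_parsed_entry = unidecode(str(this_parsed_entry))
    temp.write(this_stringfied_parsed_entry)  # temp: output file opened earlier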
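A more defensive version of this loop might record failures instead of stopping on them. This is a sketch under two assumptions that are not stated in the post: the bulk file is a concatenation of separate XML documents that each begin with an `<?xml` declaration line, and each `patent_entry` is the list of lines for one such document. It groups lines into entries, tries to parse each one with Expat, and collects the entry number, message, line, and column for every entry that fails, so files like entry2347.txt can be inspected on their own. `split_entries` and `find_bad_entries` are hypothetical names; in real use the `try` block would call `xmltodict.parse` instead.

```python
import xml.parsers.expat
from xml.parsers.expat import ExpatError, errors


def split_entries(lines):
    """Group lines into entries, starting a new one at each '<?xml' line.

    Assumes (hypothetically) that every document in the bulk file begins
    with its own XML declaration on a separate line.
    """
    entry = []
    for line in lines:
        if line.lstrip().startswith("<?xml") and entry:
            yield entry
            entry = []
        entry.append(line)
    if entry:
        yield entry


def find_bad_entries(lines):
    """Return (entry_number, message, line, column) for each entry Expat rejects.

    Entry numbers start at 1 to match file names like entry2347.txt.
    """
    bad = []
    for number, patent_entry in enumerate(split_entries(lines), start=1):
        this_entry = "".join(patent_entry)
        parser = xml.parsers.expat.ParserCreate()  # one fresh parser per document
        try:
            parser.Parse(this_entry, True)
        except ExpatError as e:
            bad.append((number, errors.messages[e.code], e.lineno, e.offset))
    return bad


sample = [
    '<?xml version="1.0"?>\n', "<doc><title>ok</title></doc>\n",
    '<?xml version="1.0"?>\n', "<doc><title>cut off</title>\n",
]
print(find_bad_entries(sample))
```

Because each entry is parsed separately, a reported position such as line 5399, column 292 refers to a place inside that one entry, so it can be checked directly against the saved entry file.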

The attached file came from printing the first patent entry with file size = 0 (saved as .txt).

Big thanks in advance. All the best.

entry2347.txt

skwolvie commented 3 years ago

I'm facing the same error. Please post a solution if there is one. Thanks!