nyphilarchive / PerformanceHistory

New York Philharmonic Performance History Metadata
Creative Commons Zero v1.0 Universal

Nesting Bug #13

Closed by nyphil 7 years ago

nyphil commented 8 years ago

The current reformat process breaks at the reformat_xml script, and the fields aren't getting nested after the XSLT step. @hamlet82, could you help spot the bug? This may also affect the JSON converter written by @freethejazz.

nyphil commented 8 years ago

Here is the command output when reformat_xml.py is executed:

    Traceback (most recent call last):
      File "...reformat_xml.py", line 122, in <module>
        sortWorksInfo(p.find('worksInfo'))
      File "...reformat_xml.py", line 96, in sortWorksInfo
        if movement[x].text:
    IndexError: list index out of range

hamlet82 commented 8 years ago

I'm sorry, I can't focus enough to change my code to fix this. The best solution would probably be to rewrite your output code so that it nests the XML in my formatting from the get-go.

nyphil commented 8 years ago

We took a pretty big swing at the (complicated) underlying data workflow; this error was being caused by input XML elements not lining up, due to either bad or unusual metadata records. We cleaned up some records in the database of record; edited the XSLT that pre-cleans the XML in order to make the data going into this script more regular; and added a test to the Python script to avoid a trouble spot. Locally, it appears to work, and it hasn't changed the data structures. It also allows the JSON script to work again. I'll be uploading the updated files and data soon.
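
For anyone hitting the same traceback, here is a minimal sketch of the kind of guard described above. It is not the actual change made to reformat_xml.py. The helper name `movement_text`, the `workTitle` element, and the sample record are hypothetical, chosen only to show how a record with fewer `movement` elements than expected can be handled without raising `IndexError`.

```python
import xml.etree.ElementTree as ET


def movement_text(movements, x):
    """Return the text of movements[x], or None if x is out of range
    or the element has no text. (Hypothetical helper, not from the repo.)"""
    if x < len(movements) and movements[x].text:
        return movements[x].text
    return None


# Hypothetical irregular record: two titles but only one movement.
sample = ET.fromstring(
    "<worksInfo>"
    "<workTitle>Symphony</workTitle>"
    "<workTitle>Overture</workTitle>"
    "<movement>Allegro</movement>"
    "</worksInfo>"
)

titles = sample.findall("workTitle")
movements = sample.findall("movement")

# Pair each title with its movement text; a missing movement becomes None
# instead of triggering "list index out of range".
paired = [(t.text, movement_text(movements, i)) for i, t in enumerate(titles)]
print(paired)
```

Checking the index before reading `.text` lets the script carry on past records whose elements don't line up, while the `None` values make those records easy to find and clean up at the source.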