Handle XML namespaces in worksheets

pythonicrubyist / creek

Ruby library for parsing large Excel files.

MIT License

We've run into an issue with parsing an XLSX when the nodes are namespaced (e.g. <x:row>).

~This PR addresses that issue by using the local_name method when looking for row, c, v and t nodes. The name method includes the namespace, e.g. x:row, but local_name will strip the namespace prefix, allowing the existing comparison logic to work.~

This PR addresses that issue by identifying the namespace prefix (if there is one) while SAX-parsing the sheet, then looking for nodes whose names include that prefix.
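As a rough illustration of the idea (not the code in this PR), the prefix can be read off the first element name the parser reports and then used to build the names to compare against. The helpers `namespace_prefix`, `qualified` and `count_rows` are hypothetical, and the array of strings stands in for the element names a streaming parser would yield.

```ruby
# Hypothetical helpers for illustration only; not part of creek's API.

# Returns the namespace prefix of a qualified element name ("x:row" -> "x"),
# or nil when the name carries no prefix ("row" -> nil).
def namespace_prefix(qualified_name)
  prefix, local = qualified_name.split(':', 2)
  local.nil? ? nil : prefix
end

# Builds the name to compare against for a given local name and prefix.
def qualified(local_name, prefix)
  prefix ? "#{prefix}:#{local_name}" : local_name
end

# element_names simulates the sequence of element names reported while
# streaming through a sheet; the first entry is assumed to be the root
# element (e.g. "x:worksheet" or "worksheet").
def count_rows(element_names)
  return 0 if element_names.empty?

  prefix = namespace_prefix(element_names.first)
  row_name = qualified('row', prefix)
  element_names.count { |name| name == row_name }
end

prefixed = %w[x:worksheet x:sheetData x:row x:c x:v x:row x:c x:v]
plain    = %w[worksheet sheetData row c v]

puts count_rows(prefixed)
puts count_rows(plain)
```

Detecting the prefix once and reusing it keeps the existing exact-name comparisons, at the cost of assuming the sheet uses a single prefix throughout.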

Additionally, when the shared strings dictionary is built, this PR identifies the namespace prefix (if there is one) and includes it in the CSS query used to parse the dictionary. An alternative approach would be to call remove_namespaces! on the document, but that seems a bit heavy-handed.
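A minimal sketch of how such a query could be assembled, assuming the namespace declarations are available as a hash with keys like "xmlns:x" (the shape Nokogiri's Document#namespaces returns) and that the selector uses the CSS `prefix|element` namespace syntax. The helper names `spreadsheet_prefix` and `css_selector` are hypothetical and are not taken from this PR.

```ruby
# Hypothetical helpers for illustration only; not part of creek's API.

# Main SpreadsheetML namespace URI.
SPREADSHEET_NS = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'

# declarations is assumed to be a hash shaped like
# { "xmlns:x" => "http://..." }. Returns the prefix bound to the
# spreadsheet namespace, or nil when it is only the default namespace
# (key "xmlns") or is not declared at all.
def spreadsheet_prefix(declarations)
  key, _uri = declarations.find do |name, uri|
    uri == SPREADSHEET_NS && name.start_with?('xmlns:')
  end
  key && key.delete_prefix('xmlns:')
end

# Builds a CSS selector for a local element name, using the
# "prefix|element" namespace syntax when a prefix is present.
def css_selector(local_name, prefix)
  prefix ? "#{prefix}|#{local_name}" : local_name
end

prefixed_doc = { 'xmlns:x' => SPREADSHEET_NS }
default_doc  = { 'xmlns' => SPREADSHEET_NS }

puts css_selector('si', spreadsheet_prefix(prefixed_doc))
puts css_selector('si', spreadsheet_prefix(default_doc))
```

Compared with remove_namespaces!, this leaves the parsed document untouched and only changes the query string.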

Handle XML namespaces in worksheets #101