stephen-hardy / xlsx.js

XLSX.js is a JavaScript library for converting the data in base64 XLSX files into JavaScript objects - and back! Please note that this library is licensed under the Microsoft Office Extensible File License - a license NOT approved by the OSI. While this license is based off of the MS-PL, which is OSI-approved, there are significant differences.
http://blog.innovatejs.com/?tag=xlsx-js

xlsx 2.3.2: bad parsing of sharedStrings.xml on big xlsx file #31

Closed: davvdg closed this 8 years ago

davvdg commented 10 years ago

Hi. I just found that parsing a big xlsx file (over 25 MB) goes wrong: the number of items returned by parsing sharedStrings.xml was incorrect, and even though the table objects in xlsx were right, the data value of every cell was wrong.

I fixed it by patching the regexp pattern from /<t.?>/g to /.?<t.*?>/g

In sharedStrings.xml I can read: uniqueCount="31243". Before patching, when I log the number of items xlsx detects while parsing this file, it reports 31270, and everything is messed up. I really mean every cell. After patching the regexp, xlsx reports 31084, which is still not the right number, but at least the cell content is OK.
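One way to check a count mismatch like this is to split the file by <si> item instead of by <t> tag, since a single shared string can hold several <t> runs when it uses rich text. The following is only a rough sketch under that assumption, not the parser xlsx.js uses and not the fix shipped in 2.3.2; parseSharedStrings is a hypothetical helper name, and the sketch does not unescape XML entities or skip phonetic runs.

```javascript
// Hypothetical helper (not part of xlsx.js): reads sharedStrings.xml item by item.
// Assumption: each <si> element is one shared string, and its text is the
// concatenation of every <t> run it contains.
function parseSharedStrings(xml) {
  // The count the file itself declares on the <sst> root element.
  var declared = /uniqueCount="(\d+)"/.exec(xml);
  var strings = [];
  // Matches either a self-closing <si/> or an <si>...</si> pair.
  var siPattern = /<si\b[^>]*?(?:\/>|>([\s\S]*?)<\/si>)/g;
  var siMatch;
  while ((siMatch = siPattern.exec(xml)) !== null) {
    var body = siMatch[1] || '';
    var text = '';
    // Matches <t> or <t attr="...">, but not other tags whose name starts with "t".
    var tPattern = /<t(?:\s[^>]*?)?(?:\/>|>([\s\S]*?)<\/t>)/g;
    var tMatch;
    while ((tMatch = tPattern.exec(body)) !== null) {
      text += tMatch[1] || '';
    }
    strings.push(text);
  }
  return {
    declared: declared ? Number(declared[1]) : null,
    strings: strings
  };
}

// Usage: compare the parsed item count with the declared one.
var sample =
  '<sst count="4" uniqueCount="3">' +
  '<si><t>plain</t></si>' +
  '<si><r><t>rich </t></r><r><rPr/><t xml:space="preserve">text</t></r></si>' +
  '<si/>' +
  '</sst>';
var result = parseSharedStrings(sample);
console.log(result.strings.length === result.declared);
```

If the two numbers differ on a real file, the indexes stored in the worksheet cells will point at the wrong strings, which would match the symptom described here.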

It works for my case, but I'm not sure it's robust enough. Just letting you know.

David

grassick commented 8 years ago

I believe the fix in 2.3.2 resolves this. Shared strings were being parsed incorrectly.