jpillora / node-edit-google-spreadsheet

A simple API for editing Google Spreadsheets
304 stars 101 forks source link

Native toJSON #45

Closed gnihi closed 10 years ago

gnihi commented 10 years ago

Use the toJSON Function from xmlutil. This extremly speeds up the conversion from a spreadsheet to a JSON file. We get ~300000 cells in 1 minute now.

ubik2 commented 10 years ago

This looks like a nice improvement. You may want to pass trim: false in the xmlutil parse options as well. Did you see a noticeable difference between trying to parse as JSON first and trying to parse as xml first? I imagine most users are using xml now, so if there's a noticeable performance gain, reordering seems appropriate.

gnihi commented 10 years ago

There isn't a noticeable difference in first parsing xml first. Maybe some miliseconds... But in the case that most users parse XML it's the correct way.

I reordered the statements again and added the trim: false. Besides I added a sanitize: false. The xmlutil sanitize function isn't very good so it's bedder to do it later on your own ;) Thanks for quick response and cool script!

jpillora commented 10 years ago

Hey Marc, thanks for the PR :) One request though, since I've seen reports of native extensions failing to compile (due to node-expat), would you mind swapping out xml2json with https://github.com/Leonidas-from-XIV/node-xml2js? (which depends on https://github.com/isaacs/sax-js, important thing is that it has no native dependencies). Only thing is, I'm not sure about the performance characteristics of these.

Another idea for performance, all valid XML should begin with a < (I think) so maybe use this fact instead catching the error?

P.s. I'm looking for contributors if you're interested :)

ubik2 commented 10 years ago

I decided to investigate this myself. The xml2js package doesn't have the option to coerce types (e.g. the string "47" becomes the number 47 in the javascript object), but was otherwise similar enough in API to make it work. Unfortunately, the non-native parsing of xml is very slow. For my test spreadsheet with 500 rows and 100 columns each. The JSON attempt took less than 1ms to fail, so I'm not too concerned with that.

Package Time Cells
xml2json 5.257 seconds 50000
xml2js 24.317 seconds 50000

I also tried with close to 1000 rows, but while xml2json took around 2x as long, xml2js aborted due to lack of memory.

Type coercion is about 5% of the xml2js time, and could be eliminated by adapting the code that handles results to accept string values for fields like gs:rowCount.

If you're still interested in switching the package, let me know and I can set up a PR.

One way to address the performance is to use an optionalDependency for xml2json, so if we can, we'll install and use that, but if we cannot, we will fall back to xml2js.

If you want to eyeball this without a PR: https://gist.github.com/ubik2/95d18f849b7b2878e4e0

gnihi commented 10 years ago

Hey Jaime, I don't think that I have the time to be a good contributor... But I use the scipt at work, so maybe i will find another improvement someday. I will definitely let you guys know if so! Keep up the good work :)

jpillora commented 10 years ago

@ubik2 Thanks for the performance investigation. Will merge this as is and then maybe your patch if I get pinged again about compilation errors. Created an actual issue for it here #47.

jpillora commented 10 years ago

v0.2.13 published

jpillora commented 10 years ago

@gnihi No worries and thanks for the PR

@ubik2 Keen on being a contributor :) ?