nodejs / node-v0.x-archive

Moved to https://github.com/nodejs/node

Async methods for JSON.parse and JSON.stringify #7543

Closed · tracker1 closed this issue 9 years ago

tracker1 commented 10 years ago

Given that parsing/stringifying large JSON objects can tie up the current thread, it would be nice to see async versions of these methods. I know that parseAsync and stringifyAsync are a bit alien in node.js; just the same, this is functionality that would be best served by running it against a thread pool, in a separate thread internally.

Also, it would be worth exposing options for when to use a given level of fallback internally, based on the size of the string/object. For example, a 300kb JSON payload takes about 12-14ms to parse, holding up the main thread the whole time. This is an extreme example, but it all adds up.
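For illustration, a sketch of what the proposed API could look like (parseAsync is a hypothetical name, not an existing node.js method):

var big = JSON.stringify({ hello: 'world' }); // stand-in for a large payload
// Hypothetical: the parse would run on the thread pool and the callback
// would fire back on the event loop with the finished object.
JSON.parseAsync(big, function (err, obj) {
  if (err) throw err;
  console.log(obj.hello); // the main thread was never blocked by the parse
});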

indutny commented 10 years ago

JSON.parse and JSON.stringify are provided by the V8 runtime; invoking them from other threads should be possible with a user-land module. Sorry, but we won't take it into core.
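For illustration, a minimal user-land sketch using child_process.fork (the file name json-worker.js is an assumption; note the catch raised later in this thread: node's IPC channel itself serializes messages as JSON, so the parent still pays a parse on receipt):

// json-worker.js -- parse in a child process
process.on('message', function (text) {
  process.send(JSON.parse(text)); // process.send re-serializes this to JSON
});

// parent.js
var fork = require('child_process').fork;
var child = fork(__dirname + '/json-worker.js');
child.on('message', function (obj) {
  console.log('parsed off-process:', obj); // deserialized again on arrival
  child.disconnect();
});
child.send(JSON.stringify({ hello: 'world' }));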

tracker1 commented 10 years ago

So much for "don't block the event loop"

TooTallNate commented 10 years ago

There are already a lot of 3rd-party modules that do streaming JSON parsing. There's no reason to include one in node.js itself:

$ npm search json stream parse
NAME                         DESCRIPTION                                                  AUTHOR          DATE       VERSION     KEYWORDS
baucis-json                  Baucis uses this to parse and format streams of JSON.        =wprl           2014-05-01 1.0.0-prer… baucis stream json parse parser format
clarinet                     SAX based evented streaming JSON parser in JavaScript…       =dscape =thejh  2014-02-17 0.8.1       sax json parser stream streaming event events emitter async streamer browser
clarinet-object-stream       Wrap the Clarinet JSON parser with an object stream: JSON…   =exratione      2014-02-19 0.0.3       clarinet json stream
csv-string                   PARSE and STRINGIFY for CSV strings. It's like JSON object…  =touv           2013-12-09 2.1.1       csv parser string generator
csv2json-stream              Streaming parser csv to json                                 =mrdnk          2013-03-03 0.1.2       csv2json csv json parser
dave                         JSON Hypermedia API Language stream parser for Node.js.      =isaac          2012-10-19 0.0.0       hal json hypermedia rest
dummy-streaming-array-parser Dummy Parser for streaming JSON as actual JSON Array         =floby          2013-07-05 1.0.1       streaming stream json parser
fast                         `fast` is a very small JSON over TCP messaging framework.…   =mcavage        2014-01-30 0.3.8
jaxon                        Jaxon is a sequential access, event-driven JSON parser…      =deyles         2013-08-11 0.0.3       JSON parser stream parser SAX
jlick                        Streaming configurably terminated (simple) JSON parser       =deoxxa         2013-03-14 0.0.4       json parse stream newline split whitespace
json--                       A streaming JSON parser that sometimes might be faster than… =alFReD-NSH     2012-08-21 0.0.2       JSON
json-body                    Concat stream and parse JSON                                 =tellnes        2013-08-09 0.0.0       json body parse stream concat
json-parse-stream            streaming json parser                                        =chrisdickinson 2014-04-14 0.0.2       json parse readable stream
json-parser-stream           JSON.parse transform stream                                  =nisaacson      2014-01-09 0.1.0       stream json parse
json-scrape                  scrape json from messy input streams                         =substack       2012-09-12 0.0.2       json scrape parse
json-stream                  New line-delimeted JSON parser with a stream interface       =mmalecki       2013-06-05 0.2.0
json-stream2                 JSON.parse and JSON.stringify wrapped in a node.js stream    =jasonkuhrt     2013-10-23 0.0.3       json stream
json2csv-stream              Transform stream data from json to csv                       =zemirco        2013-11-08 0.1.2       json csv stream parse json2csv convert transform
jsonparse                    This is a pure-js JSON streaming parser for node.js          =creationix…    2014-01-31 0.0.6
jsons                        Transform stream for parsing and stringifying JSON           =uggedal        2013-11-01 0.1.1       json stringify parse stream transform array
jsonsp                       JSON stream parser for Node.js.                              =jaredhanson    2012-09-09 0.2.0       json
JSONStream                   rawStream.pipe(JSONStream.parse()).pipe(streamOfObjects)     =dominictarr    2014-04-30 0.7.3       json stream streaming parser async parsing
jstream                      Continously reads in JSON and outputs Javascript objects.    =fent           2014-03-13 0.2.7       stream json parse api
jstream2                     rawStream.pipe(JSONStream.parse()).pipe(streamOfObjects)     =tjholowaychuk  2013-03-19 0.4.4
jsuck                        Streaming (optionally) newline/whitespace delimited JSON…    =deoxxa         2013-03-24 0.0.4       json parse stream newline split whitespace
kazoo                        streaming json parser with the interface of clarinet but…    =soldair        2012-10-19 0.0.0
ldjson-csv                   streaming csv to line delimited json parser                  =maxogden       2013-08-30 0.0.2
ldjson-stream                streaming line delimited json parser + serializer            =maxogden       2013-08-30 0.0.1
naptan-xml2json-parser       Takes a stream of NaPTAN xml data and transforms it to a…    =mrdnk          2012-12-17 0.0.2       NaPTAN stream parser
new-stream                   Parse and Stringify newline separated streams (including…    =forbeslindesay 2013-07-07 1.0.0
oboe                         Oboe.js reads json, giving you the objects as they are…      =joombar        2014-03-19 1.14.3      json parser stream progressive http sax event emitter async browser
parse2                       event stream style JSON parse stream                         =bhurlow        2014-04-10 0.0.1       event stream through2 through stream object stream
regex-stream                 node.js stream module to use regular expressions to parse a… =jgoodall       2012-11-20 0.0.3       regex parser stream
rfc822-json                  Parses an RFC-822 message stream (standard email) into JSON… =andrewhallagan 2014-01-27 0.3.6       email json rfc 822 message stream
stream-json                  stream-json is a collection of node.js 0.10 stream…          =elazutkin      2013-08-16 0.0.5       scanner lexer tokenizer parser
streamitems                  Simple stream parser. Emits 'item' and 'garbage'. Created…   =pureppl        2012-04-25 0.0.0
svn-log-parser               Parses SVN logs as into relevant JSON.                       =jswartwood     2012-08-02 0.2.0       svn parse stream json xml
through-json                 Through stream that parses each write as a JSON message.     =mafintosh      2014-03-20 0.1.1       stream streams2 through json parse
through-parse                parse json in a through stream, extracted from event…        =hij1nx         2013-11-24 0.1.0       through streams parse json throughstream throughstreams streaming parser
tidepool-dexcom-stream       Parse Dexcom text files into json.                           =cheddar…       2014-02-27 0.0.2       Dexcom export text json parser stream
to-string-stream             stringify binary data transform stream                       =nisaacson      2014-01-09 0.1.0       stream json parse
wormhole                     A streaming message queue system for Node.JS focused on…     =aikar          2011-09-24 3.0.0       message queue pass stream parser fast json
xmpp-ftw-item-parser         Used to parse "standard" XMPP pubsub payloads both from…     =lloydwatkin    2014-04-09 1.1.0       xmpp xmpp-ftw xml json rss atom activitystreams activitystrea.ms parse parser
tracker1 commented 10 years ago

I'm not asking for a streaming parser... I'm asking for an async parser that does the parse on the thread pool, so that it doesn't block the event loop... As it stands, many node-based API services can easily be DDoS'd by passing them large JSON requests.

Yes, you can check request/response size, but many times you want the large-ish JSON... we were seeing issues parsing many requests involving data from Twitter, for example. The long parse times were holding up the event loop.

This is an area that is really easy to DDoS; having out-of-band parse/stringify should be the typical path.

TooTallNate commented 10 years ago

I see what you're saying... but the reality is that the JSON implementation in node is provided by V8, so there's really not much we can do about it. If it weren't already provided by V8 then it probably wouldn't be in node by default in the first place.

There are lots of things in JavaScript that can block the event loop, but I think being educated about that, and knowing when to use something async vs. something sync, is the better way to go about it. We're not going to start adding magical sync-looking-but-really-async functions to node to try to prevent users from shooting themselves in the foot.

Personally, I'd recommend using generators/fibers/streamline/whatever to make your async code feel sync.

TooTallNate commented 10 years ago

From a more architectural standpoint: yes, you may want the large-ish JSON object, but if it is large, you probably want to do some kind of streaming work on it while it's on the way in, rather than buffering the entire thing into memory and then parsing it. You can architect your app to be a lot more memory-efficient this way.
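As a sketch of that architecture, using the JSONStream module from the list above (the URL is a placeholder), each element of a large top-level JSON array is materialized as it arrives instead of being buffered:

var http = require('http');
var JSONStream = require('JSONStream');

http.get('http://example.com/big.json', function (res) {
  res
    .pipe(JSONStream.parse('*')) // emit each element of the root array
    .on('data', function (item) {
      // items arrive one at a time; memory use stays bounded and the
      // event loop never sees one huge JSON.parse call
      console.log(item);
    });
});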

tjfontaine commented 10 years ago

I'm reopening this. So long as Node's internal IPC mechanism relies upon JSON, I think it's reasonable to explore a way to provide async JSON without changing the semantics of the runtime we're relying upon. That is to say, we want to use V8's JSON.parse|stringify, but off-loop.

In any event, I'd entertain PRs from interested people along with some good data behind some of the costs associated with it since it won't necessarily be free.

trevnorris commented 10 years ago

The complexity of doing this isn't going to be trivial. IPC won't work because that will end up re-serializing the JSON before it hits the main thread. You'll also need to handle the security tokens of the object, since the Object will need to be created in another Isolate.

It's not like you could just uv_queue_work() and pass the char* and tell v8 to parse the JSON then pass back a direct pointer to the Object's memory.

Anyways, you get the idea. TBH I like the concept, but it won't be easy.

vkurchatkin commented 10 years ago

@trevnorris is it even possible to pass objects between isolates?

indutny commented 10 years ago

@vkurchatkin this is an interesting question; in fact, no. But it could be possible with some relocation machinery. The question is how much work it will actually take, and whether there will be any speed benefit, because you may be doing a lot of double work with off-thread allocation and parsing. I'm going to look into it and experiment with it.

vkurchatkin commented 10 years ago

@indutny I think maybe we can use a 3rd-party parser off the main thread to do the heavy lifting and then create the actual V8 values on the main thread from an intermediate representation.

YurySolovyov commented 10 years ago

Isn't this all about the general discipline of working with node? If you want to do something resource-consuming, you should do it either in small chunks (streaming) or in another thread. What's next? Async map/forEach/reduce?

bjouhier commented 10 years ago

@vkurchatkin What's costly? Is it the parsing, or the construction of the result? If the cost comes primarily from the construction of the result, and if we have no choice but to do that in the main thread, it won't help much to offload just the parsing to a separate thread and build from an intermediate representation. My gut feeling is that parsing must be really cheap.

There are two problems here: 1) JSON.parse takes CPU in the main thread and 2) JSON.parse blocks the event loop. We are trying to solve 1. Maybe we should solve 2 instead by keeping all the processing in the main loop but yielding periodically to the event loop. Not as good but maybe good enough.

vkurchatkin commented 10 years ago

@bjouhier that's the question. My hypothesis is that the actual parsing is pretty costly and that this approach is beneficial for large JSONs. Also, values could possibly be created lazily.

obastemur commented 10 years ago

First of all, object sharing among Isolates is not possible (because of heap memory indexes, GC levels, and even optimizations). A very basic example:

var dict = {};
// concrete bounds and value substituted for the X / Y / some_number placeholders
for (var i = 0; i < 1e6; i++) { dict[i + ''] = 42; }
dict = null;

For the above code, V8 optimizes the operation and "may not" do much at all (since we never use dict), and while doing this it doesn't check anything on other Isolates. Ultimately, sharing an object among Isolates produces unexpected results.

BTW, it doesn't matter if the external JSON parser etc. is efficient or not. You will end up blocking the main thread even more.

JSON.stringify: the object given to JSON.stringify needs to be walked and then transferred into an external memory block on the current thread (heavily blocking: Object Get / Has / IsThatType-or-NULL checks, memory allocations, etc.). When the stringify is completed, the result must be copied back to the current heap (not free; it blocks the main thread again).

JSON.parse: the parsed string needs to be filled into a v8::Object. That means lots of "Set"/"New" calls on the native side, and unfortunately they are very expensive (heavily blocking). In many cases it is visibly faster to simply move the string to the JS side and parse it there, instead of creating a new Object on the native side, especially when the members of that Object contain long strings.

bmeck commented 10 years ago

This sounds a lot like the hilarious fibonacci benchmark, where just moving to continuable functions and process.nextTick(continuation) "solves" the problem. It does not really solve the problem, it actually slows things down.

This sounds like people want streaming parsers, but are not throttling the connection / input to the streaming parser.

Since Isolates are not able to share Objects across threads, we would have to reconstruct the object manually either way... I am unsure how you would speed this up. You could strip the minimal fluff while parsing out of thread (quotes, colons, braces, brackets, commas...), but that would not really save much time, since you would still have to convert the strings to v8::String when you make the object. The inverse applies during stringification.

YurySolovyov commented 10 years ago

Maybe implement a streaming JSON parser in node core, then?

indutny commented 10 years ago

@Dream707 why? It won't make it faster.

YurySolovyov commented 10 years ago

@indutny but at least it won't block main thread

indutny commented 10 years ago

You could do non-blocking stuff in user-land too. Also, it is pretty questionable how well it would perform, especially considering the flickering back and forth between the worker thread and the main loop.

bnoordhuis commented 10 years ago

Reasons why off-loading parsing to a thread won't help much have been outlined above by @obastemur and others. I suggest closing this.

indutny commented 10 years ago

I'll let @tjfontaine decide on this one ;)

vkurchatkin commented 10 years ago

@bnoordhuis do you mean the "lots of 'Set'/'New' calls on the native side"? Because you can definitely avoid those.

bnoordhuis commented 10 years ago

That's not quite what I mean. There are two cost centers when parsing JSON: actual parsing and converting parsed input to JS values. You can farm out the first one to a worker thread but not the second one.

I'm fairly confident (having profiled it with perf in the past) that of the two, rematerializing JS values is much more expensive than parsing, no matter whether you're going through the V8 API or not. So while you can spend a lot of time optimizing the parser, the biggest cost center still runs inside your main thread. Never mind that the vagaries of thread scheduling mean you'll add variable (as opposed to deterministic) latency to your deserializer.

(For the serializer, it's even worse because you cannot access V8 objects from outside the main thread. Ergo, it's not really possible to off-load the work.)

Having said that, here is a semi-plausible way of implementing an off-thread serializer / deserializer. The main thread would need to release the V8 isolate using a v8::Unlocker right before entering epoll_wait() and reacquire it after returning from the system call. That way, the other thread can acquire the isolate and start serializing or deserializing away.

However, that will only marginally improve matters because if the system call returns before the worker thread is finished, the main thread will block until the worker is done.

A secondary issue is that Locker and Unlocker objects are backed by a system mutex and that opens the usual can of worms about unfavorable thread rescheduling when the contention rate is high.

vkurchatkin commented 10 years ago

I don't really see how this can help if only one thread can work with the isolate. That means that no JS can run in parallel with parse/stringify (not even another parse/stringify). Or am I missing something?

YurySolovyov commented 10 years ago

The original message was about making it async, not faster.

vkurchatkin commented 10 years ago

@Dream707 actually it is about blocking less. If you just want async, you can split a large JSON string into pieces and feed it to a streaming parser, using setImmediate to iterate. But in total this will block even more.
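A minimal sketch of that splitting approach, using the jsonparse module mentioned elsewhere in this thread (the parseAsync helper and the 64 KB slice size are assumptions of the sketch):

var Parser = require('jsonparse');

function parseAsync(json, cb) {
  var p = new Parser();
  p.onValue = function (value) {
    if (this.stack.length === 0) cb(null, value); // root value is complete
  };
  var i = 0, CHUNK = 64 * 1024;
  (function next() {
    if (i >= json.length) return;
    p.write(json.slice(i, i += CHUNK)); // parse one slice...
    setImmediate(next);                 // ...then yield to the event loop
  })();
}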

YurySolovyov commented 10 years ago

How is this done with databases that are not optimized for async operations? With file I/O? I'm sure this is not the first time this problem has arisen, so it should be solved somewhere already.

vkurchatkin commented 10 years ago

@Dream707 it's clear how this can be done, but the question is whether it would be beneficial at all. Here is how I see it: the process has three phases:

  1. dematerialization: converting the input JS values into an intermediate representation;
  2. processing: the actual work on that representation;
  3. materialization: converting the result back into JS values.

The first and the third phase have to be executed on the V8 thread, while the second can be handed to uv_queue_work.

If we compare executing all three phases on a single thread with offloading the processing to another thread, the latter is obviously a win (not taking the uv_queue_work cost into account). When the processing actually involves some blocking IO, it can take a really long time, while dematerialization/materialization can be really cheap (as in the case of an SQL query). So, in the case of integrating a 3rd-party blocking library, we've made possible something that wasn't possible before and even did our best to block less. As for performance, we don't actually have anything to compare with.

With JSON.parse pretty much everything is different:

  1. we don't have these three phases, so everything would have to be reimplemented;
  2. the processing doesn't involve IO, so it's possible that materialization will be more expensive than the processing;
  3. we have a native implementation which uses V8 internals and is probably really fast, so it may well be faster than ad hoc materialization could ever be.
trevnorris commented 10 years ago

This has long since left the realm of an issue and is now mailing-list material. As previously pointed out, there are plenty of third-party modules that support this; if no specific module supports exactly what you're looking for, then create it.

Core already has more than enough to maintain and secure without adding crazy multi-threaded hacks.

tjfontaine commented 10 years ago

Again, I'm going to reopen this issue -- because it's functionality that I do think should exist in core, and I think it's useful for things that core currently does with JSON.

tjfontaine commented 10 years ago

Sorry, accidentally clicked the button before finishing --

The point of this is a tracking issue to encourage people to submit a PR that demonstrates the implementation and usage for Node core.

There are multiple reasons to want this for core, and to export a canonical interface that users can rely upon.

It's a difficult line to walk between what belongs in core and what belongs in user land. But one of the easiest ways to identify the former is whether Node core itself could make use of the functionality.

So while there are many opinions here, and many observations about what the ramifications are of having that pattern in core -- there hasn't been anything brought to the table yet to comment on and iterate on.

But this tracking issue is staying open until it has been fully explored for use in core.

trevnorris commented 10 years ago

What goes in core or not, and how it gets implemented, may be up for discussion, but discussing this is for the mailing list. Issues should be reserved for the implementation details once a consensus is reached about what should be implemented.

bjouhier commented 10 years ago

It doesn't really matter to me whether this lands in core or in user-land, but it would be nice to have. The first question is: what do we want?

  1. do we want to offload processing to a worker thread?
  2. do we just want to make parsing interruptible to avoid blocking the event loop?

From the exchanges above (and I completely follow Ben on the point that the pure parsing part should be the cheap part), option 1 is a very difficult path.

Option 2 is much easier. We could do it with an API like:

var parser = new JSON.Parser();
parser.update(data); // updates the state of the parser with more data
var result = parser.result();

This is completely synchronous and everything is done in the main thread but it allows you to update the parser from data events. Then, when you receive the end event you can retrieve the result.

Wouldn't this be sufficient?

bjouhier commented 10 years ago

A refinement would be to have the parser be an EventEmitter. Then you could subscribe to the production of specific JSON nodes with parser.on(pathToNode, fn).
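A hypothetical usage sketch of that refinement (JSON.Parser and the path syntax are inventions of this proposal, and req stands in for any readable stream):

var parser = new JSON.Parser();

// fires each time a complete element of the top-level "items" array
// has been recognized, without waiting for the full body
parser.on('items.*', function (item) {
  console.log('item:', item);
});

req.on('data', function (chunk) { parser.update(chunk); });
req.on('end', function () { console.log('done:', parser.result()); });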

vkurchatkin commented 10 years ago

@trevnorris nodejs-dev mailing list is dead, isn't it? (https://groups.google.com/forum/#!topic/nodejs-dev/FhFhw5z48UQ)

trevnorris commented 10 years ago

@trevnorris nodejs-dev mailing list is dead, isn't it? ( https://groups.google.com/forum/#!topic/nodejs-dev/FhFhw5z48UQ)

Well, not to be rude, but Isaac isn't the lead maintainer anymore (not that I am either).

Creating long threads for discussion of what should/shouldn't be implemented, when there hasn't been a clear consensus on what it is that needs implementing, IMO bloats the issue tracker with topics that have no direct guidelines for what's required to close the issue.

An issue should be more clear cut on what's required for the issue to be closed.

trevnorris commented 10 years ago

do we want to offload processing to a worker thread?

Please excuse my ignorance, but I'm really missing how this is useful. Are you planning on doing a full parse off the main thread and then serializing smaller sets of data to send back to the main thread to be re-parsed?

You might as well implement a fast scanner that can interpret the JSON and index each sub-object, then use string slicing to parse each chunk. While it would be possible to offload the indexing of the JSON to a worker thread, the nondeterministic nature of doing so means it's highly unlikely to yield any performance improvement.

do we just want to make parsing interruptible to avoid blocking the event loop?

IIRC V8 does offer an API that allows users to interrupt long-running scripts so the user can act. But it would be impossible to store the stack of currently parsed data so that it could be resumed later.
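For illustration, a rough sketch of the scanner-and-slice idea above (the helper names are hypothetical, and it only indexes a top-level array whose elements are objects or arrays):

// Record the start/end offsets of each element of the root JSON array.
function indexTopLevel(json) {
  var offsets = [], depth = 0, inString = false;
  for (var i = 0; i < json.length; i++) {
    var c = json[i];
    if (inString) {
      if (c === '\\') i++;                  // skip the escaped character
      else if (c === '"') inString = false;
    } else if (c === '"') inString = true;
    else if (c === '{' || c === '[') { if (depth++ === 1) offsets.push(i); }
    else if (c === '}' || c === ']') { if (--depth === 1) offsets.push(i + 1); }
  }
  return offsets; // [start0, end0, start1, end1, ...]
}

// JSON.parse each indexed slice in its own event-loop turn.
function parseChunked(json, cb) {
  var offs = indexTopLevel(json), out = [], i = 0;
  (function next() {
    if (i >= offs.length) return cb(null, out);
    out.push(JSON.parse(json.slice(offs[i], offs[i + 1])));
    i += 2;
    setImmediate(next); // yield between elements
  })();
}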

bjouhier commented 10 years ago

I did a quick implementation of the parser.update(chunk) idea in pure JS. It is 2.65 times slower than native JSON.parse when parsing in one shot and about 2.8 times slower when parsing incrementally: https://github.com/bjouhier/i-json

A C++ implementation can probably reduce the gap with JSON.parse, but I have no idea how much can be gained.

Note: implementation still needs work (unicode escape sequences, stricter syntax checking, no profiling yet). I just wanted to get a feel for performance.

bjouhier commented 10 years ago

@trevnorris My previous comment was an attempt to re-position the problem. To clarify: I don't think that trying to offload parsing to another thread is a good idea.

On the other hand, a parser that you can update incrementally from data events and that can deliver the result without blocking seems more useful.

vkurchatkin commented 10 years ago

@bjouhier this problem is solved by a few packages (https://github.com/creationix/jsonparse, https://github.com/jimhigson/oboe.js). Such an approach makes sense if you actually have a readable stream, from both a memory and a blocking standpoint. But manually splitting a buffer into chunks and iterating asynchronously is lame.

bjouhier commented 10 years ago

@vkurchatkin You're right, jsonparse does exactly what I described earlier (did not know about it). But i-json is a lot faster than jsonparse on my test file (1660 ms vs 8100 ms).

If we could have something like jsonparse / i-json with performance very close to native JSON.parse, we could handle lots of JSON parsing scenarios without blocking the event loop, and with less memory overhead than today. Normally the data comes in small chunks, and that's what you want to handle without blocking. Why worry about the case where the whole string is already in memory? As you say, it can be handled by splitting and iterating asynchronously, and that's lame. But what's lame is not the parsing in this case; it is the fact that the whole JSON text has been accumulated in memory in the first place.

If you want to process a large JSON feed, for example a log stream, you want to update the parser as you receive chunks (and discard the chunks), and you want the parser to emit an event every time it has recognized a complete top-level entry in your feed. That's exactly what jsonparse does, and that's what I want to do with i-json (I have not implemented the emitting part yet, but that's easy to add).

Delegating JSON parsing to a worker thread seems to be a very challenging problem. From the previous discussions, it does not seem like it will help much, because V8 more or less imposes that you materialize the result from the main thread, and materialization is the costly part. I did a bit of timing with i-json: parsing only accounts for 300 ms of the 1660 ms; the rest is materialization. Also, that 300 ms JS parsing time is less than half the total time taken by native JSON.parse (622 ms).

bjouhier commented 10 years ago

Hi rektide,

I added clarinet to the test. It is faster than jsonparse, but it does not materialize the result (it just emits events), so the comparison is unfair. It is about 3 times slower than i-json (which does materialize the result).

JSON parsing is an area where I'm not ready to trade pure perf for fancy features. So what I'm investigating here is how close we can get to native JSON.parse with an incremental parser. With i-json, I get 2.5 ~ 3 times slower. clarinet and jsonparse are more around 8 ~ 13 times slower. It looks like the next step will be C++.

The output of my test program (https://github.com/bjouhier/i-json/blob/master/test/test.js):

*** PASS 1 ***
JSON: 626 ms
I-JSON single chunk: 1730 ms SAME RESULTS!
I-JSON multiple chunks: 1813 ms SAME RESULTS!
jsonparse single chunk: 8448 ms DIFFERENT RESULTS!
  63: a1= "latitude": -77.373901,
  63: a2= "latitude": -77.37390099999999,
clarinet single chunk: 5045 ms (clarinet does not materialize result, time is for parsing only)
*** PASS 2 ***
JSON: 478 ms
I-JSON single chunk: 1569 ms SAME RESULTS!
I-JSON multiple chunks: 2188 ms SAME RESULTS!
jsonparse single chunk: 8329 ms DIFFERENT RESULTS!
  63: a1= "latitude": -77.373901,
  63: a2= "latitude": -77.37390099999999,
clarinet single chunk: 5035 ms (clarinet does not materialize result, time is for parsing only)
*** PASS 3 ***
JSON: 753 ms
I-JSON single chunk: 1511 ms SAME RESULTS!
I-JSON multiple chunks: 1946 ms SAME RESULTS!
jsonparse single chunk: 8527 ms DIFFERENT RESULTS!
  63: a1= "latitude": -77.373901,
  63: a2= "latitude": -77.37390099999999,
clarinet single chunk: 5016 ms (clarinet does not materialize result, time is for parsing only)

On 2014-05-25, rektide wrote:

https://github.com/dscape/clarinet has been my goto SAX-inspired incremental json reader, if we're chipping in. got here by way of a recent tweet from @bjouhier mentioning he'd started another incremental parser: https://github.com/bjouhier/i-json


bmeck commented 10 years ago

We should be clear about whether we want to bias toward strings or buffers here. Strings cannot represent incomplete code points at the start / end of the string. Buffers can, and are most likely more common, but would mean a slightly different API.

vkurchatkin commented 10 years ago

@bmeck I think the parser should operate on buffers directly. It looks like an easy performance improvement, but it also brings a lot of complexity: encoding-specific code, weird split-code-point states, etc. I can't see how it affects the API, though.
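One standard way to sidestep split code points when feeding a string-based parser is node's built-in string_decoder module, which buffers incomplete UTF-8 sequences across chunks (stream and parser here stand in for any readable stream and any parser with the update API sketched above):

var StringDecoder = require('string_decoder').StringDecoder;
var decoder = new StringDecoder('utf8');

stream.on('data', function (buf) {
  // decoder.write emits only complete characters; a code point split
  // across two buffers is held back until the remaining bytes arrive
  parser.update(decoder.write(buf));
});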

bjouhier commented 10 years ago

@bmeck. Good point about buffers. Parsing incrementally from a utf-8 buffer is not much more difficult than parsing incrementally from a string.

bjouhier commented 10 years ago

I just pushed a buffer branch of my little experimental parser. No real impact on code complexity, but a significant impact on performance (2600 ms instead of 1600 ms in my test). This needs to be balanced against the fact that it saves a buffer-to-string conversion beforehand. But still, that's a big hit.

Another reason to move to C++.

trevnorris commented 10 years ago

Another reason to move to C++.

Just curious. What part of this would possibly be faster if moved to C++?

bjouhier commented 10 years ago

First, it should make it easy to bring the Buffer implementation on par with the string one, but I also see a number of micro-optimizations: eliminating bounds checking in the automaton, allocating state frames from a free list, etc. But the proof of the pudding is in the eating!

bjouhier commented 10 years ago

FWIW, I made good progress on a C++ implementation of the incremental parser. Typical run output:

*** PASS 1 ***
JSON.parse: 622 ms
I-JSON single chunk: 1038 ms
I-JSON multiple chunks: 1305 ms
*** PASS 2 ***
JSON.parse: 640 ms
I-JSON single chunk: 1030 ms
I-JSON multiple chunks: 1285 ms
*** PASS 3 ***
JSON.parse: 629 ms
I-JSON single chunk: 1056 ms
I-JSON multiple chunks: 1305 ms
*** PASS 4 ***
JSON.parse: 626 ms
I-JSON single chunk: 1046 ms
I-JSON multiple chunks: 1289 ms
*** PASS 5 ***
JSON.parse: 665 ms
I-JSON single chunk: 1014 ms
I-JSON multiple chunks: 1349 ms

So on average i-json is 1.63 times slower than JSON.parse on a full parse and 2.05 times slower on an incremental parse. This is significantly better than the JS implementation (was 2.65 and 2.8 times slower).

Getting faster is starting to be challenging because JSON.parse uses internal V8 functions to build objects and to optimize the allocation of ascii-only strings. The allocation of strings/objects/arrays and the assignments to array/object slots account for 66% of the overall processing time in the i-json C++ implementation (with my test data). The allocation of strings alone accounts for 38% of the time.