jimhigson / oboe.js

A streaming approach to JSON. Oboe.js speeds up web applications by providing parsed objects before the response completes.
http://jimhigson.github.io/oboe.js-website/index.html

Receiving multiple done events on single GET #44

Open James-Matthew-Watson opened 10 years ago

James-Matthew-Watson commented 10 years ago

I'm using oboe and appreciate the work that went into something so essential. I'm having a very good experience with it overall but I think I'm seeing behavior that should not be occurring. I could be wrong but I think I am using it properly.

Here's my oboe.js code:

oboe({
   url: "data/" + startTime + "/" + endtime,
   method: "GET",
   headers: {contentType: "application/json"},
   cache: false
  }).node('data.*', function(element) {
    // BLAH BLAH BLAH
    $("#status").text("retrieving: " + data.length + " records " + visible.length + " displayed");
  }).done(function(data){
    $("#status").text(data.length + " records retrieved " + visible.length + " displayed");
  });

The idea is that I want a different message to display after all the data is retrieved than when I am in the middle of the stream. The response from the RESTful service can be a bit jerky (data, pause, data, etc.) and I've had issues with both Chrome and Firefox "giving up" on really long responses part way through, so I want to be able to tell when the stream has truly ended.

I thought the above would work, but the message flips back and forth many times (dozens) before the end of the stream is reached. Am I misunderstanding what triggers the done event?

thanks,

-Matt

James-Matthew-Watson commented 10 years ago

Is it something I said?

jimhigson commented 10 years ago

Ok, here's what I think is happening regarding the 'done' events...

Your JSON stream is actually many JSON objects concatenated into a file, i.e. it couldn't be read by a standard parser.

Oboe is designed to read a standard JSON resource as a stream. This means that any resource read by Oboe could also be read by standard tools.

'Multi-JSON' streams are something that I've got on the radar to add support for.
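To illustrate the distinction drawn above, here is a minimal sketch of how a standard parser treats the two shapes. The sample strings and the `isSingleJsonDocument` helper are made up for this example and are not part of Oboe.

```javascript
// Two complete JSON objects, back to back, with no enclosing array or object.
const concatenated = '{"id": 1}\n{"id": 2}';

// A single well-formed document holding the same two objects.
const wrapped = '{"data": [{"id": 1}, {"id": 2}]}';

// Hypothetical helper: true when the whole text is exactly one JSON document.
function isSingleJsonDocument(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    // JSON.parse throws a SyntaxError on trailing content after the first value.
    return false;
  }
}

console.log(isSingleJsonDocument(concatenated)); // false
console.log(isSingleJsonDocument(wrapped));      // true
```

If a response fails this check, it is likely the kind of 'Multi-JSON' stream described here.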

As for the pauses, I've also been thinking about this. On a fast network (or with the server running locally) and with gzipped resources the XHR can get all the content in one 'js turn'. This means that a pure js parser has a lot to parse all at once and can occupy the CPU for a noticeable amount of time.

I think the best solution is throttling so that Oboe takes many, short turns on the CPU rather than a single, longer one. I could implement this.
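As a rough sketch of the "many short turns" idea, and not a description of how Oboe is or would be implemented, the text could be handed to a consumer one slice at a time, yielding to the event loop between slices. The names `sliceText` and `feedInTurns`, and the slice size, are assumptions made for this example.

```javascript
// Hypothetical helper: cut a string into pieces of at most sliceSize characters.
function sliceText(text, sliceSize) {
  const slices = [];
  for (let i = 0; i < text.length; i += sliceSize) {
    slices.push(text.slice(i, i + sliceSize));
  }
  return slices;
}

// Hypothetical helper: pass one slice to `consume` per turn, then call
// `onFinished` once every slice has been delivered.
function feedInTurns(text, sliceSize, consume, onFinished) {
  const slices = sliceText(text, sliceSize);
  let next = 0;

  function turn() {
    if (next === slices.length) {
      onFinished();
      return;
    }
    consume(slices[next]);
    next += 1;
    // Yield so rendering and other callbacks can run before the next slice.
    setTimeout(turn, 0);
  }

  turn();
}

// Usage: the consumer here only collects the slices.
const seen = [];
feedInTurns('abcdefg', 3, slice => seen.push(slice), () => {
  console.log(seen.join('|'));
});
```

The trade-off is a longer total parse time in exchange for a page that stays responsive while parsing.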

James-Matthew-Watson commented 10 years ago

I'm actually generating the data in question and I can change it but I'm not sure it fits your description. Here's an example. I've removed some of the fields from the internal objects for brevity:

{"data": [
{"queuedtimestamp": "2014-10-07T07:34:16.660Z", "delay": "13", "end": "2014-10-07T07:34:16.673Z"}
{"queuedtimestamp": "2014-10-07T07:34:17.200Z", "delay": "34", "end": "2014-10-07T07:34:17.411Z"}
]}

My assumption was that the pauses are actually on the source side, but it's possible that there are client-side factors. The reason I say this is that I can see that the root source (a 3rd-party tool) takes a long time to return the entire root data set (network latency is not a concern here).

A few other pieces might help you understand the situation. Initially I had tried pulling the entire root dataset down and aggregating on the client. While this was OK for small sets of data, when I tried the real use cases I wanted to support, the browser would fall over and die (I've tried several; they provide no specific error messages). I then moved the aggregation to an intermediate server that collects the data and does the aggregation server side. This improves things but I still see issues in the browser.

It could be my lack of experience in browser-side work but I have a feeling I am pushing the browser beyond its capabilities. I am capturing as many as 500,000 events and creating, say, 250,000 individual SVG elements from a single request.

So if it's not the structure of the data, is there perhaps something in oboe that is timing out because the stream appears to have stopped producing data? I'm not getting done events on every object. For example if I pull down 100,000 JSON objects in the stream, I might see 50-100 'done' events generated from oboe and they seem to align with the pauses.

I appreciate your help with this and can provide more detail as needed.

thanks,

-Matt

James-Matthew-Watson commented 10 years ago

So when I request a large swath of data from the REST endpoint directly in the browser, I see pauses. So it's either the browser itself or the server that is the source of the pauses. It doesn't appear to be oboe.

James-Matthew-Watson commented 9 years ago

So I still have an issue here. Is there something I can provide to show that this is not a multiple item issue?

artworkad commented 9 years ago

@James-Matthew-Watson @jimhigson Having the same issue with multiple done's for one GET while streaming from nodejs. The response looks like this:

JSON

it seems that for every line of this JSON done is called.

robertsheehy-wf commented 9 years ago

+1 I'm actually running into the same docker problem as @ArtworkAD. Support for "Multi-JSON" would be awesome. Here's a dump of the type of stuff I'm trying to parse https://gist.github.com/robertsheehy-wf/0bb14c45393c94f7c976.

nhducit commented 9 years ago

@ArtworkAD

The done callback is called whenever a complete object has been received, so it's called many times. Redesign your API response to avoid this.

This is a feature, not a bug :)

kevana commented 9 years ago

I'd appreciate the feature to handle multi-json responses; redesigning the API isn't an option for everyone.

ryan-williams commented 9 years ago

I'm hitting the same issue reading from a file with multiple JSON objects concatenated in it; I understand the logic behind expecting a valid one-object JSON blob, but think that expanding Oboe to handle the multi-object case has more pros than cons.

In case it's useful, I made a simple repo demonstrating this behavior, mostly for my own understanding: https://github.com/ryan-williams/oboe-test.

Also FWIW, the multi-object "JSON" files I'm consuming are Spark's event log files.

Oboe clearly already understands that the read stream remains open after the first object is finished, and it correctly handles subsequent top-level objects, so the question seems to be whether the semantics of done should be "a top-level object is complete" vs. "the read stream is consumed".

nhducit commented 8 years ago

@ryan-williams I'm also confused when using oboe with multi-object JSON. How can I know when the request is finished?

ryan-williams commented 8 years ago

Unfortunately I think I worked around this by adding some caller code (outside of Oboe) that wrapped my JSON objects in a JSON array (and added commas between them); of course, this loses the streaming capabilities of Oboe :(
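A minimal sketch of that kind of wrapper, assuming the input holds exactly one JSON object per line (the actual caller code is not shown in this thread, and `wrapAsJsonArray` is a made-up name):

```javascript
// Hypothetical helper: turn newline-separated JSON objects into one JSON array.
// Assumes each non-blank line is one complete JSON value.
function wrapAsJsonArray(multiJsonText) {
  const objects = multiJsonText
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0); // ignore blank lines, e.g. a trailing newline
  return '[' + objects.join(',') + ']';
}

const wrappedText = wrapAsJsonArray('{"a": 1}\n{"a": 2}\n');
console.log(wrappedText);                    // [{"a": 1},{"a": 2}]
console.log(JSON.parse(wrappedText).length); // 2
```

Because the whole input must be available before the closing bracket can be written, this approach gives up incremental delivery, which is the drawback noted above.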

glortho commented 8 years ago

If you are able to modify your API, a simple solution could be to send something like null just before the stream is drained and then catch it in oboe:

// back-end
myReadableStream.push( 'null' );
myReadableStream.push( null );

// front-end
oboe.node( '!', result => {
  if ( result === null ) {
    // stream is drained
  } else {
    // stream is still alive
  }
});

dcastellanos-r7 commented 8 years ago

Hey guys, great effort on the lib. I've run into a related issue, I think, so I will explain what I see. I am returning an array of elements from my server. I made this array by concatenating a few files. So let's say I have an array with 40K objects. I would expect a single done event to fire when the array is completed. Instead, the done event fires multiple times. I noticed that because of my concatenation I had a newline character every time I added more objects to the array. I removed the newline characters, and now I only see one done event raised once the array completes. So I think that the newlines caused a done event to be raised.

Again I don't have multiple objects sent over without a root object, I have an array of objects that should only raise 1 done event.

binarykitchen commented 7 years ago

I still experience this here - has this been solved yet?

tailuge commented 6 years ago

For an ndjson stream that has many JSON objects separated by newlines but not contained in a top-level object, the following pattern worked for me:

    }).node("!", function(data) {
        all.push(data);
        itemCallback(data);
    }).on("end", function(data) {
        completeCallback(all);
    })

chris-heathwood-uoy commented 4 years ago

If your data is multi-json (also seen as ndjson or jsonl), why would you use Oboe? Would it be better to just use readline and then JSON.parse() each line?
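A minimal sketch of that line-by-line approach follows. To keep it runnable in both a browser and Node, it uses a hand-rolled line buffer in place of Node's readline module; `createNdjsonParser` and its callbacks are hypothetical names, and the sketch assumes one JSON value per line.

```javascript
// Hypothetical helper: a small push parser for newline-delimited JSON.
// onObject is called once per parsed line; onEnd is called exactly once,
// when the caller signals that the stream is finished.
function createNdjsonParser(onObject, onEnd) {
  let buffered = ''; // trailing text that may be an incomplete line
  let count = 0;

  function emitLine(line) {
    const trimmed = line.trim();
    if (trimmed.length === 0) return; // skip blank lines
    onObject(JSON.parse(trimmed));
    count += 1;
  }

  return {
    write(chunk) {
      buffered += chunk;
      const lines = buffered.split('\n');
      buffered = lines.pop(); // the last piece may be a partial line
      lines.forEach(line => emitLine(line));
    },
    end() {
      emitLine(buffered); // flush a final line that had no trailing newline
      buffered = '';
      onEnd(count);
    }
  };
}

// Usage: chunks may split a line at any position.
const received = [];
let total = -1;
const parser = createNdjsonParser(
  obj => received.push(obj),
  n => { total = n; }
);
parser.write('{"id": 1}\n{"id"');
parser.write(': 2}\n{"id": 3}');
parser.end();
console.log(received.length, total); // 3 3
```

With this shape, "an object is complete" and "the stream is consumed" are separate callbacks, which is the distinction the earlier comments in this thread were asking for.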