James-Matthew-Watson opened 10 years ago
Is it something I said?
Ok, here's what I think is happening regarding the 'done' events...
Your JSON stream is actually many JSON objects concatenated into a file. I.e., it couldn't be read by a standard parser.
Oboe is designed to read a standard JSON resource as a stream. This means that any resource read by Oboe could also be read by standard tools.
'Multi-JSON' streams are something that I've got on the radar to add support for.
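Until that support lands, one workaround is to split the stream client-side and hand each document to the standard parser. A minimal sketch, assuming the documents are newline-delimited (NDJSON); `parseNdjson` is a hypothetical helper, not part of Oboe's API:

```javascript
// Split a newline-delimited JSON (NDJSON) buffer into parsed documents.
// Assumes each top-level JSON document sits on its own line.
function parseNdjson(buffer) {
    return buffer
        .split('\n')
        .filter(function (line) { return line.trim().length > 0; })
        .map(function (line) { return JSON.parse(line); });
}

var docs = parseNdjson('{"a":1}\n{"b":2}\n');
// docs[0].a === 1, docs[1].b === 2
```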
As for the pauses, I've also been thinking about this. On a fast network (or with the server running locally) and with gzipped resources the XHR can get all the content in one 'js turn'. This means that a pure js parser has a lot to parse all at once and can occupy the CPU for a noticeable amount of time.
I think the best solution is throttling so that Oboe takes many, short turns on the CPU rather than a single, longer one. I could implement this.
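A rough sketch of that throttling idea (all names here are hypothetical, not Oboe internals): split the buffered work into fixed-size slices, then schedule one slice per event-loop turn so the CPU is yielded between them:

```javascript
// Break a large workload into fixed-size slices so each slice can be
// handled in its own event-loop turn.
function makeSlices(items, sliceSize) {
    var slices = [];
    for (var i = 0; i < items.length; i += sliceSize) {
        slices.push(items.slice(i, i + sliceSize));
    }
    return slices;
}

// Scheduling sketch: handle one slice, then yield via setTimeout so the
// browser can paint and respond to input between slices.
function processInTurns(items, sliceSize, handle) {
    var slices = makeSlices(items, sliceSize);
    function step() {
        var slice = slices.shift();
        if (!slice) return;
        slice.forEach(handle);
        setTimeout(step, 0); // yield the CPU before the next slice
    }
    step();
}
```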
I'm actually generating the data in question and I can change it, but I'm not sure it fits your description. Here's an example; I've removed some of the fields from the internal objects for brevity:
{"data": [ {"queuedtimestamp": "2014-10-07T07:34:16.660Z", "delay": "13", "end": "2014-10-07T07:34:16.673Z"} {"queuedtimestamp": "2014-10-07T07:34:17.200Z", "delay": "34", "end": "2014-10-07T07:34:17.411Z"} ]}
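One quick way to check which case a payload falls into is to run it through the standard parser: a single well-formed document parses, while concatenated documents do not. A minimal sketch (not Oboe API; the sample strings are illustrative):

```javascript
// Returns true if the text is one standard, well-formed JSON document.
// Concatenated documents ('multi-JSON') make JSON.parse throw.
function isSingleJsonDocument(text) {
    try {
        JSON.parse(text);
        return true;
    } catch (e) {
        return false;
    }
}

isSingleJsonDocument('{"data":[{"delay":"13"},{"delay":"34"}]}'); // true
isSingleJsonDocument('{"a":1} {"b":2}');                          // false
```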
My assumption was that the pauses are actually on the source side, but it's possible that there are client-side factors. The reason I say this is that I can see the root source (a 3rd-party tool) takes a long time to return the entire root data set (network latency is not a concern here).
A few other pieces might help you understand the situation. Initially I tried pulling the entire root dataset down and aggregating on the client. While this was OK for small sets of data, when I tried the real use cases I wanted to support, the browser would fall over and die (I've tried several; they provide no specific error messages). I then moved the aggregation to an intermediate server that collects the data and does the aggregation server-side. This improves things, but I still see issues in the browser.
It could be my lack of experience in browser-side work, but I have a feeling I am pushing the browser beyond its capabilities. I am capturing as many as 500,000 events and creating, say, 250,000 individual SVG elements from a single request.
So if it's not the structure of the data, is there perhaps something in oboe that is timing out because the stream appears to have stopped producing data? I'm not getting done events on every object. For example, if I pull down 100,000 JSON objects in the stream, I might see 50-100 'done' events generated from oboe, and they seem to align with the pauses.
I appreciate your help with this and can provide more detail as needed.
thanks,
-Matt
So when I request a large swath of data from the REST endpoint directly in the browser, I see pauses. So it's either the browser itself or the server that is the source of the pauses. It doesn't appear to be oboe.
So I still have an issue here. Is there something I can provide to show that this is not a multiple item issue?
@James-Matthew-Watson @jimhigson Having the same issue with multiple done's for one GET while streaming from nodejs. The response looks like this:
It seems that for every line of this JSON, done is called.
+1 I'm actually running into the same docker problem as @ArtworkAD. Support for "Multi-JSON" would be awesome. Here's a dump of the type of stuff I'm trying to parse https://gist.github.com/robertsheehy-wf/0bb14c45393c94f7c976.
@ArtworkAD
The done method is called whenever you receive a complete object, so it's called many times. Redesign your API response to avoid this.
This is a feature, not a bug :)
I'd appreciate the feature to handle multi-json responses, redesigning the API isn't an option for everyone.
I'm hitting the same issue reading from a file with multiple JSON objects concatenated in it; I understand the logic behind expecting a valid one-object JSON blob, but think that expanding Oboe to handle the multi-object case has more pros than cons.
In case it's useful, I made a simple repo demonstrating this behavior, mostly for my own understanding: https://github.com/ryan-williams/oboe-test.
Also FWIW, the multi-object "JSON" files I'm consuming are Spark's event log files.
Oboe clearly already understands that the read stream remains open after the first object is finished, and it correctly handles subsequent top-level objects, so the question seems to be whether the semantics of done should be "a top-level object is complete" vs. "the read stream is consumed".
@ryan-williams I'm also confused when using oboe with multi-object JSON. How can I know when the request is finished?
Unfortunately I think I worked around this by adding some caller code (outside of Oboe) that wrapped my JSON objects in a JSON array (and added commas between them); of course, this loses the streaming capabilities of Oboe :(
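That workaround can be sketched as a small transform (a hypothetical helper, assuming the documents are newline-separated); it produces one valid array document at the cost of buffering everything first:

```javascript
// Join newline-separated top-level JSON documents into one JSON array
// by inserting commas between them.
function wrapAsArray(multiJson) {
    var docs = multiJson.split('\n').filter(function (line) {
        return line.trim().length > 0;
    });
    return '[' + docs.join(',') + ']';
}

var wrapped = wrapAsArray('{"a":1}\n{"b":2}');
// wrapped === '[{"a":1},{"b":2}]', which JSON.parse accepts
```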
If you are able to modify your API, a simple solution could be to send something like null just before the stream is drained, and then catch it in oboe:
```js
// back-end: send a literal `null` value, then end the stream
myReadableStream.push( 'null' );
myReadableStream.push( null );
```

```js
// front-end
oboe( url ).node( '!', result => {
    if ( result === null ) {
        // stream is drained
    } else {
        // stream is still alive
    }
});
```
Hey guys, great effort on the lib. I've run into a related issue, I think, so I'll explain what I see. I am returning an array of elements from my server. I made this array by concatenating a few files. So let's say I have an array with 40K objects. I would expect a single done event to fire when the array is completed. Instead, the done event fires multiple times. I noticed that, because of my concatenation, I had a newline character every time I added more objects to the array. I removed the newline characters, and now I only see one done event raised once the array completes. So I think the newlines caused extra done events to be raised.
Again, I don't have multiple objects sent over without a root object; I have an array of objects that should raise only one done event.
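If whitespace between array elements really is what triggers the extra done events (newlines are valid JSON whitespace, so this reads like a parser quirk), a defensive workaround is to strip newlines from the payload before sending it. A sketch with an illustrative payload:

```javascript
// Remove newline characters from a JSON payload assembled by
// concatenating files; the JSON meaning is unchanged because newlines
// between array elements are insignificant whitespace.
function stripNewlines(json) {
    return json.replace(/\r?\n/g, '');
}

var clean = stripNewlines('[{"id":1},\n{"id":2}\n]');
// clean === '[{"id":1},{"id":2}]'
```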
I still experience this here - has this been solved yet?
For an NDJSON stream that has many JSON objects separated by newlines but not contained in a top-level object, the following pattern worked for me:

```js
var all = [];
oboe( url )
    .node( "!", function( data ) {
        all.push( data );
        itemCallback( data );
    })
    .on( "end", function() {
        completeCallback( all );
    });
```
I'm using oboe and appreciate the work that went into something so essential. I'm having a very good experience with it overall but I think I'm seeing behavior that should not be occurring. I could be wrong but I think I am using it properly.
Here's my oboe.js code:
The idea is that I want a different message to display after all the data is retrieved than while I am in the middle of the stream. The response from the RESTful service can be a bit jerky (data, pause, data, etc.), and I've had issues with both Chrome and Firefox "giving up" on really long responses partway through, so I want to be able to tell when the stream has truly ended.
I thought the above would work, but the message flips back and forth many times (dozens) before the end of the stream is reached. Am I misunderstanding what triggers the done event?
thanks,
-Matt