Open brianc opened 10 years ago
hmm ok this sucks. Can you provide a gist that's similar enough to your example? Then I'll use trace to see what's really going on.
Why does levelup turn buffers into objects? Are you using `valueEncoding: 'binary'`?
And thanks a lot for bringing this issue here!
I have an idea: so streams are created here: https://github.com/juliangruber/multilevel/blob/master/lib/server.js#L80-L82
now, see, they are just piped directly. however, when the client pauses the stream, that message has to propagate back across the network, which means there is considerable lag compared to being in the same process. my theory is that level starts streaming to the client, but the client never gets a chance to tell the server to slow down (not fast enough, at least). so level doesn't know that you don't want to read the entire database at once, and it DOES read the whole database.
If this is correct, then the solution is to use a method similar to TCP's: exponential backoff. You would drop in another stream (or something) that automatically finds the right rate of flow. It would allow a bit of data through, then pause and wait a bit; if it didn't hear a pause message (actually, maybe there should be an ack) it would speed up a little, but if it got a pause it would slow down a lot (speed up a little = speed up linearly, slow down a lot = slow down exponentially).
hmm, mux-demux might need a change to implement this, but it would be quite simple: it would just need to send a message to the other side on the "drain" event. If the drain event comes back early (before the waiting time is over), speed up; if it comes much later, slow down. (You could also do this with pull-stream/streams2 semantics, but TCP doesn't.)
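The speed-up-linearly / slow-down-exponentially rule described above (additive increase, multiplicative decrease, as in TCP) can be sketched as a tiny rate controller. Everything here, names and constants included, is illustrative; mux-demux does not actually implement this:

```javascript
// Sketch of the proposed flow control: additive increase when an
// ack/drain comes back promptly, multiplicative decrease on a pause.
// All names and constants are illustrative, not part of mux-demux.
function createRateController(opts) {
  opts = opts || {};
  var rate = opts.initialRate || 10;  // items allowed per tick
  var step = opts.step || 1;          // additive increase per ack
  var minRate = opts.minRate || 1;
  var maxRate = opts.maxRate || 1000;

  return {
    // drain/ack arrived on time: speed up linearly
    onAck: function () {
      rate = Math.min(maxRate, rate + step);
      return rate;
    },
    // a pause message arrived: slow down exponentially (halve)
    onPause: function () {
      rate = Math.max(minRate, Math.floor(rate / 2));
      return rate;
    },
    rate: function () { return rate; }
  };
}
```

The sender would consult `rate()` to decide how many items to emit before waiting; the asymmetry (slow +1 growth, fast halving) is what keeps the flow probing upward without overrunning the receiver for long.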
I wanted to have this when I first wrote mux-demux, but didn't know enough to write that back then.
But first we should confirm this is the problem at hand. I would start by putting logging statements into multilevel, so you can see how many items the client reads before it pauses, and how many items the database reads before it pauses.
i'm seeing a similar issue with regard to point 2 of this issue. i have a client that opens several read streams concurrently over the same rpc channel. during high load, the server process's CPU spikes and memory leaks. over the course of 8 days, my multilevel server process consumes all the RAM and is killed by the OOM killer. i had a theory that multiple read streams were buffering on the server due to the busy rpc stream connection, but based on @dominictarr's comment, it sounds like that might not be the case.
I'm seeing the same issue. We have a 6G leveldb with about 12M keys.
A helper script that we use streams the entire database into a custom transform stream in order to tidy things up. If we run it as-is, the server runs at 100% CPU and consumes memory until it gets killed.
Our solution is to batch things up, but the extra code isn't pretty and we're losing out on the elegance of node's streams. It does seem like the backpressure signal is not getting through.
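For reference, the batching workaround described above usually looks something like the loop below. This is a sketch: `readBatch` is a stand-in for whatever issues one bounded range read (e.g. a `createReadStream({ gt: afterKey, limit: limit })` collected into an array), not an actual multilevel API:

```javascript
// Walk a large keyspace in bounded batches instead of one long stream.
// `readBatch(afterKey, limit, cb)` is an assumed helper: it calls back
// with up to `limit` { key, value } pairs in key order, starting just
// after `afterKey` (or from the beginning when afterKey is null).
function eachBatch(readBatch, limit, onItem, done) {
  (function next(afterKey) {
    readBatch(afterKey, limit, function (err, items) {
      if (err) return done(err);
      items.forEach(onItem);
      if (items.length < limit) return done(null); // range exhausted
      next(items[items.length - 1].key);           // resume after last key
    });
  })(null);
}
```

Because each batch is a fresh, bounded read, the server never buffers more than `limit` items per client, at the cost of the extra bookkeeping the comment above complains about.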
ok can someone help the maintainers by creating a script / gist that shows exactly this problem?
Hello - thanks again for the nice work on multilevel. It works great for the most part and is making my life much easier. I recently ran into an issue with `createReadableStream()`.

My scenario: I have a 60 gig level database with about 1.5m keys. In one process (my master process) I am traversing the database and doing a 5-6 second operation on each item in batches of 10 or so, then updating those items... so a `createReadableStream()` piped into a number of in-process 'workers', each of which calls `db.put()` when it's done. In another process I am using multilevel to get a readable stream to my master process's database and doing a different set of operations using another set of workers. The output of the client process goes into a separate system, so the multilevel client process does not call anything other than a single `multilevel.createReadableStream()` when it boots up.

I have run into a few issues specifically related to the readable stream that I'd like to bring up... I'm probably missing something in how I'm using it.
I apologize if my examples are kinda hard to understand... I spent the better part of Wednesday thrashing on this and fiddling with various things very unscientifically, without taking detailed notes. I eventually came to the conclusion that something in the multilevel "stack" is either buffering endlessly or being very inefficient with allocations. As a temporary solution I created a module to handle streaming large result sets from leveldb more efficiently. It only works with leveldb because it uses leveldown to avoid turning records from binary into objects and directly back into binary to go out over the network. I've dropped it in as an addition to multilevel, and CPU usage is down from 100% pegged when a client is connected to barely increasing at all; the client can read rows much more quickly, and when the client disconnects the server CPU usage returns to an idle state.
https://github.com/brianc/node-level-readable
I think the main differences are: instead of using rpc-stream it uses a custom, lightweight binary protocol (based on the PostgreSQL client/server protocol), it avoids object allocations on the server, and it uses node's streams2 stuff internally to respond to backpressure as best I can make it. I'd be happy to try to work this into a pull request instead of having it be a separate thing... but it is tied directly to leveldown (using `db.iterator()`) and doesn't conform to the rest of the library's use of rpc-stream, so I'm not sure how you feel about that.

It's not battle-hardened yet by any means, and lacks documentation because it's not quite ready, but some folks on the ##leveldb IRC channel suggested I drop by and talk about it a bit. It has proved to be faster and to use less memory and CPU. It's completely not general-purpose like multilevel, but I thought maybe it could be useful.
Thanks again and sorry for the ramblings!