heapwolf / cp-mux

copy files over a multiplexed stream

multiplexing #3

Open · juliangruber opened this issue 9 years ago

juliangruber commented 9 years ago

I think it would be faster to add parallelism by transferring files over multiple connections, not by sending multiple files through one.

ekristen commented 9 years ago

The problem is tracking which files are going over which channel.

heapwolf commented 9 years ago

that's not a problem. we could just split the array of files into chunks and assign a chunk to each connection. very similar to how it works now, just with multiple connections.
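
A rough sketch of what that could look like (hypothetical: `files`, `host`, `port`, and the `sendFiles` helper are placeholders, none of them cp-mux API):

```js
var net = require('net')

// split `files` into `n` roughly equal chunks, one chunk per connection
function chunk (files, n) {
  var size = Math.ceil(files.length / n)
  var chunks = []
  for (var i = 0; i < files.length; i += size) {
    chunks.push(files.slice(i, i + size))
  }
  return chunks
}

// open one tcp connection per chunk and send that chunk's files over it
chunk(files, 4).forEach(function (group) {
  var socket = net.connect(port, host)
  sendFiles(socket, group) // hypothetical: streams each file in `group`
})
```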

ekristen commented 9 years ago

how does the server know how many connections to expect?

juliangruber commented 9 years ago

over every connection, the simplified protocol could be | file meta | file content |
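
A minimal sketch of one such frame in node, assuming a length-prefixed JSON meta header; this is illustrative, not cp-mux's actual wire format:

```js
var fs = require('fs')

// one frame per file: 4-byte meta length, json meta, then the raw bytes
function sendFile (socket, path, cb) {
  fs.stat(path, function (err, stat) {
    if (err) return cb(err)
    var meta = Buffer.from(JSON.stringify({ path: path, size: stat.size }))
    var len = Buffer.alloc(4)
    len.writeUInt32BE(meta.length, 0)
    socket.write(Buffer.concat([len, meta]))
    fs.createReadStream(path)
      .on('end', cb)
      .pipe(socket, { end: false }) // keep the socket open for the next file
  })
}
```

The receiver reads the 4-byte length, parses the meta, then consumes exactly `size` content bytes to find the next frame boundary.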

juliangruber commented 9 years ago

no need to actually split files into chunks or to multiplex individual connections

juliangruber commented 9 years ago

and i think we'd need to benchmark to find the right number of connections. this could actually be a really sweet QoS module

ekristen commented 9 years ago

how do you not send the same file?

btw we are testing with 17k files, around 8mb each, totaling around 120gb.

heapwolf commented 9 years ago

I'm cool with whatever as long as it's fast ;)


ralphtheninja commented 9 years ago

Are .sst files responding well to compressing?

ralphtheninja commented 9 years ago

That was like the worst piece of english ever. But I think you know what I mean :)

heapwolf commented 9 years ago

hahaha :) <3

ralphtheninja commented 9 years ago

Why can't the client just do a simple HTTP GET /backup? And then after getting a response continue to do HTTP GET /backup/000117.sst with like 20 connections at a time?

ralphtheninja commented 9 years ago

Or use 20 keep alive connections rather.
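
A sketch of that flow, assuming a hypothetical server where `GET /backup` returns a JSON array of file names (`host` is a placeholder):

```js
var http = require('http')
var fs = require('fs')

// keep-alive agent capped at 20 sockets; extra requests queue up on it
var agent = new http.Agent({ keepAlive: true, maxSockets: 20 })

// fetch the file list, then fetch every file through the shared agent
http.get({ host: host, path: '/backup', agent: agent }, function (res) {
  var body = ''
  res.on('data', function (chunk) { body += chunk })
  res.on('end', function () {
    JSON.parse(body).forEach(function (name) {
      http.get({ host: host, path: '/backup/' + name, agent: agent }, function (res) {
        res.pipe(fs.createWriteStream(name))
      })
    })
  })
})
```

The agent enforces the 20-connection cap, so no manual concurrency bookkeeping is needed.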

ekristen commented 9 years ago

I think that is what @juliangruber is working on right now.

ralphtheninja commented 9 years ago

Nice!

heapwolf commented 9 years ago

ok so @ekristen said it turned out to take about the same time (~1 hr). What about using this module but with udp as the transport instead of tcp?

juliangruber commented 9 years ago

hmm, isn't tcp just udp with flow control and some more mechanisms to make it more reliable?

juliangruber commented 9 years ago

in the end, the limit we hit is probably the server's max outgoing network bandwidth and the client's max incoming network bandwidth

heapwolf commented 9 years ago

@juliangruber yes, tcp trades speed for guaranteed, in-order delivery of packets. In this lib I measured logging json over udp vs tcp (localhost isn't a good environment to measure in, so you need to actually set up some servers). But I agree, I'd say what we have is "good enough" to move forward for our needs.
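
For reference, the udp side of such a measurement is tiny in node (`host` and `port` are placeholders); dgram makes the tradeoff visible:

```js
var dgram = require('dgram')

var socket = dgram.createSocket('udp4')
var msg = Buffer.from(JSON.stringify({ level: 'info', line: 'hello' }))

// fire-and-forget: no acks, no retransmits, no ordering; lost or
// reordered datagrams are the price paid for skipping tcp's machinery
socket.send(msg, 0, msg.length, port, host, function (err) {
  socket.close()
})
```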

juliangruber commented 9 years ago

when data grows we'd probably be better off looking at automatic sharding, so there's less data to distribute
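
The simplest form of that would be hash-based placement (a sketch; a real setup would likely want consistent hashing, so adding a shard doesn't reshuffle nearly every key):

```js
var crypto = require('crypto')

// map a file name to one of `n` shards by hashing it, so every node
// agrees on placement without any coordination
function shardFor (name, n) {
  var hash = crypto.createHash('md5').update(name).digest()
  return hash.readUInt32BE(0) % n
}

shardFor('000117.sst', 4) // => a stable shard index in 0..3
```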