taskgraph / mr


About the communication between user program and framework #2

Open plutoshe opened 9 years ago

plutoshe commented 9 years ago

After testing an example, I found a problem with the current mapreduce implementation: the communication between the user's gRPC server and the framework is very slow. For example, 1G of data takes only 30s to read and write on the local filesystem, but when it goes through the user's gRPC server, processing the same amount of data takes 50 minutes. I wondered why using gRPC is so slow. I found that if we follow the routine Hadoop MapReduce users are used to, where the mapper/reducer program processes one line of data at a time, the framework wraps each line with hundreds of HTTP request headers. It's expensive for the gRPC server to process such fragmented data. Therefore, the gRPC server may be better suited to processing a big chunk of data, or we should support a more flexible user processing program; it's not convenient to process fragmented data. I think limiting the user program to processing a file rather than a line of data may solve this problem.

xiaoyunwu commented 9 years ago

Can you first double-check what instance you are using and what network connection it has?

Xiaoyun


plutoshe commented 9 years ago

I used the local filesystem precisely to exclude network influence.

xiang90 commented 9 years ago

@plutoshe Can you write a small benchmark program I can look at? I can help you with this.

plutoshe commented 9 years ago

@xiang90, could you give an example of a benchmark program? I can't figure out how to put such a program together.

xiang90 commented 9 years ago

@plutoshe https://github.com/grpc/grpc-go/blob/master/benchmark/benchmark.go
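The linked file uses Go's standard `testing` benchmark machinery. As a hedged sketch of the structure (the `processLine` stand-in is hypothetical; in the real benchmark it would be a gRPC round trip to the user server), `testing.Benchmark` even lets you run one outside `go test`:

```go
package main

import (
	"fmt"
	"testing"
)

// processLine stands in for the per-record work a mapper would do.
// In a real benchmark it would be replaced by a gRPC call.
func processLine(line []byte) int {
	return len(line)
}

func main() {
	// testing.Benchmark runs a benchmark function in a plain program,
	// choosing b.N automatically and reporting ns/op, just like `go test -bench`.
	res := testing.Benchmark(func(b *testing.B) {
		line := []byte("the quick brown fox jumps over the lazy dog")
		for i := 0; i < b.N; i++ {
			processLine(line)
		}
	})
	fmt.Printf("%d iterations, %d ns/op\n", res.N, res.NsPerOp())
}
```

Writing one benchmark per variant (direct function call, unary gRPC, streaming gRPC) over the same input makes the overhead directly comparable.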

plutoshe commented 9 years ago

Anyway, that's my guess about why the mapreduce is slow. When I've finished the benchmark, I will report back. Thanks.

xiang90 commented 9 years ago

@plutoshe Yea... Basically we want to see facts, not guesses. :)

plutoshe commented 9 years ago

I controlled for the factor by including/excluding the gRPC server, and the two runs take totally different amounts of time on my instance. That's why I'm guessing at a reasonable explanation for this phenomenon.

xiang90 commented 9 years ago

@plutoshe It might not be grpc's fault though.

plutoshe commented 9 years ago

@xiang90 I just think it is the inherent cost of using RPC.

xiang90 commented 9 years ago

@plutoshe Benchmark it first please.

plutoshe commented 9 years ago

@xiang90 Sure.

plutoshe commented 9 years ago

The input data imitates the fragmented data format. For small data, the performance looks like:

BenchmarkForGenearl              1000      1328912 ns/op
BenchmarkForGRPC                 20        54108559 ns/op
BenchmarkForGRPCstream           100       22415750 ns/op

For a larger input (5.6M) and a bench time of 60s, the performance looks like:

BenchmarkForGenearl               50    1817678089 ns/op
BenchmarkForGRPC                   3    26441051257 ns/op
BenchmarkForGRPCstream             3    24919309132 ns/op

The benchmark uses wordCount; details are at https://github.com/plutoshe/mr/tree/mr-new/benchmark
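To put the 5.6M numbers above in perspective, the slowdown factors can be worked out directly from the ns/op figures:

```go
package main

import "fmt"

func main() {
	// ns/op figures copied from the 5.6M benchmark run above.
	general := 1817678089.0
	grpc := 26441051257.0
	stream := 24919309132.0

	// Ratio of each gRPC variant to the direct (general) implementation.
	fmt.Printf("gRPC unary slowdown:  %.1fx\n", grpc/general)   // 14.5x
	fmt.Printf("gRPC stream slowdown: %.1fx\n", stream/general) // 13.7x
}
```

So streaming recovers only a little over unary gRPC at this input size, and both remain an order of magnitude slower than calling the mapper directly.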

xiaoyunwu commented 9 years ago

So, the streaming did not help?

In a separate thread, Xiang suggested gogoproto, and Zifei should have experience with it. Here is the exchange about what he needs to do:

All I need to do is change the test.pb.go with protoc --gofast_out=. test.proto? I don't see much difference.

I also changed import "github.com/golang/protobuf/proto" to import "github.com/gogo/protobuf/proto"

@5kg https://github.com/5kg You need to set https://github.com/coreos/etcd/blob/master/snap/snappb/snap.proto#L5-L8

I think encoding/decoding might be an issue.
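For reference, the snap.proto lines Xiang points at set gogoproto's file-level code-generation options. A sketch of what a test.proto would likely need (assuming the gogoproto import path resolves the same way it does in etcd):

```proto
syntax = "proto2";
package test;

import "gogoproto/gogo.proto";

// Ask protoc-gen-gofast to generate hand-written Marshal/Size/Unmarshal
// methods instead of reflection-based ones; this is where most of the
// encoding/decoding speedup comes from.
option (gogoproto.marshaler_all) = true;
option (gogoproto.sizer_all) = true;
option (gogoproto.unmarshaler_all) = true;
```

Without these options, regenerating with --gofast_out produces essentially the same reflection-based code, which would explain seeing no difference.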

Xiaoyun


plutoshe commented 9 years ago

Sure, I will talk this over with Zifei.