qubole / rubix

Cache File System optimized for columnar formats and object stores
Apache License 2.0
183 stars 74 forks source link

Inter-node reads for non local reads #21

Closed shubhamtagra closed 7 years ago

shubhamtagra commented 8 years ago

When a read is received at a node to which it is no local, it should find the owner node of the block and read it from that node. The owner node should read it from remote and cache it.

Reading from another node in same clusters seems to be faster than reading from s3. Initial thought was that with higher concurrency, s3 would turn out to be better than reads from other node but experiments show that it is not.

Experiment: Host a FileServer in node1 and access it from node2 to read different parts files.

This is for r3.large with enhanced networking:

clients| #files| request size| s3 time| fileServer time |

--------|-------------|-------------|----------|-----------------| 1 | 1 | 130MB | 3188 | 973 | 1 | 1 | 10MB | 1437 | 54 | 10 | 1 | 130MB| 17680 | 10390 | 10 | 1 | 10MB | 7537 | 219 | 10 | 5 | 130MB | 17765 | 10746 | 10 | 5 | 10MB | 7687 | 238 |

This is for r3.8xlarge with enhanced networking:

clients| #files| request size| s3 time| fileServer time |

--------|-------------|-------------|----------|-----------------| 1 | 1 | 130MB, full file | 2903 | 458 | 1 | 1 | 10MB | 1516 | 51 | 10 | 1 | 130MB, full file | 3370 | 1735 | 10 | 1 | 10MB | 1527 | 129 | 10 | 5 | 130MB each | 3464 | 1567 | 10 | 5 | 10MB | 1507 | 110 |