lovesyk opened this issue 8 years ago
I mostly wish for the multifetching, i.e. a batch mechanism on the client with one bucket per chunkserver. It could start requesting from 4 servers and, if it sees that one of them performs at a lower rate, adjust how much it fetches from that one. The rate can easily be tracked continuously, and this would allow fetching safely in parallel (really parallel, not just round-robin ;-)).
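The bucket idea above can be illustrated with a minimal Python sketch. It is not the LizardFS client's code: `ServerBucket`, `allocate`, the `alpha` smoothing weight and `window_bytes` are hypothetical names, and an exponentially weighted moving average (EWMA) is just one possible way to "track the rate continuously". Each bucket smooths the throughput it observes, and the client splits its in-flight read window across chunkservers in proportion to those rates, so a slower server automatically gets less outstanding work.

```python
from dataclasses import dataclass


@dataclass
class ServerBucket:
    """Hypothetical per-chunkserver bucket tracking observed read throughput."""
    name: str
    rate: float = 0.0   # smoothed bytes/s; 0.0 means "not measured yet"
    alpha: float = 0.3  # assumed EWMA weight given to the newest sample

    def record(self, nbytes: int, seconds: float) -> None:
        """Fold one completed transfer into the running rate estimate."""
        sample = nbytes / seconds
        if self.rate == 0.0:
            self.rate = sample  # first sample: take it as-is
        else:
            self.rate = self.alpha * sample + (1 - self.alpha) * self.rate


def allocate(buckets: list[ServerBucket], window_bytes: int) -> dict[str, int]:
    """Split the client's in-flight byte window across servers by rate.

    Unmeasured servers are assumed to run at the mean measured rate so they
    still get probed; with no measurements at all, every server gets an
    equal share. Shares are rounded down, so they may sum to slightly less
    than window_bytes.
    """
    measured = [b.rate for b in buckets if b.rate > 0]
    default = sum(measured) / len(measured) if measured else 1.0
    rates = [b.rate if b.rate > 0 else default for b in buckets]
    total = sum(rates)
    return {b.name: int(window_bytes * r / total) for b, r in zip(buckets, rates)}
```

For example, after three servers have reported 1000, 500 and 100 bytes/s, a 1600-byte window is split 1000/500/100; before any measurement, all servers are probed equally.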
Let's say we have 3 chunkservers: one on a GBit connection, one on 500 MBit and one on 100 MBit. No replication is set up, and a single client is constantly consuming data at 150 MBit/s. Currently, chunks are fetched sequentially as they are requested. This means the 100 MBit server will be overloaded once its turn comes up, and the client will slow down to 100 MBit/s until the current chunk is done. After that, the 100 MBit server will idle until the next two chunks have been consumed from the other servers, and then get overloaded again. Wouldn't it be great if the LizardFS client were more intelligent and recognized that:
Basically, the client should try to use its remaining bandwidth to speed up upcoming requests and smooth out speed drops as much as possible. An additional feature to achieve this would be fetching different parts of the same chunk from different chunkservers when that is preferable to prefetching, e.g. for the very first chunk of a file, where there is nothing earlier to prefetch behind.
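To put rough numbers on the three-chunkserver scenario and on splitting a chunk, here is a back-of-the-envelope Python model. It is a sketch under simplifying assumptions, not a description of how the LizardFS client behaves: chunks are assumed to be placed round-robin, `sequential_throughput`, `parallel_sustainable` and `split_chunk` are hypothetical helpers, the 64 MiB chunk size is an assumed example, and splitting one chunk across chunkservers assumes the chunk is actually available on each of them (e.g. replicated), which is not the case in the no-replication example.

```python
"""Back-of-the-envelope model of the three-chunkserver scenario.

Illustrative assumptions, not measured LizardFS behaviour:
- chunks are placed round-robin, so each server holds 1/3 of a file;
- "sequential" = a chunk is requested only when the client reaches it, so the
  client reads it at min(server rate, consumption rate);
- splitting a chunk across servers assumes the chunk exists on all of them.
"""

SERVER_RATES_MBIT = [1000, 500, 100]  # GBit, 500 MBit and 100 MBit links
CONSUMPTION_MBIT = 150                 # the client's steady read rate


def sequential_throughput(rates, demand):
    """Average read rate when chunks are fetched one at a time, on demand."""
    # Time per chunk (in chunk-size units) is 1 / min(rate, demand).
    time_per_round = sum(1 / min(r, demand) for r in rates)
    return len(rates) / time_per_round


def parallel_sustainable(rates, demand):
    """True if every server keeps up with its share when all prefetch at once."""
    share = demand / len(rates)  # round-robin placement: equal share per server
    return all(r >= share for r in rates)


def split_chunk(chunk_size, rates):
    """Split [0, chunk_size) into byte ranges proportional to server rates,
    so that all parts finish downloading at roughly the same time."""
    total = sum(rates)
    ranges, start, cumulative = [], 0, 0
    for rate in rates:
        cumulative += rate
        end = chunk_size * cumulative // total  # integer math; last end == chunk_size
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    avg = sequential_throughput(SERVER_RATES_MBIT, CONSUMPTION_MBIT)
    print(f"sequential: {avg:.1f} MBit/s")
    print("parallel keeps up:", parallel_sustainable(SERVER_RATES_MBIT, CONSUMPTION_MBIT))
    print(split_chunk(64 * 2**20, SERVER_RATES_MBIT))
```

With these numbers the on-demand pattern averages about 128.6 MBit/s, while parallel prefetching only needs 50 MBit/s (a third of 150) from each server, which even the 100 MBit link can deliver. For a 64 MiB chunk, the proportional split gives the GBit server 40 MiB, the 500 MBit server 20 MiB and the 100 MBit server 4 MiB.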