hjmangalam / parsyncfp

follow-on to parsync (parallel rsync) with better startup perf

Not an issue, but a FYI #41

Open fangchin opened 3 years ago

fangchin commented 3 years ago

Hi all,

Please review this HPCwire "off the wire" article: DOE Technical Report: When to Use rsync? March 25, 2021 https://bit.ly/2OZqKV7

Regards

hjmangalam commented 3 years ago

Hi Chin,

Thanks for this note. I'll be reading the paper and responding in detail. If you'd like the conversation to continue on github, do nothing. If you'd like to continue it in private, my email is widely available.

As it turns out, I'm working on the multihost version right now and I hope to push it to github in a week or two.

I'm surprised you didn't include fpsync, a similar rsync wrapper by Ganael LaPlanche which supports multihosts already (and who wrote the fpart file chunker that parsyncfp uses to allow transport to start before the full file recursion is done.)

Best wishes, Harry

Harry Mangalam

fangchin commented 3 years ago

Hi Harry,

> Thanks for this note. I'll be reading the paper and responding in detail. If you'd like the conversation to continue on github, do nothing. If you'd like to continue it in private, my email is widely available.

Thanks for responding. Happy to continue the discussion right here on GitHub.

First of all, please let me note that, as we pointed out in our report (Test environment, p. 4), we had very little time for the investigation and highly constrained access to the two testbeds we used, since other projects were waiting for them. Nevertheless, the methodology is precisely described in Test methodology, p. 4; the testers are freely available to the public at https://github.com/fangchin/test_rsync; and we are confident in the rigor, comprehensiveness, automation, and fairness of the investigation.

> As it turns out, I'm working on the multihost version right now and I hope to push it to github in a week or two.

It's our view that any multi-host application must show the linear scalability efficiency defined in the report (A glance at two PDDMs, p. 14). Also, by "multi-host", did you mean "scale-out" (i.e. a multi-node cluster)? If so, then HA, automatic load sharing, etc. among multiple instances running on different cluster nodes should be intrinsic. We do hope that our work spurs similar discussions and investigations for other data movers.

> I'm surprised you didn't include fpsync, a similar rsync wrapper by Ganael LaPlanche which supports multihosts already (and who wrote the fpart file chunker that parsyncfp uses to allow transport to start before the full file recursion is done.)

I am afraid that different rsync "wrappers" cannot change the intrinsic limitations of rsync in tackling LOSF (lots of small files), really large files (e.g. hundreds of GBs, multiple TBs), and large RTT values.

In addition, a monograph usually focuses on a single subject. So, as the title of the report indicates, it focuses on rsync and a single selected rsync-based tool, parsyncfp (we didn't even have time to evaluate rsync-ssl!). As you alluded, it would be great to include more tools; besides fpsync, bbcp would be a good one to evaluate, for example. Nevertheless, try as we might, we only have 24 hours a day and we have other business to take care of :)

Best Regards,

Chin Fang, Zettar Inc.