openskope / skope-deployment

Everything needed to deploy SKOPE in a new environment.

Benchmark data transfer rate between Nebula and Jetstream #43

Closed tmcphillips closed 6 years ago

tmcphillips commented 6 years ago

We will be running SKOPE services on both Nebula and Jetstream, and the processed data sets served by SKOPE should be mirrored on file servers in each environment. Because we will likely do the bulk of data pre-processing on Nebula, we need to know how fast we can synchronize new and updated data sets from Nebula to Jetstream so that we can schedule new releases. We also need to determine the optimal rsync options for this transfer.

tmcphillips commented 6 years ago

I repeatedly timed copying a 9 GB file from staging on Nebula to tacc-staging on Jetstream using rsync.
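A benchmark like this could be scripted roughly as follows. This is only a sketch of one possible harness, not the procedure actually used: the function names `build_rsync_command` and `timed_transfer` are made up for illustration, and the source and destination arguments are placeholders for whatever paths and hosts apply. It reports MBytes/sec assuming 1 MByte = 10^6 bytes.

```python
import subprocess
import time


def build_rsync_command(source, destination, compress):
    """Return the rsync argument list for one benchmark run (hypothetical helper)."""
    # -a: archive mode, -v: verbose output, -z: compress the data stream.
    options = "-avz" if compress else "-av"
    return ["rsync", options, source, destination]


def timed_transfer(command, size_bytes):
    """Run the command once and return the mean rate in MBytes/sec.

    Assumes 1 MByte = 10**6 bytes and that the whole file is transferred,
    i.e. the destination does not already hold an up-to-date copy.
    """
    start = time.monotonic()
    subprocess.run(command, check=True)
    elapsed = time.monotonic() - start
    return size_bytes / elapsed / 1e6
```

For example, `timed_transfer(build_rsync_command(src, dest, compress=False), size_bytes)` would time one uncompressed run, where `src`, `dest`, and `size_bytes` are supplied by the caller. Deleting the destination copy between runs keeps rsync from skipping the file on repeat runs.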

Using the options -avz, where -z compresses the data stream before transmitting it, the transfer rate is roughly 9 MBytes/sec.

Using the options -av, which does not compress the data stream, the transfer rate is 59 MBytes/sec sustained, more than 6 times the rate achieved with compression.
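To turn these rates into rough scheduling numbers, the arithmetic can be sketched as below. The helper `transfer_seconds` is hypothetical, and the estimate assumes decimal units (1 GB = 1000 MBytes) and that the measured rate holds for the whole transfer, neither of which was verified here.

```python
RATE_COMPRESSED = 9.0     # MBytes/sec measured with -avz
RATE_UNCOMPRESSED = 59.0  # MBytes/sec measured with -av


def transfer_seconds(size_gb, rate_mb_per_s):
    """Estimate transfer time in seconds, assuming 1 GB = 1000 MBytes."""
    return size_gb * 1000 / rate_mb_per_s


# Ratio of the two measured rates (a little over 6.5).
speedup = RATE_UNCOMPRESSED / RATE_COMPRESSED

# The 9 GB test file: about 1000 s with -avz, about 153 s with -av.
seconds_compressed = transfer_seconds(9, RATE_COMPRESSED)
seconds_uncompressed = transfer_seconds(9, RATE_UNCOMPRESSED)
```

The same function gives a first estimate for larger releases, for example `transfer_seconds(500, RATE_UNCOMPRESSED)` for a 500 GB data set, though real transfers of many small files may behave differently from a single large file.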

CPU usage on both ends explains the difference. With compression, the rsync process pegs one VCPU on the Nebula node at 100% for the duration of the transfer; without compression, it uses only 25% of a Nebula VCPU. It thus appears that the CPU on the source side limits transfer speed when compression is requested.

For synchronizing data files from Nebula to Jetstream, the optimal rsync options are therefore -av.