terraref / computing-pipeline

Pipeline to Extract Plant Phenotypes from Reference Data
BSD 3-Clause "New" or "Revised" License

Data transfer speed #342

Closed lhx-x closed 6 years ago

lhx-x commented 7 years ago

DISCUSSION: Is the expected download speed ~7 MB/s?

Details

Hello!

I am transferring data from terraref to a VM in Google Cloud via the Globus Python SDK. The download speed seems to be around 7 MB/s according to the Globus website. Could you please check whether that is the expected transfer speed, or whether it should be much faster but is limited by some setting?

Thanks! Hongxiao

Completion Criteria

Know the upper bound on the data transfer rate.

dlebauer commented 7 years ago

@jdmaloney can you help identify bottlenecks? Also tagging Globus Python SDK developer @sirosen

sirosen commented 7 years ago

Hi, not sure I can be of much help, but I'll do my best. It may also help to start an email thread with support@globus.org to try to track down the issue, with details like endpoint IDs and examples of tasks which ran slowly.

Before digging in too deeply, is this problem specific to Globus Transfers, or do tools like iperf show you similar throughput?

If you can confirm that the problem is Globus-specific, it may help to try tuning how aggressively your transfers use the network. If the endpoint(s) are Managed (you have a Globus subscription and have flagged one or both of them as managed), you can modify the network_use settings. In the case of the SDK, you would call TransferClient.update_endpoint with a partial endpoint document to set some combination of network_use, min_concurrency, preferred_concurrency, min_parallelism, and preferred_parallelism. Unless you have very specific guesses about what's causing the slow transfer, stick to tuning network_use to its various settings, and don't set it to custom.
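A minimal sketch of the update_endpoint call described above, assuming you already have an authenticated TransferClient and administer a managed endpoint. The helper names and the endpoint ID passed in are hypothetical, and the list of allowed levels is an assumption drawn from the Globus Transfer API endpoint document, so check it against the documentation for your SDK version.

```python
# Non-custom values for network_use (assumption: taken from the Globus
# Transfer API endpoint document; verify against the current docs).
NETWORK_USE_LEVELS = ("minimal", "normal", "aggressive")


def build_network_use_doc(level):
    """Return a partial endpoint document that only sets network_use."""
    if level not in NETWORK_USE_LEVELS:
        raise ValueError(f"unsupported network_use: {level!r}")
    return {"network_use": level}


def set_network_use(tc, endpoint_id, level):
    """Apply the setting through TransferClient.update_endpoint.

    `tc` is expected to be an authenticated globus_sdk.TransferClient.
    The service only accepts this change on a managed endpoint that the
    calling identity administers.
    """
    return tc.update_endpoint(endpoint_id, build_network_use_doc(level))
```

The helper deliberately rejects "custom", following the advice above to stay with the predefined levels unless you have a specific hypothesis about concurrency or parallelism.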

lhx-x commented 7 years ago

Thanks for the help!

When I upload the downloaded files to Google Cloud Storage, the speed can exceed 100 MB/s. But I have not tested the download side. Could you please let me know how to test it? It seems iperf needs a paired server and client.

Here I'm transferring data from terraref (which I suspect is a Globus subscription node?) to my own VM (which is not under a subscription). So I don't think I can change the settings on either of them.

jdmaloney commented 7 years ago

@lhx-x I took a look at your transfer that's currently running against the endpoint and I see the following:

{ "type": "GridFTP Transfer", "concurrency": 1, "protocol": "Mode S" }

with sync_level = 3.

Sync mode is an easy way to ensure that only the data you still need in a partially synced directory gets moved, but it is not the fastest way to move data, and sync level 3 (which goes as far as comparing checksums to decide whether a file is a sync candidate) won't be fast across tens of thousands of files. Especially if this is your first pull of the data, it would be better not to use sync mode and instead do a normal transfer with parallelism and pipelining. For example, right now the transfers between the Arizona site and NCSA are running at parallelism 4 and pipelining 10 and are close to saturating the 1 Gb/s link at the AZ site (seeing ~110 MB/s). The NCSA side is not constrained by network bandwidth or file systems, with multiple GB/s available for both.
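To put the link figures above in context, here is a small arithmetic sketch. It assumes decimal units (1 Gb/s = 1000 Mb/s) and ignores protocol overhead. The function names are illustrative only.

```python
def link_capacity_mb_per_s(link_gbit_per_s):
    """Convert a nominal link rate in Gb/s to MB/s (decimal units, 8 bits per byte)."""
    return link_gbit_per_s * 1000 / 8


def link_utilization(observed_mb_per_s, link_gbit_per_s):
    """Fraction of the nominal link rate that an observed throughput represents."""
    return observed_mb_per_s / link_capacity_mb_per_s(link_gbit_per_s)


print(link_capacity_mb_per_s(1))            # 125.0
print(round(link_utilization(110, 1), 2))   # 0.88
print(round(link_utilization(7, 1), 3))     # 0.056
```

On these assumptions, ~110 MB/s is about 88% of a 1 Gb/s link, while the 7 MB/s reported at the top of the thread would be under 6% of the same link.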

I am not very familiar with the Globus Python SDK so @sirosen may be of more help in terms of configuration flags, etc. to allow your transfer to go wider.

If you'd like to run an iperf test though to see what the theoretical max is to your cloud instance let me know and I'll fire up an iperf listening server on the DTN that you could test against. Would be happy to do that too. We'd want to schedule a time though as I don't want to leave it up and listening indefinitely.

lhx-x commented 7 years ago

@jdmaloney Thanks JD! Yes, the sync mode really matters. I changed it to 0 and the speed doubled to 14 MB/s at once :) Let's wait a while and see whether that's the current upper bound. We can schedule an iperf test after that.

jdmaloney commented 7 years ago

@lhx-x Indeed it appears to be going faster, DTN node is reporting speeds to your cloud instance fairly consistent around 15MB/s now, which is indeed much improved. Up to nearly 1.2TB/day at that rate. The concurrency is also now running at 6, which should be helping you too.
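For reference, the conversion from a sustained rate to a daily volume can be sketched as follows, assuming decimal units (1 TB = 1,000,000 MB) and a rate that holds for the full 24 hours. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day(mb_per_s):
    """Convert a sustained rate in MB/s to TB/day (1 TB = 1,000,000 MB)."""
    return mb_per_s * SECONDS_PER_DAY / 1_000_000


for rate in (7, 14, 15):
    print(rate, "MB/s ->", round(tb_per_day(rate), 2), "TB/day")
```

Under these assumptions, 14 MB/s works out to about 1.2 TB/day and 15 MB/s to about 1.3 TB/day.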

sirosen commented 7 years ago

Perhaps my expectations are too high, but 15M/s still sounds slow if max from laptop to VM is 100M/s. If you want to investigate further with Globus personnel, a support ticket with task and endpoint details remains the best path.

It does make sense that setting the sync level to 3 ("checksum") would slow things down. Most likely, the bottleneck in that case was checksumming on the VM. I would recommend setting the sync level to "exists" or "mtime" if you can. It tends to make scripted tasks faster/safer to rerun if they fail, and has typically negligible performance cost.
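A minimal sketch of a transfer submitted with a cheaper sync level, under stated assumptions. It builds the raw Transfer API task document as a plain dict rather than using the SDK's TransferData helper, only to avoid depending on constructor details that differ between SDK versions. The helper names are hypothetical, `tc` is assumed to be an authenticated TransferClient, and the name-to-number mapping for sync levels follows the Transfer API documentation.

```python
# Sync level names and the numeric values used by the Transfer API.
SYNC_LEVELS = {"exists": 0, "size": 1, "mtime": 2, "checksum": 3}


def build_transfer_doc(src_ep, dst_ep, src_path, dst_path,
                       submission_id, sync_level="mtime"):
    """Build a task document for one recursive directory copy."""
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": src_ep,
        "destination_endpoint": dst_ep,
        # "mtime" or "exists" avoids checksumming every file up front.
        "sync_level": SYNC_LEVELS[sync_level],
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src_path,
                "destination_path": dst_path,
                "recursive": True,
            }
        ],
    }


def submit_sync_transfer(tc, src_ep, dst_ep, src_path, dst_path,
                         sync_level="mtime"):
    """Fetch a submission ID, then submit the task document."""
    submission_id = tc.get_submission_id()["value"]
    doc = build_transfer_doc(src_ep, dst_ep, src_path, dst_path,
                             submission_id, sync_level)
    return tc.submit_transfer(doc)
```

Because the sync level is kept, rerunning the same call after a failure should skip files that already arrived, which is the rerun safety mentioned above.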

lhx-x commented 7 years ago

@sirosen Thanks Stephen! Here the 100 MB/s is from the VM, uploading to GCS. The transfer speed from Globus does not seem to be stable: as I post this reply it is only 5 MB/s, with sync level = 0. I think it would be helpful to run a bandwidth test as @jdmaloney suggested.

dlebauer commented 7 years ago

@lhx-x any progress on this?

lhx-x commented 7 years ago

Sorry not yet. @jdmaloney do you mind if we start a bandwidth test on this?

jdmaloney commented 7 years ago

@lhx-x Is there a time this afternoon or tomorrow we could schedule the test?

lhx-x commented 7 years ago

@jdmaloney Sure! How about 2:00 pm PDT? Feel free to change it if that doesn't work for you. Also, could you please post the commands we will need?

jdmaloney commented 7 years ago

@lhx-x Sorry for this, was preparing for a conference out of town and ran out of time. This week is much more open, let me know what times work for you and we'll set this test up. For commands, you'll want to have the iperf package installed.

Command will be: iperf -c 141.142.169.14

Make sure you have port 5001 open on your host's firewall if it isn't open already. That's the default port for iperf. Once we have a time set up for the test, I'll make sure my host is listening over here.

lhx-x commented 7 years ago

@jdmaloney Thanks! No worry if you are not available at this time. I'll be ready in the next few days.

jdmaloney commented 7 years ago

@lhx-x Do you still want to test this? My schedule is pretty open. If so, just shoot me a time.

craig-willis commented 6 years ago

Stale issue, closing. Please open a new issue if work remains.