Closed: lhx-x closed this issue 6 years ago
@jdmaloney can you help identify bottlenecks? Also tagging Globus Python SDK developer @sirosen
Hi, not sure I can be of much help, but I'll do my best. It may also help to start an email thread with support@globus.org to try to track down the issue, with details like endpoint IDs and examples of tasks which ran slowly.
Before digging in too deeply, is this problem specific to Globus Transfers, or do tools like iperf show you similar throughput?
If you can confirm that the problem is Globus-specific, it may help to try tuning how aggressively your transfers use the network. If the endpoint(s) are Managed (you have a Globus subscription and have flagged one or both of them as managed), you can modify the network_use settings.
In the case of the SDK, you would call TransferClient.update_endpoint with a partial endpoint document to set some combination of network_use, max_concurrency, preferred_concurrency, max_parallelism, and preferred_parallelism.
Unless you have very specific guesses about what's causing the slow transfer, stick to tuning network_use among its preset settings, and don't set it to custom.
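For reference, here's a minimal sketch of building such a partial endpoint document. The `tc` client and `ENDPOINT_ID` are placeholders for an authenticated `globus_sdk.TransferClient` and the UUID of a managed endpoint, and the `update_endpoint` call is shown only as a comment; the custom-tuning values are illustrative, not recommendations:

```python
# Build a partial endpoint document for network-use tuning. With network_use
# set to one of the presets ("normal", "minimal", "aggressive"), Globus manages
# concurrency/parallelism for you; explicit fields only apply under "custom".
def network_use_doc(level="aggressive"):
    allowed = {"normal", "minimal", "aggressive", "custom"}
    if level not in allowed:
        raise ValueError(f"unknown network_use level: {level!r}")
    doc = {"network_use": level}
    if level == "custom":
        # Illustrative values only; tune for your own endpoints.
        doc.update(
            preferred_concurrency=4,
            max_concurrency=8,
            preferred_parallelism=8,
            max_parallelism=16,
        )
    return doc

# With an authenticated globus_sdk.TransferClient `tc` and the UUID of a
# *managed* endpoint, you would then apply it roughly like:
#   tc.update_endpoint(ENDPOINT_ID, network_use_doc("aggressive"))
print(network_use_doc("aggressive"))  # {'network_use': 'aggressive'}
```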
Thanks for the help!
When I upload those downloaded files to Google Cloud Storage, the speed can reach over 100 MB/s, but I have not tested the downloading side. Could you please let me know how to test it? It seems iperf needs a paired server and client side.
Here I'm transferring data from terraref (which I suspect is a Globus subscription endpoint?) to my own VM (which is not under a subscription), so I think I can change neither of them.
@lhx-x I took a look at your current transfer that's running against the endpoint right now and I see the below: { "type": "GridFTP Transfer", "concurrency": 1, "protocol": "Mode S" }, with sync_level = 3.
While sync mode is an easy way to ensure that only the data you need is moved in a partially synced directory, it is not the fastest way to move data, and sync level 3 (going all the way to checksums to decide whether a file is a sync candidate) won't be fast across tens of thousands of files. Especially if this is your first pull of the data, it's better not to use sync mode and instead do a normal transfer with parallelism and pipelining. As an example, the transfers between the Arizona site and NCSA are currently running at parallelism 4 and pipelining 10 and are close to saturating the 1 Gb link (seeing ~110 MB/s) at the AZ site. The NCSA side is not network-bandwidth or file-system constrained, with multiple GB/s available for both.
I am not very familiar with the Globus Python SDK so @sirosen may be of more help in terms of configuration flags, etc. to allow your transfer to go wider.
If you'd like to run an iperf test though to see what the theoretical max is to your cloud instance let me know and I'll fire up an iperf listening server on the DTN that you could test against. Would be happy to do that too. We'd want to schedule a time though as I don't want to leave it up and listening indefinitely.
@jdmaloney Thanks JD! Yes -- the sync mode really matters. I changed it to 0 and the speed doubled to 14 MB/s at once :) Let's wait and see for a while to check whether that's the current upper bound. We may schedule an iperf test after that.
@lhx-x Indeed it appears to be going faster; the DTN node is reporting speeds to your cloud instance fairly consistently around 15 MB/s now, which is much improved. That's up to nearly 1.3 TB/day at that rate. The concurrency is also now running at 6, which should be helping you too.
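For scale, the TB/day figures in this thread follow from simple arithmetic, sketched here as a quick sanity check (using decimal units, 1 TB = 1e6 MB):

```python
# Back-of-envelope conversion from a sustained transfer rate to data per day.
def mb_per_s_to_tb_per_day(rate_mb_s):
    seconds_per_day = 86_400
    return rate_mb_s * seconds_per_day / 1_000_000  # decimal TB

print(round(mb_per_s_to_tb_per_day(15), 2))  # 1.3  (the DTN's observed rate)
print(round(mb_per_s_to_tb_per_day(7), 2))   # 0.6  (the originally reported rate)
```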
Perhaps my expectations are too high, but 15 MB/s still sounds slow if the max from the laptop to the VM is 100 MB/s. If you want to investigate further with Globus personnel, a support ticket with task and endpoint details remains the best path.
It does make sense that setting the sync level to 3 ("checksum") would slow things down. Most likely, the bottleneck in that case was checksumming on the VM. I would recommend setting the sync level to "exists" or "mtime" if you can. It tends to make scripted tasks faster/safer to rerun if they fail, and has typically negligible performance cost.
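To make the level names concrete, here's a small sketch mapping them to the numeric codes the Transfer API documents (0 = exists, 1 = size, 2 = mtime, 3 = checksum). The `tc`, `SRC_ID`, and `DST_ID` names in the commented SDK usage are hypothetical placeholders:

```python
# Map sync-level names to the Transfer API's numeric codes. Lower levels are
# cheaper to evaluate per file; "checksum" (3) has to read every byte.
SYNC_LEVELS = {"exists": 0, "size": 1, "mtime": 2, "checksum": 3}

def sync_level_code(name):
    if name not in SYNC_LEVELS:
        raise ValueError(f"unknown sync level: {name!r}")
    return SYNC_LEVELS[name]

# With the SDK (endpoint UUIDs are placeholders), the level is passed when
# building the transfer task, roughly like:
#   tdata = globus_sdk.TransferData(tc, SRC_ID, DST_ID,
#                                   sync_level=sync_level_code("mtime"))
#   tdata.add_item("/src/dir/", "/dst/dir/", recursive=True)
#   tc.submit_transfer(tdata)
print(sync_level_code("mtime"))  # 2
```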
@sirosen Thanks Stephen! Here the 100 MB/s is from the VM uploading to GCS. It seems the transfer speed from Globus is not stable: as I post this reply it is only 5 MB/s, with sync level = 0. I guess it would be helpful to run a bandwidth test as @jdmaloney suggested.
@lhx-x any progress on this?
Sorry not yet. @jdmaloney do you mind if we start a bandwidth test on this?
@lhx-x Is there a time this afternoon or tomorrow we could schedule the test?
@jdmaloney Sure! How about 2:00 pm PDT? Feel free to change it if that doesn't work for you. Also, could you please post the commands we will need?
@lhx-x Sorry about this; I was preparing for a conference out of town and ran out of time. This week is much more open, so let me know what times work for you and we'll set this test up. For commands, you'll want to have the iperf package installed.
Command will be: iperf -c 141.142.169.14
Make sure you have port 5001 open on your host's firewall if it isn't open already. That's the default port for iperf. Once we have a time set up for the test, I'll make sure my host is listening over here.
@jdmaloney Thanks! No worries if you aren't available at that time. I'll be ready over the next few days.
@lhx-x Do you still want to test this? My schedule is pretty open; if you do, just shoot me a time.
Stale issue, closing. Please open a new issue if work remains.
DISCUSSION: is the expected downloading speed ~ 7MB/s?
Details
Hello!
I am transferring data from terraref to a VM in Google Cloud via the Globus Python SDK. The download speed seems to be around 7 MB/s according to the Globus website. Could you please check whether that's the expected transfer speed, or whether it should be much faster but is being limited by some setting?
Thanks! Hongxiao
Completion Criteria
Know the upper bound on the data transfer rate.