vmware / govmomi

Go library for the VMware vSphere API
Apache License 2.0
2.3k stars 910 forks source link

[BUG] datastore.upload slow performance since v0.31.0 due to the http Client configuration changes #3564

Closed RPitt closed 1 day ago

RPitt commented 6 days ago

Describe the bug Changes in govmomi v0.31.0 result in poor HTTP upload performance for large files (such as ISOs) over high latency connections, e.g. to servers in a different location.

The performance degradation is extreme, in my scenario uploads with v0.30.7 run at 30+MB/sec whereas from v0.31.0 onwards runs at 0.5 MB/sec.

This seems to be a knock on from this change https://github.com/vmware/govmomi/pull/3161 which in turn resulted in the client getting a transport configuration of ForceAttemptHTTP2= true.

To Reproduce

  1. A distant (suggest 100ms latency) vCenter/ESXi is instance is required (could be simulated latency).
  2. Observe the performance uploading large file using versions 0.30.7 vs 0.31.0
    e.g. govc datastore.upload some-large-file /test-deleteme

Expected behavior Performance should be similar to what it was back on <= 0.30.7

Affected version Degradation introduced in v0.31.0 as part of https://github.com/vmware/govmomi/pull/3161

Potential fix Please consider adjusting the client transport settings using in govmomi, potentially disabling HTTP2 effectively restoring the settings as they were before.

I tried this via a hack (below) and it sped things up massively!

    if d, ok := http.DefaultTransport.(*http.Transport); ok {
        t = d.Clone()
        t.ForceAttemptHTTP2 = false
        t.TLSClientConfig = nil
    } else {

... though this probably requires more understanding.

github-actions[bot] commented 6 days ago

Howdy 🖐   RPitt ! Thank you for your interest in this project. We value your feedback and will respond soon.

If you want to contribute to this project, please make yourself familiar with the CONTRIBUTION guidelines.

RPitt commented 6 days ago

Note for others... Another way to get the lost performance back is to disable http2 by setting
GODEBUG=http2client=0 in the environment prior to running your process. Ref: https://go.dev/src/net/http/doc.go

dougm commented 6 days ago

Thank you @RPitt for digging into this. I can reproduce the problem and verify your workarounds. In my setup, using a ~200MB ova, transfer rate is ~2MB/sec by default and ~4.5MB/sec when disabling http2 with GODEBUG=http2client=0. Seeing the same behavior with govc datastore.upload, guest.upload and library.import. Do you want to open a PR with your patch (ForceAttemptHTTP2 = false) ? Still planning to look at the closer from the golang + vCenter side, but disabling http2 in govmomi/govc seems to be the best short term option.

dougm commented 5 days ago

I can also reproduce the http2 issue using curl with the script below, and curl throughput with http2 is worse than govc. Planning to open a bug internally, but disabling http2 in govmomi/govc by default is still the likely short term solution.

#!/bin/bash -e

host="$(govc env -x GOVC_URL_HOST)"
# session cookie we can use with curl
session="vmware_soap_session=$(govc session.login -l)"
# choose a shared datastore
export GOVC_DATASTORE=$(govc find / -type h | xargs govc datastore.info -H | grep Path: | awk '{print $2}' | head -n 1)
# derive datacenter from datastore
export GOVC_DATACENTER=$(govc find -p "$GOVC_DATASTORE" -type d)

govc datastore.mkdir -p images

if [ ! -e test.img ] ; then
  dd if=/dev/urandom of=test.img count=20 bs=10M
fi

dsName=$(basename "$GOVC_DATASTORE")

# curl --version: curl 8.7.1

# Changing '--http2' to '--http1.1' below cuts upload time by ~75%
time curl --http2 -k --verbose --cookie "$session" --upload-file test.img \
     "https://${host}/folder/images/test.img?dcPath=${GOVC_DATACENTER}&dsName=${dsName}" | cat
RPitt commented 2 days ago

Interesting, I'd initially presumed it this was entirely a golang library issue, though your confirming similar degradation with curl (written in C?) suggests there's a vCenter/ESXi aspect to it as well.

Anyway, yes it'd be great to get the quick workarround fix into govmomi.

Potentially I could even make/own said PR ... but I'll need to check my manager for approval.

Note when I was tinkering I had to set t.TLSClientConfig = nil as well to avoid an error, though I'd want to understand more about that and possible consequences as that might not be desired.

dougm commented 2 days ago

Thanks @RPitt , I can work on a PR today. I would like to avoid TLSClientConfig = nil and have an option to enable http2 at runtime (using http1.1 by default). May just revert to the previous code where we constructed http.Transport rather than using the Clone method.

As for the bug, right, I plan to open a bug against vCenter (unless I can find an existing one).

RPitt commented 2 days ago

Sounds good, yeah it's something about the the way the http.Transport is constructructed, reverting to the old mechanic sounds viable. Cheers 👍