pivotal-cf / om

General command line utility for working with VMware Tanzu Operations Manager
Apache License 2.0

Raise default connection/request timeouts #247

Closed: aegershman closed this issue 6 years ago

aegershman commented 6 years ago

Note: this is non-essential. Purely for convenience & kind of a greedy use-case. But nonetheless--

To account for slower environments, om could consider raising the default values for connect-timeout (currently 5s) and request-timeout (currently 1800s).

Apply changes
attempting to apply changes to the targeted Ops Manager
could not execute "apply-changes": installation failed to trigger: could not make api request to installations endpoint: token could not be retrieved from target url: Post https://<URL_REDACTED>com/uaa/oauth/token: dial tcp: lookup <REDACTED>.com on <REDACTED_IP>:53: no such host

cf-gitbot commented 6 years ago

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

jtarchie commented 6 years ago

Any recommendations?

aegershman commented 6 years ago

I completely lied on the request-timeout, I'm sorry. I suppose it's really only connect-timeout that we've ever seen any problems with. 1800s is 30 minutes.

We've found that a connect-timeout of 30s is quite sufficient.
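To illustrate the difference between the two settings, here is a minimal Go sketch using only the standard library. It is not om's actual code: `newClient` is a hypothetical helper, and the 30s and 1800s values are simply the numbers mentioned in this thread. The connect timeout bounds only the TCP dial, while the request timeout bounds the whole request, including reading the response body.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// newClient is a hypothetical helper (not taken from om) that builds an HTTP
// client with separate connect and request timeouts.
func newClient(connectTimeout, requestTimeout time.Duration) *http.Client {
	// Dialer.Timeout limits how long establishing the TCP connection may take.
	dialer := &net.Dialer{Timeout: connectTimeout}

	transport := &http.Transport{
		DialContext: dialer.DialContext,
	}

	return &http.Client{
		Transport: transport,
		// Client.Timeout limits the entire request: connecting, sending,
		// waiting for the response, and reading the body.
		Timeout: requestTimeout,
	}
}

func main() {
	// Values from the discussion: 30s to connect, 1800s for the whole request.
	client := newClient(30*time.Second, 1800*time.Second)
	fmt.Println(client.Timeout) // prints 30m0s
}
```

Raising only the connect timeout leaves the 30-minute request budget untouched, which matches the observation that connect-timeout is the setting that has actually caused problems.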

jtarchie commented 6 years ago

This reflects the changes that were made to the cf CLI 2 years ago. I'd accept a PR for that change. /cc: @ljfranklin @mcwumbly ???

ljfranklin commented 6 years ago

@jtarchie where are you seeing the CF CLI defaulting to 30 seconds for connect timeout? I'm seeing 5 seconds: https://github.com/cloudfoundry/cli/blob/79ffcb19cd342262a4cad4fb25e4f4cd71f4b3e0/cf/net/gateway.go#L31.

I'm not necessarily opposed to bumping the connect timeout to 30 seconds, although I worry it's mostly a bandaid over a busted network. If it takes over 5 seconds to complete the TLS handshake, how long will it take to upload a 12GB PAS tile? I'd be curious to hear more about where @aegershman sees these errors, e.g. "I'm uploading a tile to an OpsMgr located halfway around the world, so latency is really high."

ljfranklin commented 6 years ago

Just realized om already allows you to configure the dial timeout:

--connect-timeout, -o                  int     timeout in seconds to make TCP connections (default: 5)

Given that it's configurable if you know your network is slow, and that we recently added retries around networking errors, I'm in favor of closing this out without changing the default. The main reason is that I don't want it to take 90 seconds (30 seconds per attempt, with two retries by default) to find out you messed up your firewall rules to your OpsMgr VM.
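The 90-second figure can be checked with a short Go sketch. It assumes one initial attempt plus the two default retries mentioned above, that every attempt runs into the connect timeout, and that there is no delay between attempts. `worstCaseWait` is a hypothetical helper for illustration, not a function in om.

```go
package main

import (
	"fmt"
	"time"
)

// worstCaseWait is a hypothetical helper: it returns the longest time a
// caller could wait when every attempt runs into the connect timeout.
// It counts one initial attempt plus `retries` further attempts and
// ignores any delay between attempts.
func worstCaseWait(connectTimeout time.Duration, retries int) time.Duration {
	attempts := retries + 1
	return time.Duration(attempts) * connectTimeout
}

func main() {
	// Compare the current default (5s) with the proposed default (30s).
	for _, timeout := range []time.Duration{5 * time.Second, 30 * time.Second} {
		fmt.Printf("connect-timeout=%v retries=2 -> up to %v\n",
			timeout, worstCaseWait(timeout, 2))
	}
	// Prints:
	// connect-timeout=5s retries=2 -> up to 15s
	// connect-timeout=30s retries=2 -> up to 1m30s
}
```

Under these assumptions, a misconfigured firewall surfaces after about 15 seconds with the current default, versus 90 seconds with a 30-second default.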

mcwumbly commented 6 years ago

I agree, @ljfranklin, and I think addressing this other issue will help those with persistently busted networks: https://github.com/pivotal-cf/om/issues/245