masterzen / winrm

Command-line tool and library for Windows remote command execution in Go
Apache License 2.0
425 stars 129 forks source link

Reading the output of long running commands gives net/http: timeout #147

Closed kke closed 11 months ago

kke commented 11 months ago

If you execute a long running command that doesn't output for a while and keep reading the stdout/stderr, you get:

unknown error Post "https://127.0.0.1:5986/wsman": net/http: timeout awaiting response headers"

I suppose this bit at command.go is supposed to catch that:

    response, err := c.client.sendRequest(request)
    if err != nil {
        if strings.Contains(err.Error(), "OperationTimeout") {
            // Operation timeout because there was no command output
            return false, err
        }
        if strings.Contains(err.Error(), "EOF") {
            c.exitCode = 16001
        }

        c.Stderr.write.CloseWithError(err)
        c.Stdout.write.CloseWithError(err)
        return true, err
    }

But as we can see in the error message, the matcher against OperationTimeout does not seem to be correct, maybe it once was.

masterzen commented 11 months ago

After how long do you get the timeout? I'm curious to see where that timeout is coming from (ie client settings, winrm http server)?

But as we can see in the error message, the matcher against OperationTimeout does not seem to be correct, maybe it once was.

The OperationTimeout is the timeout coming from the winrm handler on the server for the given command. This timeout is controlled by the Timeout parameter of Parameters (see https://github.com/masterzen/winrm/blob/807053a1bcb92228b78594f11403cb140399ea81/parameters.go#L17 for the default which is 1min).

kke commented 11 months ago

Those seem to go to the soap message, not to http request. The OperationTimeout matcher matches the error in a soap response.

The timeouts for http are set in http.go:

func (c *clientRequest) Transport(endpoint *Endpoint) error {
    dial := (&net.Dialer{
        Timeout:   30 * time.Second,
        KeepAlive: 30 * time.Second,
    }).Dial
        ...
    transport := &http.Transport{
        Dial:                  dial,
        ResponseHeaderTimeout: endpoint.Timeout,
    }
    return nil
}

I'm using an endpoint like this:

    endpoint := &winrm.Endpoint{
          ...
          Timeout: time.Minute,
    }
        params := winrm.DefaultParameters
        client, err := winrm.NewClientWithParameters(endpoint, c.User, c.Password, params)

It seems the timeouts happen around 1m5.867503744s.

kke commented 11 months ago

I'm not getting any timeouts for other requests, just from the output readers.

kke commented 11 months ago

Maybe having the http timeout and the soap operation timeout values both at 1 minute is not optimal, maybe the http timeout should be a bit longer to give the soap thing time to give an error.

In any case, I think the output slurpers should probably treat the http timeout the same way as soap OperationTimeout (which I assume is to retry)

masterzen commented 11 months ago

Maybe having the http timeout and the soap operation timeout values both at 1 minute is not optimal, maybe the http timeout should be a bit longer to give the soap thing time to give an error.

I'm wondering what would happen if your http timeout is smaller than the operation timeout, or if it's longer. It also depends on what is set in the winrm config at the remote machine (see https://learn.microsoft.com/en-us/windows/win32/winrm/installation-and-configuration-for-windows-remote-management#idletimeout and the MaxTimeoutms which control what value can be set for the OperationTimeout).

In any case, I think the output slurpers should probably treat the http timeout the same way as soap OperationTimeout (which I assume is to retry)

Yes, I concur, in case of http timeout we definitely should retry (which is what your PR will do anyway).