itchio / butler

🎩 Command-line itch.io helper
MIT License
745 stars 52 forks source link

Backend problem: GCS occasionally closes connection immediately? #167

Closed fasterthanlime closed 5 years ago

fasterthanlime commented 5 years ago

While processing a build:

Get (snipped): EOF
git.itch.ovh/itchio/wharfd/vendor/github.com/itchio/wharf/pools/cachepool.(*CachePool).Preload
    /home/amos/goprod/src/git.itch.ovh/itchio/wharfd/vendor/github.com/itchio/wharf/pools/cachepool/cachepool.go:73
git.itch.ovh/itchio/wharfd/jobs.DoApplyTask.func2
    /home/amos/goprod/src/git.itch.ovh/itchio/wharfd/jobs/task_apply.go:197
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:2361
Could not preload file 796 (res/music/FM1_nodrums.wav)
fasterthanlime commented 5 years ago

Also happened on initial request when applying a patch:

Get (snipped): EOF
htfs.Open (initial request)
opening remote signature
fasterthanlime commented 5 years ago

The thing is that EOF is a normal/expected error in some cases, just not when opening a file? I'd have to investigate more.

fasterthanlime commented 5 years ago

little update on this, it appears the default http client gives us a connection reset by peer error:

func main() {
    url := "http://api.localhost.com:8080/games/28/uploads"

    _, err := http.Get(url)
    log.Printf("err = %+v", err)

    c := timeout.NewDefaultClient()
    req, err := http.NewRequest("GET", url, nil)
    must(err)

    _, err = c.Do(req)
    log.Printf("err = %+v", err)
}

Output:

☁  dialtest  go run main.go                       
2018/11/30 19:49:33 err = Get http://api.localhost.com:8080/games/28/uploads: read tcp 192.168.1.29:39228->192.168.1.28:8080: read: connection reset by peer
2018/11/30 19:49:33 err = Get http://api.localhost.com:8080/games/28/uploads: EOF
fasterthanlime commented 5 years ago

Ok it appears it's not idletiming's fault:

☁  dialtest  go get && dialtest
======================
default: 
Get http://localhost.com:8080: read tcp 192.168.1.29:49146->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: read tcp 192.168.1.29:49148->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: EOF
Get http://localhost.com:8080: read tcp 192.168.1.29:49152->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: read tcp 192.168.1.29:49154->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: read tcp 192.168.1.29:49156->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: EOF
Get http://localhost.com:8080: EOF
Get http://localhost.com:8080: EOF
Get http://localhost.com:8080: EOF
======================
timeout: 
Get http://localhost.com:8080: read tcp 192.168.1.29:49166->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: read tcp 192.168.1.29:49168->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: EOF
Get http://localhost.com:8080: read tcp 192.168.1.29:49172->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: read tcp 192.168.1.29:49174->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: EOF
Get http://localhost.com:8080: read tcp 192.168.1.29:49178->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: read tcp 192.168.1.29:49180->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: read tcp 192.168.1.29:49182->192.168.1.28:8080: read: connection reset by peer
Get http://localhost.com:8080: read tcp 192.168.1.29:49184->192.168.1.28:8080: read: connection reset by peer
fasterthanlime commented 5 years ago

Running butler cp on that url a couple times is interesting, connection reset is definitely detected as a network error, but EOF isn't, so it either fails immediately or retries:

image

fasterthanlime commented 5 years ago

After the change, EOF is retried in every case, except when we reach the end of content-length when reading a file:

image