lukechampine / user

A CLI renter for Sia
MIT License

Connection timed out error causing other errors #6

Open grigzy28 opened 5 years ago

grigzy28 commented 5 years ago

I've noticed that when one host fails with a "connection timed out" error, all of the other, good hosts then report a "broken pipe" error.

root@sia-test:~# user upload -m 10 fedora30.tar.xz
fedora30.tar.xz   94%   44.17 MB   684.8 KB/s
Upload failed: could not upload to some hosts:
43cd88ca: dial tcp 87.79.165.10:9982: connect: connection timed out
write tcp 192.168.1.4:54370->73.193.37.231:9978: write: broken pipe
write tcp 192.168.1.4:33496->63.155.9.70:9982: write: broken pipe
write tcp 192.168.1.4:39910->87.158.160.17:9982: write: broken pipe
write tcp 192.168.1.4:50618->96.227.220.184:9982: write: broken pipe
write tcp 192.168.1.4:60452->212.232.75.200:9982: write: broken pipe
write tcp 192.168.1.4:45590->79.114.65.81:9982: write: broken pipe

Is this normal behavior? I was able to upload this file successfully after removing the one timed-out host.

lukechampine commented 5 years ago

I think it's "normal" in that it's a benign error, but it shouldn't be getting presented to the user like this.

I think what's happening is this. You successfully uploaded to all but one host, which timed out. The connections to the other hosts aren't closed until all of the uploads have finished, so while you were waiting on the slow host, those other connections were still open. But after 5 minutes of no activity, hosts close their end of the connection (this is a DoS prevention measure). Then, once the bad host finally timed out, we sent a "hangup" message to each host, indicating that we're closing the connection gracefully. Since the hosts had already closed their end of the connection, we got a broken pipe error.

So in essence, broken pipe comes from us trying to send a "please close the connection" message to a host that has already closed the connection. That's why it's benign.

The obvious fix here is that if we try to send the "hangup" message and the connection is already closed, just ignore the error. I will implement this soon.

Thanks for the bug reports, by the way -- they are super helpful! :) I assume that other people have seen similar bugs, but did not report them, which is sad, because most of them are simple fixes and would save other users some frustration.