storj / uplink

Storj network Go library
MIT License
115 stars 18 forks source link

Investigate intermittent upload failures with new upload codepath #147

Closed shaupt131 closed 9 months ago

shaupt131 commented 1 year ago

Background:

@littleskunk reported having problems finishing his uploads with the new uplink.

uplink: encryption: metaclient: manager closed: read tcp 192.168.234.11:51157->104.198.144.129:7777: wsarecv: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

Additional discussion: https://storj.slack.com/archives/CH6LU8Y64/p1683286505535209

Acceptance Criteria:

-Complete investigation into root cause -If low effort (2 or less) implement a fix under this ticket. If not, create follow-up ticket(s) accordingly.

iglesiasbrandon commented 1 year ago

@zeebo, any progress with Th3Van on this?

https://forum.storj.io/t/upload-refactoring/22559/28

Th3Van commented 1 year ago

@iglesiasbrandon - He's working on it afaik - I gave @zeebo shell access to the sandbox I'm using to test Uplink - and hes been on the box yesterday and did some upload testing. What he did find out by testing, I do not know yet.

Th3Van.dk

zeebo commented 1 year ago

Thanks for the shell access. It's been super helpful.

One thing I've found so far is that the new upload code path calls the BeginSegment API with no concurrency, so the upload speed is limited by 64MB/avg latency of that API call. I was able to get better results by doing some buffering of the input data stream which allowed those calls to happen in parallel, but still not as good as the old uplink version.

It explains why I was able to keep a large number of piece uploads at 1gbit but not at 10gbit: if your pipe is big enough, you can't call BeginSegment fast enough to keep up!

So, progress. I'm continuing to dig to figure out if there's other bits of tuning and thinking of more ways to fix it (maybe buffering some BeginSegment calls, gonna check connection pooling, etc.)

storjrobot commented 1 year ago

This issue has been mentioned on Storj Community Forum (official). There might be relevant details there:

https://forum.storj.io/t/upload-refactoring/22559/30

shaupt131 commented 12 months ago

https://review.dev.storj.io/c/storj/uplink/+/10730

shaupt131 commented 11 months ago

Looks like the only other PR is ready to submit. Any other outstanding tasks here?

iglesiasbrandon commented 11 months ago

@jtolio ^

jtolio commented 11 months ago

it's merged, just needs a release now

iglesiasbrandon commented 9 months ago

i think we can mark this as done! https://github.com/storj/uplink/releases/tag/v1.12.0 @jtolio @shaupt131