dgrisham opened 3 years ago
So, we do have to cancel a bunch of CIDs but that won't show up here.
We do re-broadcast our wantlist every 30 seconds, IIRC (to deal with a few connect/disconnect race conditions). Maybe we're not processing incoming data fast enough and re-sending want items for things we've already received?
Can you reproduce this with 2 nodes?
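One way to check the re-send hypothesis is to compare duplicate bytes against total bytes on the leechers; go-bitswap's `Bitswap.Stat()` exposes `DataReceived` and `DupDataReceived` counters for this. A minimal sketch of the ratio calculation (the numbers in `main` are hypothetical, just matching the symptom described below):

```go
package main

import "fmt"

// dupFraction returns the share of received bytes that were duplicates.
// In go-bitswap these counters come from Bitswap.Stat() (DataReceived,
// DupDataReceived); checking them on the leechers would confirm whether
// blocks are being re-sent after the transfer completes.
func dupFraction(dataReceived, dupDataReceived uint64) float64 {
	if dataReceived == 0 {
		return 0
	}
	return float64(dupDataReceived) / float64(dataReceived)
}

func main() {
	// Hypothetical numbers matching the symptom: ~1 GiB file, ~600 MiB extra.
	total := uint64(1<<30 + 600<<20)
	dup := uint64(600 << 20)
	fmt.Printf("duplicate fraction: %.2f\n", dupFraction(total, dup)) // prints: duplicate fraction: 0.37
}
```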
My suggestion would be to use `local:docker` instead of `local:exec` so you can add some latency between the peers. That way we can be sure the seeder is not clogging "computationally" while sending the data.

Also, if you don't mind, try reproducing with 2 nodes or adding more seeders and see what happens. My experience with `local:exec` is that the results end up being pretty noisy because we lack the "network impact".
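Adding latency between peers under `local:docker` is done through testground's network client. This is a hedged fragment, not a drop-in: the names (`network.NewClient`, `MustConfigureNetwork`, `LinkShape` fields) are from the testground sdk-go as I recall it, and the latency/bandwidth values are illustrative, so check them against the SDK before use.

```go
// Config fragment: assumes ctx, client, and runenv from the testground
// test entrypoint. Shapes every link from this instance with ~25ms latency.
netclient := network.NewClient(client, runenv)
netclient.MustWaitNetworkInitialized(ctx)

netclient.MustConfigureNetwork(ctx, &network.Config{
	Network: "default",
	Enable:  true,
	Default: network.LinkShape{
		Latency:   25 * time.Millisecond,
		Bandwidth: 300 << 20, // in bits per second, per the SDK docs
	},
	CallbackState: "network-configured",
})
```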
@Stebalien I was able to reproduce with two nodes after doubling the file size:
@adlrocha thanks for the suggestion! I'll try that out and report back -- takes a bit of work to switch to running in Docker, as I'm building off of a local version of `go-bitswap` with changes to help me gather metrics.
The Docker runs are on a bit of a hold while I figure out how to get them working with my local repos: https://github.com/testground/testground/issues/1213#issue-836948026
Quick update on this -- I observed this behavior with the Docker containers as well, with network parameters that I believed to be reasonable (tested a few latencies from 10ms to 50ms, and if I recall correctly bandwidth of 150-1024 MiB/s). I do think a smaller proportion of the data was re-shared (assuming that's what was happening) than outside of the Docker containers. Unfortunately I don't have the plots at the moment and have been modifying my tests/plotting script a fair bit. This may become relevant in my tests again soon, so I'll post another update at that point.
I'm running `bitswap-transfer` tests and using the `master` branch of `go-bitswap` as my baseline. However, I'm getting an unexpected jump in data exchanged once the actual file seems to have been successfully transferred.

I'm using a star topology: 1 seeder connected to each of 3 leechers, with the leechers not connected to one another. The seeder simply uploads to all of the leechers, and while that's happening I record the Bitswap ledger of each leecher with this goroutine. Then I plot the data sent to each leecher, from the seeder's perspective, vs. time. The plot for one run looks like:
The horizontal green line is the size of the randomly generated file that the seeder sends to each leecher. It looks like the seeder successfully uploads to all of the leechers, but then a substantial amount of data (> 50% of the original file size) is sent to all of the leechers 15-20 seconds after the file has already been sent.
Does anyone have insight into what this might be, or if I'm perhaps misunderstanding the test/missing some other detail?
I'm using the `exec:go` builder and `local:exec` runner, and the file size in this run was a little over 1GB. Aside from my background goroutine and setting up the star topology, my test should be about the same as the `bitswap-transfer` test on `master` in this repo.

Let me know if I can provide more info! In case they might know something here, pinging @stebalien @whyrusleeping @daviddias @adlrocha @yiannisbot