protocol / beyond-bitswap


Bitswap Transfer test: Unexpected additional data #44

Open dgrisham opened 3 years ago

dgrisham commented 3 years ago

I'm running bitswap-transfer tests, using the master branch of go-bitswap as my baseline. However, I'm seeing an unexpected jump in data exchanged after the file appears to have been fully transferred.

I'm using a star topology: one seeder connected to each of three leechers, with the leechers not connected to one another. The seeder simply uploads to all of the leechers, and while that's happening I record the Bitswap ledger of each leecher with this goroutine. Then, from the seeder's perspective, I plot the data sent to each leecher vs. time. The plot for one run looks like:

[Plot: data sent from the seeder to each leecher vs. time]

The horizontal green line is the size of the randomly generated file that the seeder sends to each leecher. It looks like the seeder successfully uploads to all of the leechers, but then a substantial amount of data (> 50% of the original file size) is sent to all of the leechers 15-20 seconds after the file has already been sent.

Does anyone have insight into what this might be, or if I'm perhaps misunderstanding the test/missing some other detail?

I'm using the exec:go builder and local:exec runner, and the file size in this run was a little over 1GB. Aside from my background goroutine and setting up the star topology, my test should be about the same as the bitswap-transfer test on master in this repo.

Let me know if I can provide more info! In case they might know something here, pinging @stebalien @whyrusleeping @daviddias @adlrocha @yiannisbot

Stebalien commented 3 years ago

So, we do have to cancel a bunch of CIDs, but that won't show up here.

We do re-broadcast our wantlist every 30 seconds, IIRC (to deal with a few connect/disconnect race conditions). Maybe we're not processing incoming data fast enough and re-sending want items for things we've already received?
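The failure mode described here can be modeled with a toy sketch (this is an illustration of the hypothesis, not go-bitswap's actual implementation): the full wantlist is re-sent on a timer, so any block that has been received but not yet processed, and therefore not yet removed from the wantlist, gets requested and sent a second time.

```go
package main

import "fmt"

// wantManager is a toy model of periodic wantlist rebroadcast: broadcast()
// re-sends the entire current wantlist, and the remote peer answers every
// want, so unprocessed-but-already-received blocks are sent again.
type wantManager struct {
	wants map[string]bool // CID -> still on our wantlist
	sent  map[string]int  // CID -> times the peer sent it to us
}

func newWantManager(cids []string) *wantManager {
	w := &wantManager{wants: map[string]bool{}, sent: map[string]int{}}
	for _, c := range cids {
		w.wants[c] = true
	}
	return w
}

// broadcast sends the current wantlist; the remote peer answers every want.
func (w *wantManager) broadcast() {
	for c := range w.wants {
		w.sent[c]++
	}
}

// process marks a received block as handled, removing it from the wantlist.
func (w *wantManager) process(c string) { delete(w.wants, c) }

func main() {
	w := newWantManager([]string{"cid-a", "cid-b", "cid-c"})
	w.broadcast() // initial wantlist: all three blocks arrive

	// Only one block is processed before the rebroadcast timer
	// (~30s in the scenario described above) fires again.
	w.process("cid-a")
	w.broadcast()

	// cid-b and cid-c are re-requested and re-sent.
	fmt.Println(w.sent["cid-a"], w.sent["cid-b"], w.sent["cid-c"]) // 1 2 2
}
```

If this is what's happening, the extra data in the plot would roughly match the blocks still sitting unprocessed in the receive path at rebroadcast time.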

Can you reproduce this with 2 nodes?

adlrocha commented 3 years ago

My suggestion would be to use local:docker instead of local:exec so you can add some latency between the peers. That way we can be sure the seeder isn't computationally bottlenecked while sending the data.

Also, if you don't mind, try reproducing with 2 nodes or adding more seeders and see what happens. My experience with local:docker is that the results end up being pretty noisy because we lack the "network impact".

dgrisham commented 3 years ago

@Stebalien I was able to reproduce with two nodes after doubling the file size:

[Plot: master-c1a2g16o0u3v17bdt0o0, 2 nodes]

@adlrocha thanks for the suggestion! I'll try that out and report back -- takes a bit of work to switch to running in Docker, as I'm building off of a local version of go-bitswap with changes to help me gather metrics.

dgrisham commented 3 years ago

On a bit of a hold with the Docker runs, figuring out how to get it working with my local repos: https://github.com/testground/testground/issues/1213#issue-836948026

dgrisham commented 3 years ago

Quick update on this -- I observed the same behavior with the Docker containers, using network parameters I believed to be reasonable (I tested a few latencies from 10ms to 50ms and, if I recall correctly, bandwidths of 150-1024 MiB/s). A smaller proportion of the data did seem to be re-sent (assuming that's what was happening) than outside of the Docker containers. Unfortunately I don't have the plots at the moment and have been modifying my tests/plotting script a fair bit. This may become relevant in my tests again soon, so I'll post another update at that point.