Open dirkmc opened 3 years ago
@dirkmc I believe the root cause is that pause/resume on the requestor side is implemented in a poor and unreliable way. That is completely my fault, but I'm hoping you can help me confirm it: in https://github.com/filecoin-project/go-data-transfer/pull/152, can you remove the code that pauses the client, so that only the provider pauses, and see if the problem goes away?
If so, we should have a conversation about the best way to implement pause/resume on the requestor, as it's not obvious.
@hannahhoward I removed the pausing on the client side, and I was able to run the test 1,000 times in a row with no failures.
Could you explain what the issue is and what you're thinking in terms of a solution?
While running go-data-transfer tests I've noticed that graphsync seems to occasionally enqueue the same outgoing block twice for the same file. This appears to happen when the transfer is paused and then resumed.
This causes an issue with go-data-transfer because the provider calculates the queued bytes by summing the size of all queued blocks. However, when a duplicate block is queued, the duplicate doesn't actually get sent across the wire, so the provider may expect payment for a block that hasn't been sent.
The provider uses RegisterOutgoingBlockHook() to watch for enqueued blocks.
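To make the accounting problem concrete, here is a minimal, self-contained sketch. It does not use the graphsync or go-data-transfer APIs: `queuedBlock`, `naiveQueuedBytes`, and `dedupedQueuedBytes` are hypothetical names invented for illustration, and the block sizes are made up. It only shows how summing every enqueue notification overcounts when the same block is enqueued twice, and how tracking links already seen would avoid that, assuming each block is only expected on the wire once per transfer.

```go
package main

import "fmt"

// queuedBlock is a hypothetical stand-in for what a provider observes
// when a block is enqueued. It is not a graphsync type.
type queuedBlock struct {
	Link string // identifier of the block (for example, its CID as a string)
	Size uint64 // size of the block in bytes
}

// naiveQueuedBytes sums the size of every enqueue notification,
// including duplicates. This mirrors the accounting described above.
func naiveQueuedBytes(blocks []queuedBlock) uint64 {
	var total uint64
	for _, b := range blocks {
		total += b.Size
	}
	return total
}

// dedupedQueuedBytes counts each link at most once, on the assumption
// that a block enqueued a second time is not sent over the wire again.
func dedupedQueuedBytes(blocks []queuedBlock) uint64 {
	seen := make(map[string]struct{}, len(blocks))
	var total uint64
	for _, b := range blocks {
		if _, ok := seen[b.Link]; ok {
			continue // duplicate enqueue: skip it
		}
		seen[b.Link] = struct{}{}
		total += b.Size
	}
	return total
}

func main() {
	// Illustrative sequence: the second block is enqueued again after resume.
	blocks := []queuedBlock{
		{Link: "block-1", Size: 1024},
		{Link: "block-2", Size: 2048},
		{Link: "block-2", Size: 2048}, // re-enqueued after pause/resume
		{Link: "block-3", Size: 512},
	}
	fmt.Println("naive:", naiveQueuedBytes(blocks))
	fmt.Println("deduped:", dedupedQueuedBytes(blocks))
}
```

With this made-up sequence the naive sum exceeds the deduplicated sum by exactly the size of the repeated block, which is the amount the provider would be overcharging for. Deduplicating on the provider side would only be a workaround; it would not address why the block is enqueued twice in the first place.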
I've created a test in go-data-transfer to demonstrate the issue: https://github.com/filecoin-project/go-data-transfer/pull/152
In this case, the transfer is paused after the second block is sent. When the transfer is resumed, the second block is sent again. See the example below: