Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers.
Finally, remember to use https://discuss.ipfs.io if you just need general support.
So one thing that is confusing is that the message you are seeing is not coming from the send message code; it's coming from ReceiveError, which is triggered in network/libp2p_impl.go (https://github.com/ipfs/go-graphsync/blob/master/network/libp2p_impl.go#L145), which is part of the code that handles an INCOMING connection rather than an outgoing message. HandleNewStream should not be triggered when you open a connection on your side -- only when someone opens a connection to you (correct @raulk / @aarshkshah1992?).
Can I get some more background on where this came up? Were you receiving or sending piece data? I have in fact seen this issue come up before -- it happens when the incoming message is larger than network.MaxMessageSize -- however I thought we'd resolved it with code we have now in response sending (https://github.com/ipfs/go-graphsync/blob/master/responsemanager/peerresponsemanager/peerresponsesender.go#L368). One way this could get triggered is if the block data is particularly large -- but that probably shouldn't happen with default chunking strategies for filesystem data used by LOTUS.
I have received this error while the Dealbot was trying to make a storage deal with me.
Basically I was at the receiving end of the storage deal, and I think I was being sent the 128 MiB pieces. So receiving.
I, along with other users, have noticed a weird tendency for this to happen around 5120 'size' (bytes I presume?) received.
They are all receiving channels:
miner@miner0:~$ lotus-miner data-transfers list
Sending Channels
ID Status Sending To Root Cid Initiated? Transferred Voucher
Receiving Channels
ID Status Receiving From Root Cid Initiated? Transferred Voucher
23579 Ongoing ...774A4Hyd ...lkvf4sfo N 2102362 ...r76jkc5q4zjof4iyj2lwgxig2a5ihuvmau"}}
24128 Ongoing ...774A4Hyd ...rets7dse N 5210 ...qzywc56y7i3mfqb6kwz6ifhmcwzdnqqzzq"}}
24685 Ongoing ...774A4Hyd ...vddz375c N 2102362 ...blvfrvzveqyb2ovfkmpmxb72sqkaidrgui"}}
25648 Ongoing ...774A4Hyd ...pfmkukvo N 12588122 ...ldtnj6pdsy4ykhbizpao3vd3sddmkn2ala"}}
26783 Ongoing ...774A4Hyd ...5myhxrpk N 5210 ...6iqmn6ibkw2dum6iaq7otyeoeq5ejtcxuu"}}
27922 Ongoing ...774A4Hyd ...mkjjsvh4 N 5210 ...chqwvfkwr4wtyjn4re5dnsh57gg37pstju"}}
28501 Ongoing ...774A4Hyd ...t4ax4qoi N 22025306 ...ygjcprcw2a4bozwx2b7b2y3etdrr3zlpbq"}}
29091 Ongoing ...774A4Hyd ...tww3u3je N 2102362 ...gcbuocdvrr542bztpkczydyht7irflzjy4"}}
All of them are stuck.
Thanks. OK, so it seems like somehow the stream buffer is full when send/recovery happens.
follow-up: I'm a bit stumped
I wrote a benchmark to try to replicate the conditions you're encountering with the bot and I'm finding it hard to get whatever is happening to happen. https://github.com/ipfs/go-graphsync/pull/91
This sets up 20 simultaneous transfers of 128MB files split into 1MB chunks, similar to Filecoin. We don't have the ability to test a real network connection, but just to be safe we can bring the whole mock connection down, which is what I do in the middle of the test. It seems to survive that and complete successfully... leading me to believe we're now dealing with a problem that only manifests on a real network connection.
@raulk if you or anyone else on libp2p has any thoughts on what could be happening here, that'd be cool.
Last time I saw this issue, it was because we were constructing messages that were too large -- 4MB or greater. There should be code in graphsync to immediately start a new message if your total size goes above 512K now. Reading an incoming message over 4MB will fail with a ShortBuffer error code from gogo-protobuf.
Does libp2p have an internal message size limit? Filecoin chunks up unixFS files into fairly large blocks -- is there a scenario in which a 1MB block could break a message send? Should we consider lowering this? Is there any reason, if the stream is reset and we reopen a connection, that old data would still be in the stream?
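For context on the "short buffer" error: a length-delimited protobuf reader like the gogo one mentioned above reads a varint length prefix and rejects anything over its configured maximum. Here is a minimal sketch of that mechanism using only the Go standard library; the 4 MiB constant is an assumption for illustration, not necessarily the exact value go-graphsync configures:

package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Assumed limit for illustration; the constant go-graphsync actually uses may differ.
const maxMessageSize = 4 << 20 // 4 MiB

// readLengthPrefix reads a uvarint length prefix the way a length-delimited
// protobuf reader does and rejects oversized payloads with io.ErrShortBuffer,
// which is the error gogo-protobuf's delimited reader reports in that case.
func readLengthPrefix(r *bufio.Reader) (int, error) {
	length, err := binary.ReadUvarint(r)
	if err != nil {
		return 0, err
	}
	if length > maxMessageSize {
		return 0, io.ErrShortBuffer
	}
	return int(length), nil
}

func main() {
	// Simulate a sender that prefixed a 5 MiB message: the receiver reads the
	// prefix, decides the message is too large, and would reset the stream.
	var buf bytes.Buffer
	prefix := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(prefix, 5<<20)
	buf.Write(prefix[:n])

	if _, err := readLengthPrefix(bufio.NewReader(&buf)); err != nil {
		fmt.Println("receiver error:", err) // prints: receiver error: short buffer
	}
}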
Also, to review: to my understanding this isn't happening to all miners, but somewhat randomly to some of them.
@hannahhoward The only thing that comes to my mind is that I have jumbo frames enabled (an MTU of 8192, specifically). My switch supports jumbo frames and this happens at the IP level, so I doubt it's causing anything here, but that's how my networking differs from stock.
Would it help for me to try and capture a tcpdump of this happening?
I'm going to try and get a complete tcpdump of a stream reset and malformed length error happening when dealbot is back online. I'll update in this ticket.
Just looking at this now. Thanks for the detailed report, @danny-wu.
@danny-wu a tcpdump won't help much here because all traffic is encoded. My guess is that the stream is being reset, but for some reason we keep using it. One hypothesis is that multiple threads could be writing to the same stream concurrently. That could definitely cause issues like that. Let me look.
Thanks so much @raulk; hope you can find the cause 🙏
Something interesting I find is that the stream resets tend to happen approximately 1 minute after the storage transfer begins. For example:
2020-09-06T10:33:53.664Z INFO storagemarket_impl impl/provider.go:199 Handling storage deal proposal!
2020-09-06T10:33:53.904Z INFO markets loggers/loggers.go:19 storage event {"name": "ProviderEventOpen", "proposal CID": "bafyreigoez74stgjzdkznnpoufhmceui2yjvqbpkpeknvr446oi3wlo3w4", "state": "StorageDealValidating", "message": ""}
2020-09-06T10:33:53.930Z INFO markets loggers/loggers.go:19 storage event {"name": "ProviderEventDealDeciding", "proposal CID": "bafyreigoez74stgjzdkznnpoufhmceui2yjvqbpkpeknvr446oi3wlo3w4", "state": "StorageDealAcceptWait", "message": ""}
2020-09-06T10:33:53.968Z INFO markets loggers/loggers.go:19 storage event {"name": "ProviderEventDataRequested", "proposal CID": "bafyreigoez74stgjzdkznnpoufhmceui2yjvqbpkpeknvr446oi3wlo3w4", "state": "StorageDealWaitingForData", "message": ""}
2020-09-06T10:33:54.216Z INFO markets loggers/loggers.go:19 storage event {"name": "ProviderEventDataTransferInitiated", "proposal CID": "bafyreigoez74stgjzdkznnpoufhmceui2yjvqbpkpeknvr446oi3wlo3w4", "state": "StorageDealTransferring", "message": ""}
2020-09-06T10:34:57.815Z INFO graphsync impl/graphsync.go:244 Graphsync ReceiveError: stream reset
...
2020-09-06T13:35:06.885Z INFO storagemarket_impl impl/provider.go:199 Handling storage deal proposal!
2020-09-06T13:35:07.128Z INFO markets loggers/loggers.go:19 storage event {"name": "ProviderEventOpen", "proposal CID": "bafyreif7o6bvtdkob6ntkm32r54thehlhghag2576xihfmhdushsua4aqm", "state": "StorageDealValidating", "message": ""}
2020-09-06T13:35:07.146Z INFO markets loggers/loggers.go:19 storage event {"name": "ProviderEventDealDeciding", "proposal CID": "bafyreif7o6bvtdkob6ntkm32r54thehlhghag2576xihfmhdushsua4aqm", "state": "StorageDealAcceptWait", "message": ""}
2020-09-06T13:35:07.185Z INFO markets loggers/loggers.go:19 storage event {"name": "ProviderEventDataRequested", "proposal CID": "bafyreif7o6bvtdkob6ntkm32r54thehlhghag2576xihfmhdushsua4aqm", "state": "StorageDealWaitingForData", "message": ""}
2020-09-06T13:35:07.722Z INFO markets loggers/loggers.go:19 storage event {"name": "ProviderEventDataTransferInitiated", "proposal CID": "bafyreif7o6bvtdkob6ntkm32r54thehlhghag2576xihfmhdushsua4aqm", "state": "StorageDealTransferring", "message": ""}
2020-09-06T13:35:48.560Z INFO graphsync impl/graphsync.go:244 Graphsync ReceiveError: stream reset
@hannahhoward Not sure if it's related or not, but one of the miners with the following config is having the same error:
[Storage]
ParallelFetchLimit = 8
AllowAddPiece = false
AllowPreCommit1 = false
AllowPreCommit2 = false
AllowCommit = false
AllowUnseal = true
···
@raulk the whole messagequeue class should prevent multiple messages being written by multiple threads. Essentially, all messages outgoing to a single peer pipe through the message queue, which sends them sequentially in a single, isolated thread.
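For readers following along, here is a minimal sketch of the pattern being described: every outgoing message for a peer goes onto a queue drained by a single goroutine, so only one write to the stream can ever be in flight at a time. The names and types are illustrative, not go-graphsync's actual implementation:

package main

import (
	"fmt"
	"sync"
)

// messageQueue serializes all outgoing messages to a single peer: callers
// enqueue, and one goroutine performs the writes, so no two goroutines ever
// write to the underlying stream concurrently.
type messageQueue struct {
	outgoing chan []byte
	wg       sync.WaitGroup
}

func newMessageQueue(send func([]byte) error) *messageQueue {
	mq := &messageQueue{outgoing: make(chan []byte, 16)}
	mq.wg.Add(1)
	go func() {
		defer mq.wg.Done()
		for msg := range mq.outgoing {
			if err := send(msg); err != nil {
				fmt.Println("send failed:", err)
				return
			}
		}
	}()
	return mq
}

// AddMessage enqueues a message; the queue's single goroutine sends it later.
func (mq *messageQueue) AddMessage(msg []byte) { mq.outgoing <- msg }

// Shutdown stops accepting messages and waits for pending sends to finish.
func (mq *messageQueue) Shutdown() {
	close(mq.outgoing)
	mq.wg.Wait()
}

func main() {
	mq := newMessageQueue(func(msg []byte) error {
		fmt.Printf("wrote %d bytes\n", len(msg)) // stand-in for stream.Write
		return nil
	})
	for i := 0; i < 3; i++ {
		mq.AddMessage(make([]byte, 1024))
	}
	mq.Shutdown()
}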
this might be caused by oversize messages; we use a 1MB limit pretty much everywhere in libp2p, so this might explain the issue.
Judging by where the first sign of an error appears (Graphsync ReceiveError: stream reset), I don't think this is related to oversized protocol messages. We are initializing a gogo length-delimited protobuf reader with a 4mb buffer on the receiver side. But this first fails on the receiver side with a "stream reset", not a "short buffer" error, which is what we'd expect to see if the root cause were an oversized message.
If the first error we observe is a "stream reset", it's probably a yamux stream RST sent by the other end. I'm digging into the places where we explicitly reset the stream on the sender side, to try to get to the bottom of this.
@danny-wu I'd like to take a look at the logs on the dealbot side (sender). Mind posting your miner ID, or slacking it to me via DM on the Filecoin slack?
@hannahhoward I think I may have found the culprit here. I can't claim I understand the nitty-gritty details of go-graphsync, so pardon me if I get something wrong here.
It looks like chunking is happening on a higher level, at the "response builder" level.
If the sum of the block size of the previous response and the incoming one is greater than 512kb, we start another response builder, instead of appending the response.
At one point, we call sendResponseMessages, which ends up looping over all response builders and calling SendResponse on each.
This goes into the PeerMessageManager, which calls AddResponses for each individual response.
Here is where the issue is: each call to AddResponse ends up coalescing the response into a single serializable response, which probably blows up the protobuf message over 4mb.
When the dealbot tries to write the message on the wire, it succeeds. But as soon as @danny-wu's miner receives the length-prefix of the protobuf message, it notices it's over 4mb (the configured limit for the gogo protobuf reader), and it resets the stream.
I think this situation is likely to happen when congestion control has kicked in on the receiver side (@danny-wu's miner; have you been monitoring your bandwidth utilisation?). So a message is being actively written on the wire, but a bunch of queries are coming in and queuing up more data, all of which eventually ends up in one big fat protobuf which fails to write.
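To make that hypothesis concrete, here is a hedged sketch of how coalescing without a size check can outgrow the receiver's limit while the writer is stalled. The types and numbers are illustrative only, not go-graphsync's actual code:

package main

import "fmt"

const receiverLimit = 4 << 20 // the 4 MiB reader limit discussed above

// pendingMessage coalesces queued responses into a single outgoing message.
// While the writer is blocked by congestion, every new response is appended
// to the same pending message, so it keeps growing.
type pendingMessage struct {
	blocks [][]byte
	size   int
}

func (p *pendingMessage) add(block []byte) {
	p.blocks = append(p.blocks, block)
	p.size += len(block)
}

func main() {
	var pending pendingMessage
	block := make([]byte, 1<<20) // 1 MiB blocks, as Filecoin chunking produces

	// Writer stalled: six responses coalesce before anything hits the wire.
	for i := 0; i < 6; i++ {
		pending.add(block)
	}

	if pending.size > receiverLimit {
		// The receiver would see the oversized length prefix and reset the
		// stream. A size check at this layer -- starting a fresh pending
		// message once a threshold is crossed -- avoids that.
		fmt.Printf("coalesced message is %d bytes, over the %d byte limit\n",
			pending.size, receiverLimit)
	}
}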
P.S.: There's quite a bit of indirection and abstraction in go-graphsync, which we may be able to simplify going forward; I found this codebase was very hard to follow!
@raulk Oy, you're right. What a fail. Will fix tomorrow.
@raulk follow-up: I'm not so sure.
If you look at the loop here, note the last 4 lines:
// wait for message to be processed
select {
case <-done:
case <-prs.ctx.Done():
}
I now remember writing this part specifically to prevent the problem you're talking about. Done won't read until the actual message is sent on the wire.
Obviously, experimentally, something is wrong here, so maybe we need to investigate further.
And I agree, the message coalescing part of the message queue at this point really isn't doing much other than coalescing requests and probably could be simplified considerably.
@hannahhoward Right! There's something weird here, though. I think the assumption that this channel fires when the message is written on the wire is incorrect.
Take a look at these lines. We close the channel as soon as we pick the message to send, not when it is actually sent on the wire (that happens later).
The loop you point to will iterate before the message is written on the wire. I don't know if this is the problem or not, but it's worth looking deeper into.
Sorry, it's not so much that it's on the wire as that it's removed so nothing more can be added to it. See the immediately preceding lines.
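To illustrate the timing being debated here, a minimal sketch with made-up names (not go-graphsync's API): the done channel closes when the message is taken off the queue -- at which point nothing more can be coalesced into it -- rather than when the write to the stream completes.

package main

import (
	"fmt"
	"time"
)

// queued pairs a message with a channel that is closed when the message is
// taken off the queue, not when the write finishes.
type queued struct {
	payload []byte
	done    chan struct{}
}

func sender(q chan *queued) {
	for m := range q {
		close(m.done)                     // dequeued: no more coalescing into it
		time.Sleep(50 * time.Millisecond) // simulate the actual stream write
		fmt.Printf("wrote %d bytes\n", len(m.payload))
	}
}

func main() {
	q := make(chan *queued)
	go sender(q)

	m := &queued{payload: make([]byte, 1<<20), done: make(chan struct{})}
	q <- m

	<-m.done // unblocks as soon as the message is picked up, before the write finishes
	fmt.Println("waiter released; write may still be in flight")
	time.Sleep(100 * time.Millisecond)
}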
Hi Raul, my ID is t014365
I have 5 machines. One runs the lotus node and the lotus miner; the rest are workers. These are connected over a gigabit TL-SG108 switch, which is then plugged into my router.
During mining, sometimes large (~500GB) PC1 files get transferred from worker to worker. Thus, the switch is maxed out, and I can imagine congestion control kicks in.
Anecdotally, recent stream resets have seemed to coincide with when a transfer is happening, and network I/O is being maxed out.
I also wonder if ping/latency is a component here. Anecdotally, I've noticed reports of this from Oceania and China more, and we tend to have relatively high latencies to where dealbot is hosted.
Closing as out of date; I believe this has been addressed by several improvements made since it was first filed.
Issue
I am encountering stuck Filecoin deal transfers, as well as I/O timeouts from dealbot. Upon further investigation, it looks like GraphSync's attemptSendAndRecovery resends have malformed lengths and continuously fail.
See below for the miner log. Note the Graphsync ReceiveError: stream reset error followed by exactly 20 Graphsync ReceiveError: short buffer errors.
Given that sendMessage will issue a reset and then attempt to retry 10 times (and 20 is a multiple of 10), it looks like a condition has caused incorrect buffer lengths to be repeatedly sent to the receiver for both messages. The retries all fail due to this error, and the connection is terminated.
Please note the timestamps -- the errors span 1 minute and 40 seconds.
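For readers unfamiliar with the code, the retry behaviour described above amounts to roughly the following sketch. It is based only on this description; openStream, the stream interface, and maxRetries are placeholders rather than go-graphsync's actual API:

package main

import (
	"errors"
	"fmt"
)

const maxRetries = 10 // roughly 10 retries per message, per the description above

// stream is a stand-in for a libp2p stream: it can be written to or reset.
type stream interface {
	Write([]byte) error
	Reset() error
}

// sendWithRecovery retries a failed send on a fresh stream, resetting the
// broken one first. If the message itself is malformed or oversized, every
// retry fails the same way, matching the repeated errors in the log.
func sendWithRecovery(openStream func() (stream, error), msg []byte) error {
	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		s, err := openStream()
		if err != nil {
			lastErr = err
			continue
		}
		if err := s.Write(msg); err != nil {
			s.Reset() // abandon the broken stream and retry on a new one
			lastErr = err
			continue
		}
		return nil
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxRetries, lastErr)
}

func main() {
	failing := func() (stream, error) { return brokenStream{}, nil }
	if err := sendWithRecovery(failing, make([]byte, 5<<20)); err != nil {
		fmt.Println(err)
	}
}

// brokenStream simulates a peer that rejects every write.
type brokenStream struct{}

func (brokenStream) Write([]byte) error { return errors.New("short buffer") }
func (brokenStream) Reset() error       { return nil }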
Ruling out my internet connection
I have set up a remote connectivity tester, which opens a socket to my miner's port every minute. Logs are available here: http://139.180.162.246/readlog.php
Given that I am able to connect every minute and the short buffer errors span more than 1 minute, I think I can rule out my internet connection being down or messing up for the entire period of 1 min 40 seconds.
Furthermore, while anecdotal, I have no issues transferring terabyte files to the public internet over SCP, over days, without any interruptions, broken pipes, or resets.
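For reference, the connectivity tester amounts to roughly the following sketch (the address and port are placeholders, not my real ones):

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Placeholder target: substitute the miner's public IP and libp2p port.
	const target = "203.0.113.10:24001"

	// Dial the miner's port once a minute and log whether the TCP handshake
	// succeeded, as a coarse check that the path to the miner is up.
	for {
		conn, err := net.DialTimeout("tcp", target, 10*time.Second)
		if err != nil {
			fmt.Printf("%s DOWN: %v\n", time.Now().Format(time.RFC3339), err)
		} else {
			fmt.Printf("%s UP\n", time.Now().Format(time.RFC3339))
			conn.Close()
		}
		time.Sleep(time.Minute)
	}
}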
My continuous SSH connections from my laptop to the miner machine have not been broken in the more than 72 hours since I opened them.