libp2p / go-libp2p-pubsub

The PubSub implementation for go-libp2p
https://github.com/libp2p/specs/tree/master/pubsub

When publishing a lot of messages I get "dropping message to peer" log info #152

Open iulianpascalau opened 5 years ago

iulianpascalau commented 5 years ago

Hello

I am trying to send around 50000 - 100000 messages from peer 1 to peer 2, each around 300 bytes, as fast as libp2p's stack can handle (no time.Sleep between sends). Sometimes I get:

INFO     pubsub: dropping message to peer <peer.ID 16*LBEBuX>: queue full gossipsub.go:348

and sometimes I get:

pubsub: Can't deliver message to subscription for topic tx; subscriber too slow pubsub.go:493

Since there is no feedback on whether a message has been sent, my publish loop cannot tell how long to wait before the next publish. Is there a recommended practice for this?

Thanks

vyzo commented 5 years ago

It would be quite unwieldy to provide feedback on this from the library. When is the message considered sent? When it has entered the main loop? When it has been sent to one peer? When it has been sent to all peers? And how do we deal with slow peers?

If you are looking to saturate the network, my advice would be to add an operation that yields to the scheduler between messages; a tiny delay (say 1us) could very well work.
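
A minimal sketch of the suggestion above, assuming the publish call can be wrapped in a plain function. `publishFunc` and `publishAll` are hypothetical names for illustration, not part of go-libp2p-pubsub; in a real program the function would wrap whatever publish call the application uses.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// publishFunc is a hypothetical stand-in for the application's publish call.
type publishFunc func(data []byte) error

// publishAll sends every message through publish and yields between sends.
// With pause == 0 it only yields to the scheduler; otherwise it sleeps for pause.
// It returns how many messages were handed to publish before any error.
func publishAll(publish publishFunc, msgs [][]byte, pause time.Duration) (int, error) {
	sent := 0
	for _, m := range msgs {
		if err := publish(m); err != nil {
			return sent, err
		}
		sent++
		if pause > 0 {
			time.Sleep(pause) // e.g. time.Microsecond, as suggested above
		} else {
			runtime.Gosched() // let other goroutines (such as queue consumers) run
		}
	}
	return sent, nil
}

func main() {
	msgs := make([][]byte, 1000)
	for i := range msgs {
		msgs[i] = make([]byte, 300)
	}
	total := 0
	publish := func(data []byte) error { total += len(data); return nil }
	sent, err := publishAll(publish, msgs, time.Microsecond)
	fmt.Println(sent, total, err)
}
```

Yielding only gives the consumers a chance to drain their queues; it does not guarantee delivery, so messages can still be dropped when a peer is slower than the publisher.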

iulianpascalau commented 5 years ago

I know it is awkward to have some sort of callback func for publishing. Right now, a message is considered sent when the message object has been fetched from the publish chan in the main loop (so another message can be inserted), but it might still be dropped at each connected peer's output chan. In this case a 1us delay should solve the problem, but it doesn't look too good to me. On my high-performance LAN 1us might be fine, but on a WAN it might prove insufficient, and a message might be lost forever, not because of a broken network connection but because I accidentally called the publish func too often.
Anyway, thanks for the feedback. I will try to figure out a solution in the protocol that uses the libp2p stack to resend messages if the node "thinks" they are lost.

iulianpascalau commented 4 years ago

We found a workaround for this problem: instead of sending a large number of small messages, we created a sort of accumulator that gathers around 256k of data and sends it as one large message. It proved to be a very good solution that works well between multiple nodes across the globe. The time.Sleep value between publishing messages is, in our case, 10 microseconds.
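
The accumulator is not shown in the thread, so the following is only a sketch of one way it could look. The `batcher` type, the uvarint length prefix used to separate messages inside a batch, and the `split` helper are all assumptions made for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// batcher gathers small messages and hands them to flush as one large payload
// once the buffer reaches limit bytes. Hypothetical type, not a libp2p API.
type batcher struct {
	limit int
	buf   []byte
	flush func(batch []byte) error
}

// Add appends msg with a uvarint length prefix so the receiver can split the batch.
func (b *batcher) Add(msg []byte) error {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(msg)))
	b.buf = append(b.buf, hdr[:n]...)
	b.buf = append(b.buf, msg...)
	if len(b.buf) >= b.limit {
		return b.Flush()
	}
	return nil
}

// Flush sends whatever has been gathered so far, if anything.
func (b *batcher) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	batch := b.buf
	b.buf = nil // start a fresh buffer so the flushed batch is not overwritten
	return b.flush(batch)
}

// split undoes the framing applied by Add on the receiving side.
func split(batch []byte) ([][]byte, error) {
	var out [][]byte
	for len(batch) > 0 {
		n, w := binary.Uvarint(batch)
		if w <= 0 || n > uint64(len(batch)-w) {
			return nil, errors.New("malformed batch")
		}
		end := w + int(n)
		out = append(out, batch[w:end])
		batch = batch[end:]
	}
	return out, nil
}

func main() {
	batches := 0
	b := &batcher{limit: 256 * 1024, flush: func(p []byte) error { batches++; return nil }}
	for i := 0; i < 2000; i++ {
		if err := b.Add(make([]byte, 300)); err != nil {
			fmt.Println("add:", err)
			return
		}
	}
	if err := b.Flush(); err != nil { // send the final partial batch
		fmt.Println("flush:", err)
		return
	}
	fmt.Println("batches published:", batches)
}
```

In a real program `flush` would wrap the publish call, and a timer would typically call `Flush` as well so that a partially filled batch does not wait indefinitely.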

flcl42 commented 5 months ago

@vyzo I'm experiencing something similar: after some time the receiver gets only part of a pubsub message, and then the peer stops sending to the receiver at all, displaying queue full. Something like this happens:

peer receives varint(=300) || data(300 bytes)
...
peer receives varint(=300) || data(200 bytes only!)
sender (go peer) displays queue full
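
The `varint || data` layout described above can be sketched with the standard library alone. This is an illustrative reader and writer for uvarint-length-prefixed frames, not the pubsub wire code; `writeFrame`, `readFrame`, `encodeFrames`, `countFrames`, and the `maxFrame` limit are hypothetical names and values. It shows how a receiver can tell a clean end of stream from a frame that stopped partway through its payload.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// maxFrame is an assumed sanity limit so a bad length cannot force a huge allocation.
const maxFrame = 1 << 20

// writeFrame writes payload preceded by its length as an unsigned varint.
func writeFrame(w io.Writer, payload []byte) error {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(payload)))
	if _, err := w.Write(hdr[:n]); err != nil {
		return err
	}
	_, err := w.Write(payload)
	return err
}

// readFrame reads one frame. It returns io.EOF when the stream ends cleanly
// between frames and io.ErrUnexpectedEOF when it ends inside a frame.
func readFrame(r *bufio.Reader) ([]byte, error) {
	size, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	if size > maxFrame {
		return nil, errors.New("frame too large")
	}
	payload := make([]byte, size)
	if _, err := io.ReadFull(r, payload); err != nil {
		if err == io.EOF { // length was read but no payload bytes followed
			err = io.ErrUnexpectedEOF
		}
		return nil, err
	}
	return payload, nil
}

// encodeFrames concatenates the framed payloads into one byte stream.
func encodeFrames(payloads ...[]byte) []byte {
	var buf bytes.Buffer
	for _, p := range payloads {
		_ = writeFrame(&buf, p) // writes to a bytes.Buffer do not fail
	}
	return buf.Bytes()
}

// countFrames reports how many complete frames stream holds and whether it
// ended in the middle of a frame.
func countFrames(stream []byte) (complete int, truncated bool) {
	r := bufio.NewReader(bytes.NewReader(stream))
	for {
		_, err := readFrame(r)
		if err == io.EOF {
			return complete, false
		}
		if err != nil {
			return complete, errors.Is(err, io.ErrUnexpectedEOF)
		}
		complete++
	}
}

func main() {
	stream := encodeFrames(make([]byte, 300), make([]byte, 300))
	fmt.Println(countFrames(stream))                   // both frames intact
	fmt.Println(countFrames(stream[:len(stream)-100])) // second frame cut short
}
```

On a live connection the truncated case shows up as a read that blocks rather than an end of stream, which matches the stalled receiver described in the comment.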

vyzo commented 5 months ago

That sounds like backpressure from the network stack, is the receiver live?

flcl42 commented 5 months ago

> That sounds like backpressure from the network stack, is the receiver live?

Yes, the peers continue to exchange pings via yamux, for example. I've also found no closing signals like FIN/RST.

The outgoing (= mch) channel, which I guess is used to buffer messages, has len(0) cap(32) and then suddenly becomes len(32) cap(32). I'm not a Go expert, but it seems like something closes it and the network stream while a message is being sent.
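
For readers less familiar with Go: a buffered channel that reports len(32) cap(32) is full, not closed. Sending on a closed channel panics, so a queue sitting at capacity means the goroutine that should drain it has stopped receiving, for example because it is blocked writing to the network stream. The sketch below assumes the per-peer queue is filled with a non-blocking send, which is what the "queue full" log line suggests; `trySend` is a hypothetical helper, not the library's code.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports false when the queue is full.
func trySend(queue chan []byte, msg []byte) bool {
	select {
	case queue <- msg:
		return true
	default:
		return false // nobody is draining fast enough; the message is dropped
	}
}

func main() {
	queue := make(chan []byte, 32) // same capacity as observed in the comment above
	dropped := 0
	// No goroutine reads from queue, modelling a writer stuck on the network.
	for i := 0; i < 40; i++ {
		if !trySend(queue, make([]byte, 300)) {
			dropped++
		}
	}
	fmt.Println(len(queue), cap(queue), dropped)
}
```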

flcl42 commented 5 months ago

Btw, what I do: send a message every 300 ms. What I have found so far: the problem happens when the total size of the messages is near 256kB! So let me check the yamux implementation, because that number reminds me of the default yamux window size!

flcl42 commented 5 months ago

@vyzo solved on dotnet side, thanks for the fast response!

vyzo commented 5 months ago

Care to elaborate for future reference?

flcl42 commented 5 months ago

> Care to elaborate for future reference?

The .NET yamux implementation simply did not grow the window.
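
To illustrate why a window that never grows produces exactly these symptoms, here is a toy model of credit-based flow control. It is not yamux code: the `stream` type and `sendFrames` helper are hypothetical, the 302-byte frame size assumes a 300-byte payload plus a 2-byte varint and ignores other framing overhead, and a real implementation would block on an exhausted window instead of returning a short write.

```go
package main

import "fmt"

// stream models the sending side of one flow-controlled stream.
type stream struct {
	window int // bytes the receiver currently allows the sender to transmit
}

// write sends as much of p as the window permits and returns the byte count.
func (s *stream) write(p []byte) int {
	n := len(p)
	if n > s.window {
		n = s.window
	}
	s.window -= n
	return n
}

// windowUpdate models the credit a receiver grants after consuming data.
func (s *stream) windowUpdate(delta int) {
	s.window += delta
}

// sendFrames writes up to count frames of frameSize bytes. It returns the
// number of complete frames sent and the size of the partial frame written
// before the window ran out (0 if every frame went through).
func sendFrames(s *stream, frameSize, count int, receiverGrowsWindow bool) (complete, partial int) {
	frame := make([]byte, frameSize)
	for i := 0; i < count; i++ {
		n := s.write(frame)
		if n < frameSize {
			return complete, n // stalled partway through a frame
		}
		complete++
		if receiverGrowsWindow {
			s.windowUpdate(n)
		}
	}
	return complete, 0
}

func main() {
	const initialWindow = 256 * 1024 // 256 KiB, the figure mentioned in the thread

	broken := &stream{window: initialWindow}
	fmt.Println(sendFrames(broken, 302, 1000, false)) // receiver never grows the window

	healthy := &stream{window: initialWindow}
	fmt.Println(sendFrames(healthy, 302, 1000, true)) // receiver returns credit
}
```

Once the sender stalls partway through a frame, its outgoing queue stops draining, which is consistent with the "queue full" drops reported earlier in the thread.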

vyzo commented 5 months ago

ok, thanks.