When publishing a lot of messages I get "dropping message to peer" log info

libp2p / go-libp2p-pubsub

The PubSub implementation for go-libp2p

https://github.com/libp2p/specs/tree/master/pubsub


When publishing a lot of messages I get "dropping message to peer" log info #152

Open iulianpascalau opened 5 years ago

iulianpascalau commented 5 years ago

Hello

I am trying to send around 50000 - 100000 messages from peer 1 to peer 2, each around 300 bytes, as fast as libp2p's stack can handle (no time.Sleep between sends). Sometimes I get:

INFO     pubsub: dropping message to peer <peer.ID 16*LBEBuX>: queue full gossipsub.go:348

and sometimes I get:

pubsub: Can't deliver message to subscription for topic tx; subscriber too slow pubsub.go:493

Since my publish loop gets no feedback on whether a message has been sent (or not), I do not know how long I have to wait before the next publish. Is there a recommended practice for this?

Thanks

vyzo commented 5 years ago

It would be quite unwieldy to provide feedback on this from the library. When is the message considered sent? When it has entered the main loop? When it has been sent to one peer? When it has been sent to all peers? And how do we deal with slow peers?

If you are looking to saturate the network my advice would be to add an operation that yields to the scheduler between messages; a tiny delay (say 1us) could very well work.
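The pacing suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: `publishFunc` and `publishPaced` are hypothetical names, and `publishFunc` stands in for whatever publish call the application uses, so nothing here is the library's own API. A nil error from the publish call only means the message was accepted locally, not that any peer received it.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// publishFunc is a hypothetical stand-in for the application's publish call.
type publishFunc func(data []byte) error

// publishPaced publishes msgs in order and yields between sends.
// With pause > 0 it sleeps for that duration; otherwise it only
// yields to the Go scheduler. It returns how many messages were
// handed to publish before the first error, if any.
func publishPaced(publish publishFunc, msgs [][]byte, pause time.Duration) (int, error) {
	sent := 0
	for _, m := range msgs {
		if err := publish(m); err != nil {
			return sent, err
		}
		sent++
		if pause > 0 {
			time.Sleep(pause)
		} else {
			runtime.Gosched()
		}
	}
	return sent, nil
}

func main() {
	// 1000 messages of about 300 bytes each, as in the report above.
	msgs := make([][]byte, 1000)
	for i := range msgs {
		msgs[i] = make([]byte, 300)
	}

	accepted := 0
	fake := func(data []byte) error { // fake publisher used only for this demo
		accepted++
		return nil
	}

	sent, err := publishPaced(fake, msgs, time.Microsecond)
	fmt.Println("sent:", sent, "accepted:", accepted, "err:", err)
}
```

The pause is a tuning knob rather than a guarantee: a value that works on one network may still lead to drops on another.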

iulianpascalau commented 5 years ago

I know it is awkward to have some sort of callback func for publishing. Right now, a message is considered sent when the message object is fetched from the publish chan in the main loop (so another message can be inserted), but it may still be dropped at each connected peer's output chan. In this case a 1us delay should solve the problem, but it does not look too good to me. On my high-performance LAN 1us might be fine, but on a WAN it might prove insufficient, and a message might be lost forever, not because of a broken network connection but because I accidentally called the publish func too often.
Anyway, thanks for the feedback. I will try to figure out a solution in the protocol that uses the libp2p stack to resend messages if the node "thinks" they are lost.

iulianpascalau commented 4 years ago

We found a workaround for this problem: instead of sending a large number of small messages, we created a sort of accumulator that gathers around 256k of data and sends it as one large message. It proved to be a very good solution that works well between multiple nodes across the globe. The time.Sleep value between publishing messages is, in our case, 10 microseconds.
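For readers who want to try the same idea, here is a minimal sketch of such an accumulator. It is not the code we actually run: `batcher`, `newBatcher`, `splitBatch` and `publishFunc` are hypothetical names, and the 4-byte big-endian length prefix used to separate the small messages inside a batch is an assumption made for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// publishFunc is a hypothetical stand-in for the application's publish call.
type publishFunc func(data []byte) error

// batcher collects small messages into one buffer and publishes the
// buffer as a single large message once the size limit would be exceeded.
type batcher struct {
	publish publishFunc
	limit   int
	buf     []byte
}

func newBatcher(publish publishFunc, limit int) *batcher {
	return &batcher{publish: publish, limit: limit}
}

// Add appends msg with a 4-byte length prefix (framing assumed for this
// sketch). If msg does not fit, the pending batch is flushed first.
// A single message larger than the limit is still sent, alone.
func (b *batcher) Add(msg []byte) error {
	need := 4 + len(msg)
	if len(b.buf) > 0 && len(b.buf)+need > b.limit {
		if err := b.Flush(); err != nil {
			return err
		}
	}
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(msg)))
	b.buf = append(b.buf, hdr[:]...)
	b.buf = append(b.buf, msg...)
	return nil
}

// Flush publishes a copy of the pending batch and clears the buffer
// only if publish reports no error.
func (b *batcher) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	out := make([]byte, len(b.buf))
	copy(out, b.buf)
	if err := b.publish(out); err != nil {
		return err
	}
	b.buf = b.buf[:0]
	return nil
}

// splitBatch is the receiver side: it recovers the original messages.
func splitBatch(batch []byte) ([][]byte, error) {
	var msgs [][]byte
	for len(batch) > 0 {
		if len(batch) < 4 {
			return nil, errors.New("truncated length prefix")
		}
		n := int(binary.BigEndian.Uint32(batch[:4]))
		batch = batch[4:]
		if n > len(batch) {
			return nil, errors.New("truncated message")
		}
		msgs = append(msgs, batch[:n])
		batch = batch[n:]
	}
	return msgs, nil
}

func main() {
	var batches [][]byte
	fake := func(data []byte) error { // fake publisher used only for this demo
		batches = append(batches, data)
		return nil
	}

	b := newBatcher(fake, 256*1024)
	for i := 0; i < 1000; i++ {
		if err := b.Add(make([]byte, 300)); err != nil {
			fmt.Println("add failed:", err)
			return
		}
	}
	if err := b.Flush(); err != nil { // send whatever is still pending
		fmt.Println("flush failed:", err)
		return
	}
	fmt.Println("batches published:", len(batches))
}
```

A real sender would probably also flush on a timer so that a partly filled batch does not wait indefinitely, and the time.Sleep between publishes would then apply per batch rather than per small message.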