ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Pubsub subsystem stops working after some time #5835

Open rklaehn opened 5 years ago

rklaehn commented 5 years ago

Version information:

ipfs version --all
go-ipfs version: 0.4.18-
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1

Type:

bug

Description:

We have a number of nodes communicating via pubsub, running a mixture of 0.4.17 and 0.4.18. Sometimes a node goes into a state where pubsub stops working: on node A there is plenty of pubsub traffic on topic <topic>, while on node B, which is a peer of A, pubsub on the same topic is silent.

ipfs pubsub peers is empty and remains empty even when trying ipfs pubsub sub --discover <topic>. The only way out of this state is to restart the ipfs daemon.

rklaehn commented 5 years ago

Any recommendations on how to debug this? Add logging? Query the DHT directly? We can try this the next time we see it.

Stebalien commented 5 years ago

This looks like https://github.com/libp2p/go-libp2p-pubsub/issues/128.

Stebalien commented 5 years ago

Actually, does disconnecting and reconnecting not work?

Regardless, next time you see this, please take a goroutine dump:

curl 'http://127.0.0.1:5001/debug/pprof/goroutine?debug=2'

That way we can check for any obvious deadlocks.

rklaehn commented 5 years ago

We were finally able to get this info: goroutine.txt

IPFS version is 0.4.19, x86 32-bit on Android.

Stebalien commented 5 years ago

I don't see any obvious deadlocks. Can you run ipfs swarm peers --streams (and tell me which peer IDs you expect to participate in pubsub)?

autonome commented 4 years ago

@rklaehn are you still seeing this problem?

vans163 commented 2 years ago

I am seeing this on the latest RC of 0.12.

The problem is exactly as described. One node is listening on a channel, but the other node does not have it in its peer list for a long time (1-2 minutes); then the node appears in the peer list, gets sent a few messages, and drops from the peer list again.

It's so unreliable that it's not really usable.
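To capture when a peer appears in and drops from the peer list, rather than checking by hand, one option is to poll `ipfs pubsub peers <topic>` (one peer ID per line) and log the differences. A minimal Go sketch under that assumption; the 5-second interval and the `diffPeers`/`pubsubPeers` names are illustrative.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sort"
	"strings"
	"time"
)

// diffPeers returns peer IDs present in cur but not prev (joined) and
// present in prev but not cur (left), each sorted and de-duplicated.
func diffPeers(prev, cur []string) (joined, left []string) {
	inPrev := make(map[string]bool, len(prev))
	for _, p := range prev {
		inPrev[p] = true
	}
	inCur := make(map[string]bool, len(cur))
	for _, p := range cur {
		if !inCur[p] && !inPrev[p] {
			joined = append(joined, p)
		}
		inCur[p] = true
	}
	for p := range inPrev {
		if !inCur[p] {
			left = append(left, p)
		}
	}
	sort.Strings(joined)
	sort.Strings(left)
	return joined, left
}

// pubsubPeers runs `ipfs pubsub peers <topic>` and splits its output into IDs.
func pubsubPeers(topic string) ([]string, error) {
	out, err := exec.Command("ipfs", "pubsub", "peers", topic).Output()
	if err != nil {
		return nil, err
	}
	return strings.Fields(string(out)), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: go run . <topic>")
		return
	}
	topic := os.Args[1]
	var prev []string
	for {
		cur, err := pubsubPeers(topic)
		if err != nil {
			fmt.Fprintln(os.Stderr, "ipfs pubsub peers:", err)
		} else {
			joined, left := diffPeers(prev, cur)
			ts := time.Now().Format(time.RFC3339)
			for _, p := range joined {
				fmt.Println(ts, "joined", p)
			}
			for _, p := range left {
				fmt.Println(ts, "left", p)
			}
			prev = cur
		}
		time.Sleep(5 * time.Second)
	}
}
```

Timestamps from such a log can then be lined up with the daemon's own logs for the same interval.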

lidel commented 2 years ago

@vans163

  1. How are you subscribing to the topic? (ipfs on the command line, or via a JS-based client in a web browser?)
  2. How many topics do you use?

vans163 commented 2 years ago

  1. Tested using the command line.
  2. 1 topic.

# node1
ipfs pubsub sub foo1

# node2
ipfs pubsub peers foo1
ipfs pubsub pub foo1 hi
# worked
# wait a few minutes

ipfs pubsub peers foo1
# no peer in the list, so we manually connect to the other node
ipfs swarm connect /ipfs/12D3KooWPsiXD2DNPAePYHEDoABuoovDtnRCwW8Jfjk3eab496gi
# works now
ipfs pubsub peers foo1

It would receive messages for 1-2 minutes, then drop the peer from the peer list again. I am using --profile server, and both peers have public-facing IPs (no hole punching needed, no router, no firewall) in datacenters across the globe.

It's just randomly getting dropped.
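The manual workaround above (re-running `ipfs swarm connect` once the peer vanishes from `ipfs pubsub peers`) can be automated while the root cause is investigated. A minimal Go watchdog sketch, assuming the peer address ends in `/ipfs/<peerID>` or `/p2p/<peerID>` as in the command above; the 10-second interval and the helper names `hasPeer` and `peerIDFromAddr` are hypothetical. It only repeats what was done by hand and does not address why the peer is dropped.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// hasPeer reports whether `ipfs pubsub peers` output (one ID per line) lists want.
func hasPeer(peersOutput, want string) bool {
	for _, id := range strings.Fields(peersOutput) {
		if id == want {
			return true
		}
	}
	return false
}

// peerIDFromAddr returns the last path component, e.g. the ID in /ipfs/<peerID>.
func peerIDFromAddr(addr string) string {
	return addr[strings.LastIndex(addr, "/")+1:]
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: go run . <topic> /ipfs/<peerID>")
		return
	}
	topic, addr := os.Args[1], os.Args[2]
	want := peerIDFromAddr(addr)
	for {
		out, err := exec.Command("ipfs", "pubsub", "peers", topic).Output()
		switch {
		case err != nil:
			fmt.Fprintln(os.Stderr, "ipfs pubsub peers:", err)
		case !hasPeer(string(out), want):
			fmt.Println(time.Now().Format(time.RFC3339), want, "missing, reconnecting")
			if msg, err := exec.Command("ipfs", "swarm", "connect", addr).CombinedOutput(); err != nil {
				fmt.Fprintf(os.Stderr, "ipfs swarm connect: %v: %s\n", err, msg)
			}
		}
		time.Sleep(10 * time.Second)
	}
}
```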