ipfs / boxo

A set of reference libraries for building IPFS applications and implementations in Go.
https://github.com/ipfs/boxo#readme
Other
202 stars 87 forks source link

Bitswap peer connection race #432

Open rvagg opened 1 year ago

rvagg commented 1 year ago

Experienced via Lassie's CI suite which has suddenly become flaky with some new workloads. Using the testing/sws-logging branch here (started by @hannahhoward, continued by me) to diagnose with additional debug printing. It evidences in an inability to communicate with a peer that we know has the block, and bitswap has been told has the block.

See logs below, which come from here using 715365a23d36ccd0fe2c9cf95ec16afa0d50f39b on the testing/sws-logging branch. Here's what I believe happens:

I'm a bit fuzzy on what the logs show beyond here and why it doesn't manage to rectify the situation with retries; I guess because it had one chance to take the peer from FindProvidersAsync and ask it for the block, and it failed that one chance and there isn't really another.

The async disconnect in connectEventManager seems to be the main problem here, perhaps a call to Connected() shouldn't return until the state change has been fully propagated.

Relevant logs ``` 2023-08-15T09:52:43.126Z DEBUG bs:sess session/session.go:361 FindMorePeers {"session": 1, "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "pending": 1} 2023-08-15T09:52:43.126Z DEBUG bitswap providerquerymanager/providerquerymanager.go:332 New Provider Query on cid: bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq 2023-08-15T09:52:43.126Z DEBUG bitswap providerquerymanager/providerquerymanager.go:234 Beginning Find Provider Request for cid: bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq 2023-08-15T09:52:43.126Z DEBUG lassie/bitswap bitswaphelpers/indexerrouting.go:86 provider records requested from bitswap, sending back indexer results {"providerCount": 1} 2023-08-15T09:52:43.126Z DEBUG bitswap providerquerymanager/providerquerymanager.go:244 Connecting to provider 1WRsHVY1dhNKpD... 2023-08-15T09:52:43.126Z DEBUG basichost basic/basic_host.go:737 host QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 dialing 1WRsHVY1dhNKpD 2023-08-15T09:52:43.126Z DEBUG mocknet mock/mock_peernet.go:119 QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 (newly) dialing 1WRsHVY1dhNKpD 2023-08-15T09:52:43.126Z DEBUG mocknet mock/mock_peernet.go:132 QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 dialing 1WRsHVY1dhNKpD openingConn 2023-08-15T09:52:43.126Z DEBUG mocknet mock/mock_peernet.go:140 QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 opening connection to 1WRsHVY1dhNKpD 2023-08-15T09:52:43.126Z DEBUG mocknet mock/mock_peernet.go:183 1WRsHVY1dhNKpD accepting connection from QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 2023-08-15T09:52:43.126Z DEBUG bitswap_network network/connecteventmanager.go:159 connectEventManager connected to peer, setting responsive {"peer": "QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1"} 2023-08-15T09:52:43.126Z DEBUG bitswap_network network/connecteventmanager.go:140 connectEventManager detected peer connectivity {"peer": "QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1"} 2023-08-15T09:52:43.126Z DEBUG mocknet mock/mock_conn.go:148 Conn.NewStreamWithProtocol: QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 --> 1WRsHVY1dhNKpD 2023-08-15T09:52:43.126Z DEBUG bitswap_network network/connecteventmanager.go:159 connectEventManager connected to peer, setting responsive {"peer": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.126Z DEBUG mocknet mock/mock_conn.go:148 Conn.NewStreamWithProtocol: 1WRsHVY1dhNKpD --> QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 2023-08-15T09:52:43.127Z DEBUG basichost basic/basic_host.go:433 negotiated: /ipfs/id/1.0.0 (took 25.701µs) 2023-08-15T09:52:43.127Z DEBUG net/identify identify/id.go:463 sending snapshot {"seq": 2, "protocols": ["/fil/datatransfer/1.2.0","/ipfs/bitswap","/ipfs/bitswap/1.0.0","/ipfs/bitswap/1.1.0","/ipfs/bitswap/1.2.0","/ipfs/graphsync/2.0.0","/ipfs/id/1.0.0","/ipfs/id/push/1.0.0"], "addrs": ["/ip6/100::2438:fce9:9d50:c714/tcp/4242"]} 2023-08-15T09:52:43.127Z DEBUG net/identify identify/id.go:468 /ipfs/id/1.0.0 sending message to 1WRsHVY1dhNKpD /ip4/127.0.0.1/tcp/15381 2023-08-15T09:52:43.127Z DEBUG net/identify identify/id.go:519 /ipfs/id/1.0.0 received message from QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 /ip6/100::2438:fce9:9d50:c714/tcp/4242 2023-08-15T09:52:43.127Z DEBUG net/identify identify/id.go:787 1WRsHVY1dhNKpD received listen addrs for QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1: [/ip6/100::2438:fce9:9d50:c714/tcp/4242] 2023-08-15T09:52:43.127Z DEBUG basichost basic/basic_host.go:433 negotiated: /ipfs/id/1.0.0 (took 398.503µs) 2023-08-15T09:52:43.127Z DEBUG net/identify identify/id.go:463 sending snapshot {"seq": 2, "protocols": ["/ipfs/bitswap","/ipfs/bitswap/1.0.0","/ipfs/bitswap/1.1.0","/ipfs/bitswap/1.2.0","/ipfs/id/1.0.0","/ipfs/id/push/1.0.0"], "addrs": ["/ip4/127.0.0.1/tcp/15381"]} 2023-08-15T09:52:43.127Z DEBUG net/identify identify/id.go:468 /ipfs/id/1.0.0 sending message to QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 /ip6/100::2438:fce9:9d50:c714/tcp/4242 2023-08-15T09:52:43.127Z DEBUG net/identify identify/id.go:519 /ipfs/id/1.0.0 received message from 1WRsHVY1dhNKpD /ip4/127.0.0.1/tcp/15381 2023-08-15T09:52:43.127Z DEBUG net/identify identify/id.go:787 QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 received listen addrs for 1WRsHVY1dhNKpD: [/ip4/127.0.0.1/tcp/15381] 2023-08-15T09:52:43.127Z WARN net/identify identify/id.go:811 QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 cannot unmarshal key from remote peer: 1WRsHVY1dhNKpD, asn1: structure error: tags don't match (16 vs {class:3 tag:27 length:52 isCompound:false}) {optional:false explicit:false application:false private:false defaultValue: tag: stringType:0 timeType:0 set:false omitEmpty:false} publicKeyInfo @2 2023-08-15T09:52:43.127Z DEBUG basichost basic/basic_host.go:754 host QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 finished dialing 1WRsHVY1dhNKpD 2023-08-15T09:52:43.127Z DEBUG bitswap providerquerymanager/providerquerymanager.go:250 Connected to provider 1WRsHVY1dhNKpD 2023-08-15T09:52:43.127Z DEBUG bitswap providerquerymanager/providerquerymanager.go:264 Found 0 providers for cid: bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq 2023-08-15T09:52:43.127Z DEBUG net/identify identify/obsaddr.go:442 added own observed listen addr {"observed": "/ip6/100::2438:fce9:9d50:c714/tcp/4242"} 2023-08-15T09:52:43.127Z DEBUG bitswap providerquerymanager/providerquerymanager.go:332 Received provider (1WRsHVY1dhNKpD) for cid (bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq) 2023-08-15T09:52:43.127Z DEBUG bitswap providerquerymanager/providerquerymanager.go:332 Finished Provider Query on cid: bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq 2023-08-15T09:52:43.127Z DEBUG bs:peermgr peermanager/peermanager.go:214 pm RegisterSession {"peer": "1WRsHVY1dhNKpD", "session": 1} 2023-08-15T09:52:43.127Z DEBUG bs:sprmgr sessionpeermanager/sessionpeermanager.go:68 Bitswap: Added peer to session {"session": 1, "peer": "1WRsHVY1dhNKpD", "peerCount": 1} 2023-08-15T09:52:43.127Z DEBUG bs:sess session/sessionwantsender.go:[65](https://github.com/filecoin-project/lassie/actions/runs/5865827197/job/15903479085?pr=371#step:63:67)7 sws update want block presence {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "peerID": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.127Z DEBUG bs:sess session/sessionwantsender.go:764 sws new best peer {"peerID": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.127Z DEBUG bs:sess session/sessionwantsender.go:657 sws update want block presence {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "peerID": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.127Z DEBUG bs:sess session/sessionwantsender.go:764 sws new best peer {"peerID": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.127Z DEBUG bs:sess session/sessionwantsender.go:535 sws send next wants {"newly available": ["1WRsHVY1dhNKpD"]} 2023-08-15T09:52:43.127Z DEBUG bs:sess session/sessionwantsender.go:558 sws send new want block {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "peerID": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.127Z DEBUG bs:sess session/sessionwantsender.go:580 sws sending wants {"sends": 1} 2023-08-15T09:52:43.127Z DEBUG bs:peermgr peermanager/peermanager.go:155 pm SendWants {"peer": "1WRsHVY1dhNKpD", "wantBlocks": 1, "wantHaves": 1} 2023-08-15T09:52:43.127Z DEBUG bs:peermgr peermanager/peermanager.go:159 pm SendWants no peerQueue for {"peer": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.127Z DEBUG bitswap_network network/connecteventmanager.go:140 connectEventManager detected peer connectivity {"peer": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.127Z DEBUG bs:peermgr peermanager/peermanager.go:91 pm Connected {"peer": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.127Z DEBUG bs:peermgr peermanager/peermanager.go:203 pm getOrCreate new peer {"peer": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.127Z DEBUG bitswap messagequeue/messagequeue.go:280 mq: add want have from broadcast {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "peerID": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.127Z DEBUG bs:sess session/sessionwantsender.go:535 sws send next wants {"newly available": null} 2023-08-15T09:52:43.127Z DEBUG bs:sess session/sessionwantsender.go:546 sws want block in process, no new want block sent {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"} 2023-08-15T09:52:43.127Z DEBUG bs:sess session/sessionwantsender.go:580 sws sending wants {"sends": 0} 2023-08-15T09:52:43.129Z DEBUG mocknet mock/mock_conn.go:148 Conn.NewStreamWithProtocol: QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 --> 1WRsHVY1dhNKpD 2023-08-15T09:52:43.129Z DEBUG bitswap messagequeue/messagequeue.go:[66](https://github.com/filecoin-project/lassie/actions/runs/5865827197/job/15903479085?pr=371#step:63:68)6 sent message {"type": "WANT_HAVE", "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "local": "QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1", "to": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.129Z WARN bitswap_network network/ipfs_impl.go:245 error setting deadline: set pipe: deadline not supported 2023-08-15T09:52:43.130Z DEBUG mocknet mock/mock_conn.go:148 Conn.NewStreamWithProtocol: QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 --> 1WRsHVY1dhNKpD 2023-08-15T09:52:43.130Z WARN bitswap_network network/ipfs_impl.go:269 error resetting deadline: set pipe: deadline not supported 2023-08-15T09:52:43.130Z DEBUG bs:sess session/session.go:4[67](https://github.com/filecoin-project/lassie/actions/runs/5865827197/job/15903479085?pr=371#step:63:69) broadcastWantHaves {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]} 2023-08-15T09:52:43.130Z DEBUG basichost basic/basic_host.go:433 negotiated: /ipfs/bitswap/1.2.0 (took 800.106µs) 2023-08-15T09:52:43.131Z DEBUG bitswap_network network/ipfs_impl.go:427 bitswap net handleNewStream from QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 2023-08-15T09:52:43.131Z DEBUG engine decision/engine.go:634 Bitswap engine <- msg {"local": "1WRsHVY1dhNKpD", "from": "QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1", "entryCount": 1} 2023-08-15T09:52:43.131Z DEBUG engine decision/engine.go:638 Bitswap engine <- want-have {"local": "1WRsHVY1dhNKpD", "from": "QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1", "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"} LinkSystemBlockstore.GetSize bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq LinkSystemBlockstore.GetSize= bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq 1332 2023-08-15T09:52:43.132Z DEBUG engine decision/engine.go:779 Bitswap engine: block found {"local": "1WRsHVY1dhNKpD", "from": "QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1", "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "isWantBlock": false} 2023-08-15T09:52:43.132Z DEBUG engine decision/engine.go:545 Bitswap process tasks {"local": "1WRsHVY1dhNKpD", "taskCount": 1} 2023-08-15T09:52:43.132Z DEBUG engine decision/engine.go:598 Bitswap engine -> msg {"local": "1WRsHVY1dhNKpD", "to": "QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1", "blockCount": 0, "presenceCount": 1, "size": 38} 2023-08-15T09:52:43.133Z DEBUG mocknet mock/mock_conn.go:148 Conn.NewStreamWithProtocol: 1WRsHVY1dhNKpD --> QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1 2023-08-15T09:52:43.133Z WARN bitswap_network network/ipfs_impl.go:245 error setting deadline: set pipe: deadline not supported 2023-08-15T09:52:43.133Z DEBUG basichost basic/basic_host.go:413 protocol mux failed: stream reset (took 2.485821ms) 2023-08-15T09:52:43.133Z WARN bitswap_network network/ipfs_impl.go:2[69](https://github.com/filecoin-project/lassie/actions/runs/5865827197/job/15903479085?pr=371#step:63:71) error resetting deadline: set pipe: deadline not supported 2023-08-15T09:52:43.133Z DEBUG basichost basic/basic_host.go:433 negotiated: /ipfs/bitswap/1.2.0 (took 321.[70](https://github.com/filecoin-project/lassie/actions/runs/5865827197/job/15903479085?pr=371#step:63:72)3µs) 2023-08-15T09:52:43.134Z DEBUG bitswap_network network/ipfs_impl.go:427 bitswap net handleNewStream from 1WRsHVY1dhNKpD 2023-08-15T09:52:43.134Z DEBUG bs:sess session/session.go:221 Bitswap <- HAVE {"local": "QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1", "from": "1WRsHVY1dhNKpD", "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "session": 1} 2023-08-15T09:52:43.134Z DEBUG bs:peermgr peermanager/peermanager.go:214 pm RegisterSession {"peer": "1WRsHVY1dhNKpD", "session": 1} 2023-08-15T09:52:43.134Z DEBUG bs:sess session/sessionwantsender.go:657 sws update want block presence {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "peerID": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.134Z DEBUG bs:sess session/sessionwantsender.go:[76](https://github.com/filecoin-project/lassie/actions/runs/5865827197/job/15903479085?pr=371#step:63:78)4 sws new best peer {"peerID": "1WRsHVY1dhNKpD"} 2023-08-15T09:52:43.134Z DEBUG bs:sess session/sessionwantsender.go:535 sws send next wants {"newly available": null} 2023-08-15T09:52:43.134Z DEBUG bs:sess session/sessionwantsender.go:546 sws want block in process, no new want block sent {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"} 2023-08-15T09:52:43.134Z DEBUG bs:sess session/sessionwantsender.go:580 sws sending wants {"sends": 0} 2023-08-15T09:52:43.134Z DEBUG bitswap-server server/server.go:325 sent message {"type": "HAVE", "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "local": "1WRsHVY1dhNKpD", "to": "QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1"} 2023-08-15T09:52:43.135Z DEBUG bitswap-server server/server.go:3[79](https://github.com/filecoin-project/lassie/actions/runs/5865827197/job/15903479085?pr=371#step:63:81) sent message {"peer": "QmRR8VvuQf4Fvr2ptGJj2qjJkmqREJXbF4jv8KkSRLsMj1"} 2023-08-15T09:52:43.135Z DEBUG bitswap-server server/server.go:2[82](https://github.com/filecoin-project/lassie/actions/runs/5865827197/job/15903479085?pr=371#step:63:84) Bitswap.TaskWorker.Loop {"ID": 0} 2023-08-15T09:52:43.[139](https://github.com/filecoin-project/lassie/actions/runs/5865827197/job/15903479085?pr=371#step:63:141)Z DEBUG bs:sess session/session.go:467 broadcastWantHaves {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]} 2023-08-15T09:52:43.151Z DEBUG bs:sess session/session.go:467 broadcastWantHaves {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]} ```
Jorropo commented 1 year ago

Thx a lot this is pretty clear, while working on #423 I was looking at changing this part of the code (backpressure free prefix-sum-like state versioning that would more aggressively use the Full flag on messages to resynchronize the server if the client's stream sender got behind). It sounds like the fix you propose would be pretty light, right ? If it is I think we should implement that so it could be landed faster.

hannahhoward commented 1 year ago

@Jorropo I would also suggest: PeerManager.SendWants silently fails if you don't have a peer connection - https://github.com/ipfs/boxo/blob/main/bitswap/client/internal/peermanager/peermanager.go#L150 -- as in it does nothing and lets those wants/blocks get lost, but doesn't tell the caller that happened.

This seems like not the best behavior -- it should probably return an error to the caller to give the caller the opportunity to retry.

Jorropo commented 1 year ago

This was reverted https://github.com/ipfs/boxo/pull/444

BigLep commented 1 year ago

Per FIL Slack (https://filecoinproject.slack.com/archives/C03FFEVK30F/p1692628181061129?thread_ts=1692232716.286059&cid=C03FFEVK30F ), Kubo/Boxo maintainers will discuss/estimate next steps during 2023-08-22 standup and respond here by EOD 2023-08-22 as we understand this is important to fix for getting range requests landed for Rhea in Lassie.

BigLep commented 1 year ago

2023-08-22 maintainer conversation:

General notes:

  1. We need to rethink some of the Bitswap session code (which sessions assumes findProvs is useful)
  2. Lassie is running into this bug because have disabled various aspects of the bitswap codebase (e.g., broadcasts).
  3. It's a performance bug for Kubo but a showstopper for Lassie.
  4. This is something we want to do, the only concern is how long it would take.
  5. If we do some of this rewrite now, will it be useful for further rewrites? (Is this throwaway work that only helps one team?)

Next steps for @Jorropo for moving this forward:

  1. Identify how should solve this
  2. Estimate how long this fix would take
  3. Identify how "throwaway" this work is (i.e., is it on the path of improvements to this code area we believe we should do anyways)
  4. Make a decision on if/when will complete this

Before getting to this though @Jorropo is tied up with:

  1. Go 1.20 update with @hacdias
  2. Getting Bitswap tests to reenabled and reliably passing: https://github.com/ipfs/boxo/issues/327 . We can't be doing refactor above with flakiness persisting. For example, it may have helped catch problems with https://github.com/ipfs/boxo/pull/435 before we merged it.

@Jorropo doesn't expect he will have https://github.com/ipfs/boxo/issues/327 completed until the end of the week, which means the estimate/decision/date won't come until week of 2023-08-28. Per slack message, Bedrock is unblocked for the short term.

BigLep commented 1 year ago

The previous maintainer conversation comment was updated to fill in more details and be clear that further work here won't get picked up until week of 2023-08-28.

hannahhoward commented 1 year ago

Lassie is running into this bug because have disabled various aspects of the bitswap codebase (e.g., broadcasts).

Want to note this is NOT a true statement. We use stock bitswap. We do not disable broadcasts. This has never been possible. The only replacement code we offer is to the FindProvidersAsync call, substituting IPNI results.

hannahhoward commented 1 year ago

In fact, to clarify our own logs show Bitswap continues to broadcast, meaning this error should effect ANY client. The problem is while it broadcasts want haves, it never switches to requesting a WANT-BLOCK.

Jorropo commented 1 year ago

@hannahhoward wouldn't at-worst after 30s the wantlist rebroadcast happens and unblock the session, or after 2 RTTs ? Because if the session never ever sends a WANT-BLOCK we could get a quick win where sessions would stop using FindProvidersAsync results at all and 100% rely ony rebroadcasting that means we could do:

This would be wasting a roundtrip, but AFAIT is a quick win.

hannahhoward commented 1 year ago

Listen, I was surprised as heck that rebroadcast doesn't do anything. But that's what the logs show on our side.

I believe the fundamental problem is that the PeerManager.SendWants fails silently when there is no peer queue. It should error and the calling code (SessionWantSender in this case) should figure out what to do with that.

You can see the chain of rebroadcasts at the end of the log for this test run.

2023-08-15T03:50:05.3658816Z 2023-08-15T03:49:24.539Z   DEBUG   blockservice    blockservice/blockservice.go:247    BlockService: Searching
2023-08-15T03:50:05.3659584Z 2023-08-15T03:49:24.539Z   DEBUG   bitswap getter/getter.go:86 Bitswap.GetBlockRequest.Start   {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}
2023-08-15T03:50:05.3660049Z 2023-08-15T03:49:24.540Z   INFO    bs:sess session/session.go:460  No peers - broadcasting {"session": 1, "want-count": 1}
2023-08-15T03:50:05.3660814Z 2023-08-15T03:49:24.540Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3661634Z 2023-08-15T03:49:24.544Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3662425Z 2023-08-15T03:49:24.544Z   DEBUG   bs:sess session/session.go:361  FindMorePeers   {"session": 1, "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "pending": 1}
2023-08-15T03:50:05.3663230Z 2023-08-15T03:49:24.544Z   DEBUG   bitswap providerquerymanager/providerquerymanager.go:329    New Provider Query on cid: bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq
2023-08-15T03:50:05.3664151Z 2023-08-15T03:49:24.544Z   DEBUG   bitswap providerquerymanager/providerquerymanager.go:234    Beginning Find Provider Request for cid: bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq
2023-08-15T03:50:05.3664931Z 2023-08-15T03:49:24.544Z   DEBUG   lassie/bitswap  bitswaphelpers/indexerrouting.go:86 provider records requested from bitswap, sending back indexer results   {"providerCount": 1}
2023-08-15T03:50:05.3665603Z 2023-08-15T03:49:24.544Z   DEBUG   basichost   basic/basic_host.go:737 host QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV dialing 1WRsHVXm2wNTMm
2023-08-15T03:50:05.3666112Z 2023-08-15T03:49:24.544Z   DEBUG   mocknet mock/mock_peernet.go:119    QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV (newly) dialing 1WRsHVXm2wNTMm
2023-08-15T03:50:05.3666798Z 2023-08-15T03:49:24.545Z   DEBUG   mocknet mock/mock_peernet.go:132    QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV dialing 1WRsHVXm2wNTMm openingConn
2023-08-15T03:50:05.3667474Z 2023-08-15T03:49:24.545Z   DEBUG   mocknet mock/mock_peernet.go:140    QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV opening connection to 1WRsHVXm2wNTMm
2023-08-15T03:50:05.3668222Z 2023-08-15T03:49:24.545Z   DEBUG   mocknet mock/mock_conn.go:148   Conn.NewStreamWithProtocol: QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV --> 1WRsHVXm2wNTMm
2023-08-15T03:50:05.3668786Z 2023-08-15T03:49:24.545Z   DEBUG   basichost   basic/basic_host.go:433 negotiated: /ipfs/id/1.0.0 (took 72.901µs)
2023-08-15T03:50:05.3669625Z 2023-08-15T03:49:24.545Z   DEBUG   net/identify    identify/id.go:463  sending snapshot    {"seq": 2, "protocols": ["/ipfs/bitswap","/ipfs/bitswap/1.0.0","/ipfs/bitswap/1.1.0","/ipfs/bitswap/1.2.0","/ipfs/id/1.0.0","/ipfs/id/push/1.0.0"], "addrs": ["/ip4/127.0.0.1/tcp/24734"]}
2023-08-15T03:50:05.3670491Z 2023-08-15T03:49:24.545Z   DEBUG   net/identify    identify/id.go:468  /ipfs/id/1.0.0 sending message to QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV /ip6/100::ee68:9659:e2d0:d7c6/tcp/4242
2023-08-15T03:50:05.3671228Z 2023-08-15T03:49:24.545Z   DEBUG   net/identify    identify/id.go:519  /ipfs/id/1.0.0 received message from 1WRsHVXm2wNTMm /ip4/127.0.0.1/tcp/24734
2023-08-15T03:50:05.3671982Z 2023-08-15T03:49:24.546Z   DEBUG   net/identify    identify/id.go:787  QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV received listen addrs for 1WRsHVXm2wNTMm: [/ip4/127.0.0.1/tcp/24734]
2023-08-15T03:50:05.3672998Z 2023-08-15T03:49:24.546Z   WARN    net/identify    identify/id.go:811  QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV cannot unmarshal key from remote peer: 1WRsHVXm2wNTMm, asn1: syntax error: truncated tag or length
2023-08-15T03:50:05.3673720Z 2023-08-15T03:49:24.546Z   DEBUG   net/identify    identify/obsaddr.go:442 added own observed listen addr  {"observed": "/ip6/100::ee68:9659:e2d0:d7c6/tcp/4242"}
2023-08-15T03:50:05.3674449Z 2023-08-15T03:49:24.546Z   DEBUG   basichost   basic/basic_host.go:754 host QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV finished dialing 1WRsHVXm2wNTMm
2023-08-15T03:50:05.3675311Z 2023-08-15T03:49:24.546Z   DEBUG   bitswap providerquerymanager/providerquerymanager.go:329    Received provider (1WRsHVXm2wNTMm) for cid (bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq)
2023-08-15T03:50:05.3676071Z 2023-08-15T03:49:24.546Z   DEBUG   bitswap providerquerymanager/providerquerymanager.go:329    Finished Provider Query on cid: bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq
2023-08-15T03:50:05.3676783Z 2023-08-15T03:49:24.546Z   DEBUG   bs:sprmgr   sessionpeermanager/sessionpeermanager.go:68 Bitswap: Added peer to session  {"session": 1, "peer": "1WRsHVXm2wNTMm", "peerCount": 1}
2023-08-15T03:50:05.3677631Z 2023-08-15T03:49:24.546Z   DEBUG   bs:sess session/sessionwantsender.go:655    sws update want block presence  {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "peerID": "1WRsHVXm2wNTMm"}
2023-08-15T03:50:05.3678193Z 2023-08-15T03:49:24.546Z   DEBUG   bs:sess session/sessionwantsender.go:762    sws new best peer   {"peerID": "1WRsHVXm2wNTMm"}
2023-08-15T03:50:05.3679030Z 2023-08-15T03:49:24.546Z   DEBUG   bs:sess session/sessionwantsender.go:655    sws update want block presence  {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "peerID": "1WRsHVXm2wNTMm"}
2023-08-15T03:50:05.3679595Z 2023-08-15T03:49:24.546Z   DEBUG   bs:sess session/sessionwantsender.go:762    sws new best peer   {"peerID": "1WRsHVXm2wNTMm"}
2023-08-15T03:50:05.3680183Z 2023-08-15T03:49:24.546Z   DEBUG   bs:sess session/sessionwantsender.go:534    sws send next wants {"newly available": ["1WRsHVXm2wNTMm"]}
2023-08-15T03:50:05.3681054Z 2023-08-15T03:49:24.546Z   DEBUG   bs:sess session/sessionwantsender.go:557    sws send new want block {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "peerID": "1WRsHVXm2wNTMm"}
2023-08-15T03:50:05.3681500Z 2023-08-15T03:49:24.547Z   DEBUG   bs:sess session/sessionwantsender.go:534    sws send next wants {"newly available": null}
2023-08-15T03:50:05.3682429Z 2023-08-15T03:49:24.547Z   DEBUG   bs:sess session/sessionwantsender.go:545    sws want block in process, no new want block sent   {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}
2023-08-15T03:50:05.3683016Z 2023-08-15T03:49:24.547Z   DEBUG   mocknet mock/mock_peernet.go:183    1WRsHVXm2wNTMm accepting connection from QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV
2023-08-15T03:50:05.3683738Z 2023-08-15T03:49:24.547Z   DEBUG   mocknet mock/mock_conn.go:148   Conn.NewStreamWithProtocol: 1WRsHVXm2wNTMm --> QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV
2023-08-15T03:50:05.3684223Z 2023-08-15T03:49:24.547Z   DEBUG   basichost   basic/basic_host.go:433 negotiated: /ipfs/id/1.0.0 (took 96.201µs)
2023-08-15T03:50:05.3685455Z 2023-08-15T03:49:24.547Z   DEBUG   net/identify    identify/id.go:463  sending snapshot    {"seq": 2, "protocols": ["/fil/datatransfer/1.2.0","/ipfs/bitswap","/ipfs/bitswap/1.0.0","/ipfs/bitswap/1.1.0","/ipfs/bitswap/1.2.0","/ipfs/graphsync/2.0.0","/ipfs/id/1.0.0","/ipfs/id/push/1.0.0"], "addrs": ["/ip6/100::ee68:9659:e2d0:d7c6/tcp/4242"]}
2023-08-15T03:50:05.3685945Z 2023-08-15T03:49:24.547Z   DEBUG   net/identify    identify/id.go:468  /ipfs/id/1.0.0 sending message to 1WRsHVXm2wNTMm /ip4/127.0.0.1/tcp/24734
2023-08-15T03:50:05.3686987Z 2023-08-15T03:49:24.548Z   DEBUG   net/identify    identify/id.go:519  /ipfs/id/1.0.0 received message from QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV /ip6/100::ee68:9659:e2d0:d7c6/tcp/4242
2023-08-15T03:50:05.3687828Z 2023-08-15T03:49:24.548Z   DEBUG   mocknet mock/mock_conn.go:148   Conn.NewStreamWithProtocol: QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV --> 1WRsHVXm2wNTMm
2023-08-15T03:50:05.3688986Z 2023-08-15T03:49:24.548Z   DEBUG   bitswap messagequeue/messagequeue.go:663    sent message    {"type": "WANT_HAVE", "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "local": "QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV", "to": "1WRsHVXm2wNTMm"}
2023-08-15T03:50:05.3689564Z 2023-08-15T03:49:24.548Z   WARN    bitswap_network network/ipfs_impl.go:245    error setting deadline: set pipe: deadline not supported
2023-08-15T03:50:05.3690278Z 2023-08-15T03:49:24.548Z   DEBUG   mocknet mock/mock_conn.go:148   Conn.NewStreamWithProtocol: QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV --> 1WRsHVXm2wNTMm
2023-08-15T03:50:05.3690853Z 2023-08-15T03:49:24.548Z   WARN    bitswap_network network/ipfs_impl.go:269    error resetting deadline: set pipe: deadline not supported
2023-08-15T03:50:05.3691308Z 2023-08-15T03:49:24.548Z   DEBUG   basichost   basic/basic_host.go:433 negotiated: /ipfs/bitswap/1.2.0 (took 91.601µs)
2023-08-15T03:50:05.3691981Z 2023-08-15T03:49:24.549Z   DEBUG   bitswap_network network/ipfs_impl.go:427    bitswap net handleNewStream from QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV
2023-08-15T03:50:05.3692765Z 2023-08-15T03:49:24.549Z   DEBUG   engine  decision/engine.go:634  Bitswap engine <- msg   {"local": "1WRsHVXm2wNTMm", "from": "QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV", "entryCount": 1}
2023-08-15T03:50:05.3693683Z 2023-08-15T03:49:24.549Z   DEBUG   engine  decision/engine.go:638  Bitswap engine <- want-have {"local": "1WRsHVXm2wNTMm", "from": "QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV", "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}
2023-08-15T03:50:05.3694182Z LinkSystemBlockstore.GetSize bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq
2023-08-15T03:50:05.3694525Z LinkSystemBlockstore.GetSize= bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq 1332
2023-08-15T03:50:05.3695427Z 2023-08-15T03:49:24.549Z   DEBUG   engine  decision/engine.go:779  Bitswap engine: block found {"local": "1WRsHVXm2wNTMm", "from": "QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV", "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "isWantBlock": false}
2023-08-15T03:50:05.3695865Z 2023-08-15T03:49:24.549Z   DEBUG   engine  decision/engine.go:545  Bitswap process tasks   {"local": "1WRsHVXm2wNTMm", "taskCount": 1}
2023-08-15T03:50:05.3696688Z 2023-08-15T03:49:24.549Z   DEBUG   engine  decision/engine.go:598  Bitswap engine -> msg   {"local": "1WRsHVXm2wNTMm", "to": "QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV", "blockCount": 0, "presenceCount": 1, "size": 38}
2023-08-15T03:50:05.3697263Z 2023-08-15T03:49:24.549Z   DEBUG   mocknet mock/mock_conn.go:148   Conn.NewStreamWithProtocol: 1WRsHVXm2wNTMm --> QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV
2023-08-15T03:50:05.3697668Z 2023-08-15T03:49:24.549Z   DEBUG   basichost   basic/basic_host.go:413 protocol mux failed: stream reset (took 1.057616ms)
2023-08-15T03:50:05.3698303Z 2023-08-15T03:49:24.550Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3699084Z 2023-08-15T03:49:24.550Z   DEBUG   net/identify    identify/id.go:787  1WRsHVXm2wNTMm received listen addrs for QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV: [/ip6/100::ee68:9659:e2d0:d7c6/tcp/4242]
2023-08-15T03:50:05.3699531Z 2023-08-15T03:49:24.550Z   WARN    bitswap_network network/ipfs_impl.go:245    error setting deadline: set pipe: deadline not supported
2023-08-15T03:50:05.3700098Z 2023-08-15T03:49:24.550Z   WARN    bitswap_network network/ipfs_impl.go:269    error resetting deadline: set pipe: deadline not supported
2023-08-15T03:50:05.3700623Z 2023-08-15T03:49:24.550Z   DEBUG   basichost   basic/basic_host.go:433 negotiated: /ipfs/bitswap/1.2.0 (took 816.112µs)
2023-08-15T03:50:05.3701048Z 2023-08-15T03:49:24.550Z   DEBUG   bitswap_network network/ipfs_impl.go:427    bitswap net handleNewStream from 1WRsHVXm2wNTMm
2023-08-15T03:50:05.3701893Z 2023-08-15T03:49:24.550Z   DEBUG   bs:sess session/session.go:221  Bitswap <- HAVE {"local": "QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV", "from": "1WRsHVXm2wNTMm", "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "session": 1}
2023-08-15T03:50:05.3702712Z 2023-08-15T03:49:24.551Z   DEBUG   bs:sess session/sessionwantsender.go:655    sws update want block presence  {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "peerID": "1WRsHVXm2wNTMm"}
2023-08-15T03:50:05.3703132Z 2023-08-15T03:49:24.551Z   DEBUG   bs:sess session/sessionwantsender.go:762    sws new best peer   {"peerID": "1WRsHVXm2wNTMm"}
2023-08-15T03:50:05.3703543Z 2023-08-15T03:49:24.551Z   DEBUG   bs:sess session/sessionwantsender.go:534    sws send next wants {"newly available": null}
2023-08-15T03:50:05.3704256Z 2023-08-15T03:49:24.551Z   DEBUG   bs:sess session/sessionwantsender.go:545    sws want block in process, no new want block sent   {"cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}
2023-08-15T03:50:05.3705186Z 2023-08-15T03:49:24.550Z   DEBUG   bitswap-server  server/server.go:325    sent message    {"type": "HAVE", "cid": "bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq", "local": "1WRsHVXm2wNTMm", "to": "QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV"}
2023-08-15T03:50:05.3705717Z 2023-08-15T03:49:24.551Z   DEBUG   bitswap-server  server/server.go:379    sent message    {"peer": "QmdjqZU7uhXBdAdas2bAq9NUcfopNtj9noWvkhzDoBnCQV"}
2023-08-15T03:50:05.3706110Z 2023-08-15T03:49:24.551Z   DEBUG   bitswap-server  server/server.go:282    Bitswap.TaskWorker.Loop {"ID": 6}
2023-08-15T03:50:05.3706871Z 2023-08-15T03:49:24.558Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3707527Z 2023-08-15T03:49:24.571Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3708152Z 2023-08-15T03:49:24.587Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3708787Z 2023-08-15T03:49:24.608Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3709446Z 2023-08-15T03:49:24.633Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3710015Z 2023-08-15T03:49:24.661Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3710641Z 2023-08-15T03:49:24.694Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3711261Z 2023-08-15T03:49:24.731Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3711893Z 2023-08-15T03:49:24.772Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3712501Z 2023-08-15T03:49:24.817Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3713113Z 2023-08-15T03:49:24.866Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3713818Z 2023-08-15T03:49:24.918Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3714499Z 2023-08-15T03:49:24.975Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3715257Z 2023-08-15T03:49:25.035Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3715893Z 2023-08-15T03:49:25.101Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3716502Z 2023-08-15T03:49:25.169Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3717115Z 2023-08-15T03:49:25.242Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3717725Z 2023-08-15T03:49:25.319Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3718356Z 2023-08-15T03:49:25.400Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3718916Z 2023-08-15T03:49:25.484Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3719534Z 2023-08-15T03:49:25.573Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3720284Z 2023-08-15T03:49:25.667Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3720991Z 2023-08-15T03:49:25.763Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3721613Z 2023-08-15T03:49:25.864Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3722224Z 2023-08-15T03:49:25.968Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3722830Z 2023-08-15T03:49:26.077Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3723440Z 2023-08-15T03:49:26.190Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3724049Z 2023-08-15T03:49:26.308Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3724668Z 2023-08-15T03:49:26.428Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3725276Z 2023-08-15T03:49:26.553Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3725931Z 2023-08-15T03:49:26.682Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3726621Z 2023-08-15T03:49:26.815Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3727230Z 2023-08-15T03:49:26.952Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3727945Z 2023-08-15T03:49:27.092Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3728563Z 2023-08-15T03:49:27.237Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3729176Z 2023-08-15T03:49:27.385Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3729804Z 2023-08-15T03:49:27.538Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3730411Z 2023-08-15T03:49:27.695Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3731027Z 2023-08-15T03:49:27.855Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3731667Z 2023-08-15T03:49:28.020Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3732279Z 2023-08-15T03:49:28.189Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3732889Z 2023-08-15T03:49:28.361Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3733497Z 2023-08-15T03:49:28.538Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3734112Z 2023-08-15T03:49:28.719Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3734733Z 2023-08-15T03:49:28.903Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3735347Z 2023-08-15T03:49:29.092Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3735997Z 2023-08-15T03:49:29.285Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
2023-08-15T03:50:05.3736548Z 2023-08-15T03:49:29.482Z   DEBUG   bs:sess session/session.go:467  broadcastWantHaves  {"session": 1, "cids": [{"/":"bafybeifrrglx2issn2had5rtstn3xltla6vxmpjfwfz7o3hapvkynh4zoq"}]}
Jorropo commented 1 year ago

Ok intresting, what I was trying to get at is that something is left in a bad invalid state.

hannahhoward commented 1 year ago

alternate fix proposed, basically trying to keep up from getting ot an incorrect state as way to fix this issue: https://github.com/ipfs/boxo/pull/452 -- @Jorropo I believe with this either the subsequent connect event getting processed or a rebroadcast will put us back into the correct state.

rvagg commented 11 months ago

What are the chances of this being looked at in the near future? We've had to pin Lassie to the commit I got into main but was subsequently reverted (https://github.com/filecoin-project/lassie/blob/9a94e077568971c9b941f00ddbff4a32cd82117b/go.mod#L13) and at some point that's going to become a compatibility or even security problem. So I need to figure out whether I need to be planning on maintaining a fork or a branch here to deal with it on my end.

BigLep commented 10 months ago

Hi @rvagg - we missed on answering your question sooner.

If I understand correctly, the initial proposed fix was merged (https://github.com/ipfs/boxo/pull/435 ) but then reverted (https://github.com/ipfs/boxo/pull/444 ).

This hasn't been a priority for Boxo maintainers to directly address, but we understand it puts Lassie in a tough spot. If there's a different approach that accounts for the feedback from the reverted attempt, I'll make sure maintainers give it review.

Is https://github.com/ipfs/boxo/pull/452 such an attempt? If so, please mark it out of draft when it's ready for review.

Thanks.

rvagg commented 10 months ago

I believe #452 is just a proposal for how this issue might be addressed, not a proper fix, it needs to be taken further.

Here's my understanding of the state of this issue:

  1. Lassie is hit acutely by this because we have a specific list of providers that we know very likely have the content we are after so we target those—not getting a response to the wants is a surprising outcome that should not occur since we have told bitswap who to send them to.
  2. This is very likely to have broad impact on bitswap users but is unlikely to be as visible as it is to Lassie due to the more detached nature of the providers (being at arms length via the discovery mechanisms instead of the indexer lookup Lassie does). It'll mean that a peer seeking blocks will figure out who likely has them, then attempt to ask for them, but not get any kind of response because of the race condition. It'll be more of a suboptimal state than a failure state since you can often go fishing for more peers. Perhaps though it accounts for a lot of the "I know it's on IPFS but Kubo occasionally fails to find it" problems.
  3. The objection to the original fix were due to the use of this code at scale. So the tension exists between the (more common) small-client cases and the (more critical) at-scale cases. The current priority is for the latter.
  4. Lassie is stuck on a commit of Boxo and at some point the monolith-dependency trap is going to force our hand in some way. My guess is that it'll probably be a fork of boxo/bitswap so we can update Boxo dependency while having an alternative version of Bitswap.

Not sure what else is to be done about this with the limited people-bandwidth we have and the current state of prioritisation across our orgs. 🤷

gammazero commented 9 months ago

Bitswap relies on "being connected" to mean that state has been fully propagated. In other words, this means that a call to Connected() shouldn't return until the state change has been fully propagated (as was mentioned earlier). Specifically, a connection is not fully established until PeerManager has a peerQueues entry, since bitswap relies on that as part of being connected. Seems that there needs to be a way to force/wait for a peerQueues entry to either complete or fail before bitswap considers the connection fully established. PR #452 is also good to not silently fail, but it seems like there is still more to do to complete connection establishment.

Dreamacro commented 9 months ago

I can also confirm that this bug exists and that it has some far-reaching implications. I'm currently on a private libp2p network with 600 peers, and after actual file synchronization speed tests, I've seen significant rate increases with #452.

In the course of testing, I discovered an interesting phenomenon:

  1. without #452 When I use merkledag.NewSession, the rate is very slow and likely to hang; When using only dag.Get, the rate is not fast enough but acceptable
  2. with #452 When I use merkledag.NewSession, the rate is not fast enough but does not hang; When using only dag.Get, the rate is fast enough, but some peer has the memory leak.

From a quick look at boxo's code, I realize that dag.Get always creates sessions, I don't know if this is the expected behavior but from the code elegance point NewSession is more clear. Does anyone understand this phenomenon?