ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Kubo doesn't connect to peers with requested CID, can't finish transfer #9649

Open Rashkae2 opened 1 year ago

Rashkae2 commented 1 year ago

Description

Assumptions (at my own risk): when a node has a large amount of data (over 30 GB), reproviding all CIDs to the DHT is not really feasible without using the Accelerated DHT. The Accelerated DHT works well, but brings its own issues that can make it unsuitable in some scenarios.

Alternatives to help keep data available are changing the Reprovider strategy to Roots (which is what I tested with), or using a script to publish your root CIDs periodically.

It is easy to create a situation where an IPFS client (trying to download) is unable to find the peer with the data, even though the requested CID is on the DHT. (To replicate, I start a transfer, then stop and restart both daemons so they are no longer peered.)

On the requesting node, run ipfs get x.

The node does not need to connect to the hosting node if it can find x somewhere else, either in its own cache or from another peer in the swarm. However, it is then unable to find a subsequent CID, and no data transfer happens.

Run ipfs bitswap stat, and you can see a different CID in the wantlist.

Sometimes the transfer will eventually start; very often it will not. (If the get request includes a path, CID/SomeDirectory, it will almost always time out.)

I can fix this myself, however: if I run ipfs dht findprovs x, it will list the peer IDs under which that CID is published on the DHT, including the node where I'm hosting the data. I can then run ipfs swarm connect /p2p/peerid and the data transfer will immediately start.
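The two-step workaround above (find providers for the CID, then connect to one of them) can be automated with a small wrapper. A minimal sketch in Go, shelling out to the same ipfs CLI commands; the helper names providerAddr and connectToProviders are hypothetical, and the parsing assumes ipfs dht findprovs prints one peer ID per line:

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// providerAddr turns a peer ID from `ipfs dht findprovs` output
// into the multiaddr form that `ipfs swarm connect` accepts.
func providerAddr(peerID string) string {
	return "/p2p/" + strings.TrimSpace(peerID)
}

// connectToProviders runs `ipfs dht findprovs <cid>` and then
// `ipfs swarm connect /p2p/<peer>` for each provider found.
func connectToProviders(cid string) error {
	out, err := exec.Command("ipfs", "dht", "findprovs", cid).Output()
	if err != nil {
		return err
	}
	sc := bufio.NewScanner(strings.NewReader(string(out)))
	for sc.Scan() {
		peer := strings.TrimSpace(sc.Text())
		if peer == "" {
			continue
		}
		// Ignore individual connect failures; any one success unblocks the transfer.
		if err := exec.Command("ipfs", "swarm", "connect", providerAddr(peer)).Run(); err != nil {
			fmt.Println("connect failed:", peer, err)
		}
	}
	return sc.Err()
}

func main() {
	fmt.Println(providerAddr("QmExamplePeerID"))
}
```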

My proposal is that IPFS should do this by itself. If the client is unable to find the wanted CID among its peers and has to query the DHT, it should also search for and connect to peers providing the original CID from the get command (as well as the CIDs in the wantlist).

Jorropo commented 1 year ago

@Rashkae2 that's an awesome analysis, and you are completely correct; I already knew about this bug (GitHub search sucks and I was not able to find the issue). Sadly, fixing this in bitswap is either a huge or a really ugly refactor, because bitswap doesn't think about the links between blocks. The simplest thing would be to add some "Root" field to session CIDs, which we would fall back to if the DHT search for the leaf CIDs yields no result. But that's ugly, because we are then tying sessions to some graph, instead of keeping them a loose "those blocks are related" metric.
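The fallback described here (retry the provider search with the session's root CID when the leaf search comes up empty) can be sketched independently of bitswap internals. Everything below is illustrative, not go-libipfs API: ProviderFinder stands in for a content-router lookup, and findProvidersWithRootFallback is a hypothetical name:

```go
package main

import (
	"errors"
	"fmt"
)

// ProviderFinder stands in for a content-router lookup
// (e.g. a DHT FindProviders call); hypothetical type.
type ProviderFinder func(cid string) []string

// findProvidersWithRootFallback searches for providers of a leaf
// block; if none are found and a session root is set, it retries
// with the root CID, since whoever provides the root very likely
// holds the leaves too.
func findProvidersWithRootFallback(find ProviderFinder, leaf, root string) ([]string, error) {
	if provs := find(leaf); len(provs) > 0 {
		return provs, nil
	}
	if root != "" && root != leaf { // root set: cid.Cid.Defined() in the real code
		if provs := find(root); len(provs) > 0 {
			return provs, nil
		}
	}
	return nil, errors.New("no providers found")
}

func main() {
	// Fake router: only the root CID is advertised on the DHT
	// (the Reprovider "roots" strategy from the issue).
	router := func(c string) []string {
		if c == "bafyRoot" {
			return []string{"12D3KooWPeerA"}
		}
		return nil
	}
	provs, err := findProvidersWithRootFallback(router, "bafyLeaf", "bafyRoot")
	fmt.Println(provs, err)
}
```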

My new RAPIDE client has an internal representation that knows about the DAG, so it will be trivial to implement backtracking of searches when I add support for routers (currently RAPIDE is in alpha and lacks many critical features, such as bitswap and DHT support).

Rashkae2 commented 1 year ago

Does it need to be fixed in bitswap, though? If that's too ugly, maybe it should just happen automatically as soon as a request comes in (i.e., a thread unrelated to bitswap automatically searches for and connects to peers who are advertising the CID). In theory, I could just write my own wrapper for the ipfs get operation... but my real goal is for this to work from the HTTP gateway.

Jorropo commented 1 year ago

@Rashkae2 duplicating the code that does content discovery is also bad.

Given that I'm working on a replacement for the go-bitswap client, I don't really want to spend more time on go-bitswap.

If you want to send a pull request I'll review it. Here is how I would go about it (if you find a better way, feel free to try):

// 1. https://pkg.go.dev/github.com/ipfs/go-libipfs/bitswap/client/internal/session#Session
// add a new private root cid.Cid field to the bitswap session

// 2. https://pkg.go.dev/github.com/ipfs/go-libipfs@v0.5.0/bitswap/client/internal/providerquerymanager
// find a way for the provider query manager to access the root object from the session; if the first search (the one for the leaves) times out and the root field is set (cid.Cid.Defined allows the check), restart a new search on the root

// 3. https://pkg.go.dev/github.com/ipfs/go-libipfs/bitswap/client#Client
// Add a new method that accepts a cid as a root (internally this returns the same Session object that NewSession returns, but with the private root field set)
func (*Client) NewSessionWithRoot(ctx context.Context, root cid.Cid) exchange.Fetcher

// 4. https://pkg.go.dev/github.com/ipfs/go-ipfs-exchange-interface#Interface
// Add the same NewSessionWithRoot to the SessionExchange interface
type SessionExchange interface {
    Interface
    NewSession(context.Context) Fetcher
    NewSessionWithRoot(ctx context.Context, root cid.Cid) Fetcher
}

// 5. https://pkg.go.dev/github.com/ipfs/go-blockservice#NewSession
// Implement NewSessionWithRoot on the blockservice (it doesn't need to retain the root, just forward it to the exchange, here bitswap).
func NewSessionWithRoot(ctx context.Context, bs BlockService, root cid.Cid) Fetcher

// 6. Edit the gateway code to use NewSessionWithRoot instead of NewSession where it makes sense.

Rashkae2 commented 1 year ago

I'm afraid you've gone above my pay grade. All I can say is that I've been periodically testing IPFS since go-ipfs 0.10, and I always end up at this same place before chucking the VMs. Drilling down to exactly where the process breaks down is the best I can offer.