libp2p / js-libp2p

The JavaScript Implementation of libp2p networking stack.
https://libp2p.io

Unable to run a DHT Provide Successfully #2517

Closed jtsmedley closed 1 month ago

jtsmedley commented 2 months ago

Severity: High

Description:

  1. Attempted to provide a file to the DHT using Helia and could not complete a successful provide to the network.
    • It always crashes with the code ERR_QUERY_ABORTED.
  2. Next I attempted to use the js-libp2p package directly and got the same error, again with no successful provides to the DHT.

Steps to reproduce the error: Here is an example repository that reproduces the ERR_QUERY_ABORTED error that I am seeing.
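The failing call is roughly the sketch below (node here stands for a libp2p node configured with kadDHT as in that repository, and the CID is just a placeholder):

import { CID } from 'multiformats'
import * as raw from 'multiformats/codecs/raw'
import { sha256 } from 'multiformats/hashes/sha2'

// placeholder CID derived from an empty block of bytes
const cid = CID.createV1(raw.code, await sha256.digest(new Uint8Array(32)))

try {
    // `node` is a libp2p node configured with kadDHT, as in the repository above
    await node.contentRouting.provide(cid)
    console.info('provide succeeded')
} catch (err) {
    // in my environment this always rejects with the code ERR_QUERY_ABORTED
    console.error('provide failed:', err.code ?? err.message)
}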

dhuseby commented 2 months ago

Related PRs:

achingbrain commented 1 month ago

I've updated the script a bit to print more stats and show the effect of a deeper routing table over time:

import { createLibp2p } from 'libp2p';
import { createFromJSON } from "@libp2p/peer-id-factory";
import { CID } from "multiformats";
import { LevelDatastore } from 'datastore-level';
import * as libp2pInfo from 'libp2p/version';
import * as fs from "node:fs";
import { toString as uint8ArrayToString } from 'uint8arrays/to-string'
import * as crypto from "node:crypto"
import * as raw from 'multiformats/codecs/raw'
import * as Digest from 'multiformats/hashes/digest'
import { sha256 } from 'multiformats/hashes/sha2'
// Transport
import { webRTC, webRTCDirect } from '@libp2p/webrtc';
import { webSockets } from "@libp2p/websockets";
import { tcp } from "@libp2p/tcp";
// Encryption
import { noise } from '@chainsafe/libp2p-noise';
import { yamux } from '@chainsafe/libp2p-yamux';
import { mplex } from '@libp2p/mplex';
// Peer Discovery
import { bootstrap } from '@libp2p/bootstrap';
// Services
import { identify } from '@libp2p/identify'
import { kadDHT, removePrivateAddressesMapper } from '@libp2p/kad-dht'
import delay from 'delay'

async function getNode (type) {
    const peerIdFile = `${type}.peer`
    const datastoreDir = `${type}.db`

    let peerId

    if (fs.existsSync(peerIdFile)) {
        peerId = await createFromJSON(JSON.parse(fs.readFileSync(peerIdFile)));
    }

    const datastore = new LevelDatastore(datastoreDir)
    await datastore.open()

    const node = await createLibp2p({
        peerId,
        // pass the datastore in so peer/routing data persists between runs
        datastore,
        addresses: {
            listen: [
                '/ip4/0.0.0.0/tcp/0'
            ],
            announce: [
                '/dns4/example.com/tcp/1234'
            ]
        },
        transports: [tcp(), webSockets(), webRTC(), webRTCDirect()],
        peerDiscovery: [
            bootstrap({
                list: [
                    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
                    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
                    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
                    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
                    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
                ],
            }),
        ],
        services: {
            dht: kadDHT({
                protocol: '/ipfs/kad/1.0.0',
                peerInfoMapper: removePrivateAddressesMapper
            }),
            identify: identify({
                agentVersion: `cre-24/1 ${libp2pInfo.name}/${libp2pInfo.version} UserAgent=${globalThis.process.version}`
            })
        },
        connectionEncryption: [noise()],
        streamMuxers: [yamux(), mplex()]
    });

    // write peer id out for reuse on next run
    fs.writeFileSync(peerIdFile, JSON.stringify({
        id: uint8ArrayToString(node.peerId.toBytes(), 'base58btc'),
        privKey: uint8ArrayToString(node.peerId.privateKey, 'base64pad'),
        pubKey: uint8ArrayToString(node.peerId.publicKey, 'base64pad'),
    }, null, 2))

    return node
}

const [
    publisherNode,
    resolverNode
] = await Promise.all([
    getNode('publisher'),
    getNode('resolver')
])

async function provide (cid) {
    const start = Date.now()
    const providers = []

    console.info('start provide of', cid.toString())
    printTableStats('publisher', publisherNode)

    try {
        await publisherNode.contentRouting.provide(cid, {
            onProgress: evt => {
                if (evt.detail.name === 'FINAL_PEER') {
                    console.info(`published provider record to ${evt.detail.peer.id} after ${Date.now() - start}ms`)

                    providers.push(evt.detail.peer.id.toString())
                }

                // uncomment to see all query steps
                //console.info('publish', evt.type, evt.detail)
            }
        })

        console.info(`stored provider records with ${providers.length} peers in ${Date.now() - start}ms`)
    } catch (err) {
        console.info(`provide failed after ${Date.now() - start}ms with message:`, err.message)
        throw err
    }

    printTableStats('publisher', publisherNode)
}

async function resolve (cid) {
    const start = Date.now()

    console.info('start resolve of', cid.toString())
    printTableStats('resolver', resolverNode)

    try {
        const providers = []
        let firstProvider

        for await (const provider of resolverNode.contentRouting.findProviders(cid, {
            signal: AbortSignal.timeout(120000)
        })) {
            console.info(`found provider ${provider.id} after ${Date.now() - start}ms`)
            providers.push(provider)

            if (firstProvider == null) {
                firstProvider = Date.now() - start
            }
        }

        console.info(`found ${providers.length} providers in ${Date.now() - start}ms, first provider found in ${firstProvider}ms`)
    } catch (err) {
        console.info(`finding providers failed after ${Date.now() - start}ms with message:`, err.message)
    }

    printTableStats('resolver', resolverNode)
}

while (true) {
    const cid = CID.createV1(raw.code, Digest.create(sha256.code, crypto.randomBytes(32)))
    await provide(cid)
    console.info('------')
    await resolve(cid)
    console.info('------')

    console.info('wait before starting provide')
    // wait 5s before providing again
    await delay(5000)
}

function printTableStats (type, node) {
    let size = 0
    let buckets = 0
    let maxDepth = 0

    function count (bucket, prefix = '') {
      prefix += bucket.prefix

      if (bucket.depth > maxDepth) {
        maxDepth = bucket.depth
      }

      if (bucket.peers != null) {
        buckets++
        size += bucket.peers.length
        return
      }

      count(bucket.left, prefix)
      count(bucket.right, prefix)
    }

    count(node.services.dht.routingTable.kb.root)

    console.info(type, 'routing table size', size, 'buckets', buckets, 'average occupancy', Math.round(size / buckets), 'max depth', maxDepth)
}

package.json:

{
  "name": "content-routing-example",
  "version": "1.0.0",
  "type": "module",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "dependencies": {
    "@chainsafe/libp2p-noise": "^15.0.0",
    "@chainsafe/libp2p-yamux": "^6.0.2",
    "@helia/delegated-routing-v1-http-api-client": "^3.0.1",
    "@libp2p/autonat": "next",
    "@libp2p/bootstrap": "next",
    "@libp2p/circuit-relay-v2": "next",
    "@libp2p/dcutr": "next",
    "@libp2p/identify": "next",
    "@libp2p/kad-dht": "next",
    "@libp2p/mplex": "next",
    "@libp2p/peer-id-factory": "next",
    "@libp2p/ping": "next",
    "@libp2p/tcp": "next",
    "@libp2p/upnp-nat": "next",
    "@libp2p/webrtc": "next",
    "@libp2p/websockets": "next",
    "datastore-level": "^10.1.8",
    "libp2p": "next",
    "multiformats": "^13.1.0"
  }
}

The changes I've made to the repro are:

  1. Start two nodes, a publisher and a resolver.
    • They are not connected, so they will work independently over the public network
  2. The publisher publishes provider records for a random CID
    • This means we will be requesting a different set of peers from the routing table each time
  3. The resolver then tries to resolve provider records for the CID
    • Again, because the CID is random, we will contact different peers each time
  4. Print routing table stats after each operation
  5. Goto 2

What I see is:

  1. The first provide starts with an empty routing table, so it can take a minute or so to complete
  2. Subsequent provides depend on the diversity of the routing table
    • E.g. if it's well populated with peers that are KAD-close to the random CID being published, the provide is fast; otherwise it takes longer (see the distance sketch below)
  3. Over time the provide times settle down as the routing table becomes more diverse (e.g. has a wider spread of KAD-ID values)

It starts with a provide that takes about a minute, then things speed up, though there's still the odd outlier with 30-40 second publish times. Once there are 10k+ or so peers in the routing table, publish times can be under a second but are mostly in the range of 3-10 seconds.
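For context, "KAD-close" above refers to XOR distance: as far as I know the DHT hashes keys and peer IDs into a 256-bit keyspace with SHA-256 and compares them by XOR. A rough, illustrative sketch (not part of the repro):

import { sha256 } from 'multiformats/hashes/sha2'

// Illustrative only: XOR distance between two keys (Uint8Arrays, e.g. a CID's
// multihash bytes and a peer ID's bytes) after hashing them into the keyspace.
// A result with more leading zero bits means the keys are "closer".
async function kadDistance (keyA, keyB) {
    const a = (await sha256.digest(keyA)).digest
    const b = (await sha256.digest(keyB)).digest

    return a.map((byte, i) => byte ^ b[i])
}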

@jtsmedley can you please try the above and report back if you see the same results?

jtsmedley commented 1 month ago
achingbrain commented 1 month ago

Support for WebSockets on the network is very sparse, so if that's the only transport you're using, it's likely the query isn't running to completion.

When you are seeing a 30s publish, what is the size of the routing table?

Typically you'll see the query time come down as the size of the table goes up because you have to make fewer network hops to find the closest peers to a given key.

Getting a decently diverse routing table can take 20-30 minutes, though if you're using a datastore that persists between restarts, most of the routing table should be restored after a node restart.
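As a rough sketch (not the full repro above), a config along these lines runs the DHT over TCP as well as WebSockets and passes the persistent datastore into the node, so the routing table can be rebuilt quickly after a restart. Peer discovery and announce addresses are left out for brevity:

import { createLibp2p } from 'libp2p'
import { tcp } from '@libp2p/tcp'
import { webSockets } from '@libp2p/websockets'
import { noise } from '@chainsafe/libp2p-noise'
import { yamux } from '@chainsafe/libp2p-yamux'
import { identify } from '@libp2p/identify'
import { kadDHT, removePrivateAddressesMapper } from '@libp2p/kad-dht'
import { LevelDatastore } from 'datastore-level'

const datastore = new LevelDatastore('node.db')
await datastore.open()

const node = await createLibp2p({
    // a persistent datastore lets peer and routing data survive restarts
    datastore,
    addresses: {
        listen: [
            '/ip4/0.0.0.0/tcp/0',
            '/ip4/0.0.0.0/tcp/0/ws'
        ]
    },
    // TCP reaches far more public DHT servers than WebSockets alone
    transports: [tcp(), webSockets()],
    connectionEncryption: [noise()],
    streamMuxers: [yamux()],
    services: {
        identify: identify(),
        dht: kadDHT({
            protocol: '/ipfs/kad/1.0.0',
            peerInfoMapper: removePrivateAddressesMapper
        })
    }
    // peerDiscovery (e.g. bootstrap()) omitted - reuse the list from the script above
})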

github-actions[bot] commented 1 month ago

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

github-actions[bot] commented 1 month ago

This issue was closed because it is missing author input.