ipfs / helia

An implementation of IPFS in JavaScript
https://helia.io

Pinning and Garbage Collection #28

Closed: achingbrain closed this issue 1 year ago

achingbrain commented 1 year ago

Context

Garbage collection in js-ipfs originally followed the go-ipfs model, whereby pins were stored in a large DAG that was traversed during garbage collection to work out which blocks could be deleted and which could not.

https://github.com/ipfs/js-ipfs/pull/2771 changed this to store pins in the datastore instead of in a DAG. That yielded a massive speed-up when adding new pins, but garbage collection remained slow because the algorithm still has to walk every pinned DAG to build up the set of blocks those DAGs contain.
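To make that cost concrete, here is a minimal mark-and-sweep sketch of the approach described above. It is not the js-ipfs code: it assumes a hypothetical in-memory blockstore where CIDs are plain strings, each block records only the CIDs it links to, and every pin is recursive. `markAndSweep` is an illustrative helper name.

```typescript
// Hypothetical in-memory model: CIDs are plain strings and each block
// stores only the CIDs of the blocks it links to.
type Blockstore = Map<string, string[]>

// Mark-and-sweep GC sketch; every pin is treated as recursive.
function markAndSweep (blocks: Blockstore, pinnedRoots: Iterable<string>): string[] {
  const live = new Set<string>()

  // Mark phase: walk every pinned DAG to collect all reachable blocks.
  // This walk is repeated on every GC run, which is what makes it slow.
  for (const root of pinnedRoots) {
    const stack = [root]

    while (stack.length > 0) {
      const cid = stack.pop() as string

      if (live.has(cid)) {
        continue // already reached via another pin or path
      }

      live.add(cid)

      for (const child of blocks.get(cid) ?? []) {
        stack.push(child)
      }
    }
  }

  // Sweep phase: delete every block the mark phase did not reach
  const deleted: string[] = []

  for (const cid of [...blocks.keys()]) {
    if (!live.has(cid)) {
      blocks.delete(cid)
      deleted.push(cid)
    }
  }

  return deleted
}
```

The mark phase visits every block of every pinned DAG, so its cost grows with the total size of the pinned data rather than with the amount of garbage, which is why large blockstores make this impractical.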

Helia gives us an amazing opportunity to solve this slow garbage collection problem. A solution would be incredibly valuable to pinning services, for example, which typically don't garbage collect anything because their blockstores are so large that running GC takes an impractically long time.

Gotchas

Interface

An interface to the pinning system might look like this (somewhat similar to js-ipfs):

```typescript
import type { CID } from 'multiformats/cid'
import type { AbortOptions } from '@libp2p/interfaces'

enum PinStatus {
  /**
   * All blocks in the pin have been stored in the blockstore
   */
  pinned = 'pinned',

  /**
   * The pin is being created, blocks in the DAG are still being fetched from
   * the network
   */
  pending = 'pending',

  /**
   * Not all blocks could be fetched from the network - this is usually because
   * the abort signal passed into the `pin.add` operation emitted its `abort` event
   */
  failed = 'failed'
}

/**
 * Pin types that can be listed (assumed to mirror the `type` option of js-ipfs's `pin.ls`)
 */
type PinType = 'recursive' | 'direct' | 'indirect' | 'all'

interface AddOptions extends AbortOptions {
  /**
   * When pinning a DAG, Helia will ensure that all blocks in the DAG are present in
   * the blockstore, which may involve network operations. By default Helia traverses
   * the entire DAG; pass a depth here to limit the traversal.
   */
  depth?: number

  /**
   * A user-chosen name for the pin
   */
  name?: string

  /**
   * User-specific metadata for the pin
   */
  metadata?: Record<string, string | number | boolean>

  /**
   * Receives progress events
   */
  progress?: (evt: Event) => void
}

interface RmOptions extends AbortOptions {
  /**
   * Receives progress events
   */
  progress?: (evt: Event) => void
}

interface LsOptions extends AbortOptions {
  type?: PinType
}

interface Pin {
  /**
   * The current status of the pin
   */
  status: PinStatus

  /**
   * The pinned CID
   */
  cid: CID

  /**
   * The pin name
   */
  name?: string

  /**
   * `Infinity` for a recursive pin, 1 for a direct pin, or an arbitrary number
   * for a pin limited to that depth
   */
  depth: number

  /**
   * User-specific metadata for the pin
   */
  metadata: Record<string, string | number | boolean>
}

interface Pinning {
  /**
   * Pin the block that corresponds to the passed CID. If the DAG in the pinned block
   * contains CIDs, the blocks corresponding to those CIDs will also be pinned. Pass
   * `{ depth: 1 }` to only pin the top level block.
   */
  add: (cid: CID, opts?: AddOptions) => Promise<void>

  /**
   * Unpin the block that corresponds to the passed CID. Any child blocks that were
   * pinned along with it (up to the depth it was pinned with) are also unpinned.
   */
  rm: (cid: CID, opts?: RmOptions) => Promise<void>

  /**
   * List all pins stored by this node
   */
  ls: (opts?: LsOptions) => AsyncGenerator<Pin>
}

interface GCOptions {
  /**
   * Receives progress events
   */
  progress?: (evt: Event) => void
}

interface Helia {
  // ...other methods here...

  /**
   * Run garbage collection on this node - any blocks that are not pinned will be deleted
   */
  gc: (opts?: GCOptions) => Promise<void>

  /**
   * The pinning API
   */
  pin: Pinning
}
```
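The `depth` option and the `pending`/`failed` statuses above can be illustrated with a minimal sketch of the traversal `pin.add` might perform. This is not Helia's implementation: it assumes an in-memory model where CIDs are plain strings and each block lists its children, and `addPin`, `PinRecord` and the `fetchBlock` callback (standing in for the network) are hypothetical names. Depth counts the root as level 1, matching the `Pin.depth` docs where 1 means a direct pin.

```typescript
// Hypothetical in-memory model: CID -> CIDs of the blocks it links to
type Links = Map<string, string[]>

type PinStatus = 'pinned' | 'pending' | 'failed'

interface PinRecord {
  cid: string
  depth: number
  status: PinStatus
}

// Sketch of the traversal behind `pin.add`. `fetchBlock` stands in for fetching
// a missing block from the network and returns that block's child CIDs.
function addPin (
  local: Links,
  fetchBlock: (cid: string) => string[],
  cid: string,
  depth = Infinity,
  signal?: AbortSignal
): PinRecord {
  const pin: PinRecord = { cid, depth, status: 'pending' }
  // Breadth-first walk, so each block is first reached at its shallowest level
  const queue: Array<[string, number]> = [[cid, 1]]
  const seen = new Set<string>([cid])

  while (queue.length > 0) {
    if (signal?.aborted === true) {
      // Some blocks in the requested part of the DAG are still missing locally
      pin.status = 'failed'
      return pin
    }

    const [current, level] = queue.shift() as [string, number]
    let children = local.get(current)

    if (children === undefined) {
      children = fetchBlock(current)
      local.set(current, children)
    }

    // Stop descending once the requested depth has been reached
    if (level >= depth) {
      continue
    }

    for (const child of children) {
      if (!seen.has(child)) {
        seen.add(child)
        queue.push([child, level + 1])
      }
    }
  }

  pin.status = 'pinned'
  return pin
}
```

In this model a `depth` of 1 fetches only the root block, and the default of `Infinity` fetches the whole DAG.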

Strategies

Benchmarking will be required to choose an appropriate pinning strategy. Benchmarks should store several hundred thousand pins of varying depths before running GC.
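A benchmark along these lines might first generate many synthetic pins of varying depths and then time a single GC run. The sketch below is only a starting point under stated assumptions: it uses a hypothetical in-memory blockstore (CIDs as strings, each block listing its children), hypothetical helpers `buildChain` and `benchmark`, and takes the GC strategy under test as a parameter. It measures wall-clock time with `performance.now()` and is no substitute for benchmarking against a real blockstore.

```typescript
// Hypothetical in-memory model: CID -> CIDs of the blocks it links to
type Blockstore = Map<string, string[]>
type GcStrategy = (blocks: Blockstore, pinnedRoots: string[]) => void

// Build a linear DAG of `length` blocks under a unique prefix and return its root CID
function buildChain (blocks: Blockstore, prefix: string, length: number): string {
  for (let i = 0; i < length; i++) {
    const children = i + 1 < length ? [`${prefix}-${i + 1}`] : []
    blocks.set(`${prefix}-${i}`, children)
  }

  return `${prefix}-0`
}

// Create `pinCount` DAGs with depths cycling from 1 to `maxDepth`, pin every
// other one so GC has garbage to delete, then return the GC time in milliseconds
function benchmark (gc: GcStrategy, pinCount: number, maxDepth: number): number {
  const blocks: Blockstore = new Map()
  const roots: string[] = []

  for (let p = 0; p < pinCount; p++) {
    const length = (p % maxDepth) + 1
    const root = buildChain(blocks, `pin${p}`, length)

    if (p % 2 === 0) {
      roots.push(root)
    }
  }

  const start = performance.now()
  gc(blocks, roots)
  return performance.now() - start
}
```

For the scale mentioned above, a call such as `benchmark(strategy, 300_000, 10)` would build roughly 1.65 million blocks before timing `strategy`.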

Classic

Reference counting
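One way reference counting could work is sketched below under the same kind of assumptions: a hypothetical in-memory blockstore (CIDs as strings, each block listing its children), every pin recursive, and an illustrative `RefCountingPins` class rather than any existing Helia API. Each pin increments a counter for every block in its DAG when it is added and decrements those counters when it is removed, so `gc` only has to delete blocks with no counter instead of walking every pinned DAG.

```typescript
// Hypothetical in-memory model: CID -> CIDs of the blocks it links to
type Blockstore = Map<string, string[]>

class RefCountingPins {
  private readonly blocks: Blockstore
  // Pinned roots, so repeated add/rm calls for the same root don't skew counts
  private readonly roots = new Set<string>()
  // For each block, the number of pinned DAGs that contain it
  private readonly counts = new Map<string, number>()

  constructor (blocks: Blockstore) {
    this.blocks = blocks
  }

  // Collect each block in the DAG once, even if several parents link to it
  private dagBlocks (root: string): Set<string> {
    const found = new Set<string>()
    const stack = [root]

    while (stack.length > 0) {
      const cid = stack.pop() as string

      if (found.has(cid)) {
        continue
      }

      found.add(cid)
      stack.push(...(this.blocks.get(cid) ?? []))
    }

    return found
  }

  add (root: string): void {
    if (this.roots.has(root)) {
      return
    }

    this.roots.add(root)

    for (const cid of this.dagBlocks(root)) {
      this.counts.set(cid, (this.counts.get(cid) ?? 0) + 1)
    }
  }

  rm (root: string): void {
    if (!this.roots.delete(root)) {
      return // not pinned, nothing to decrement
    }

    for (const cid of this.dagBlocks(root)) {
      const count = (this.counts.get(cid) ?? 0) - 1

      if (count > 0) {
        this.counts.set(cid, count)
      } else {
        this.counts.delete(cid)
      }
    }
  }

  // No DAG traversal: a block is garbage when no pinned DAG references it
  gc (): string[] {
    const deleted: string[] = []

    for (const cid of [...this.blocks.keys()]) {
      if (!this.counts.has(cid)) {
        this.blocks.delete(cid)
        deleted.push(cid)
      }
    }

    return deleted
  }
}
```

The trade-off is that `add` and `rm` now walk the pinned DAG and the counters must be persisted and kept consistent, which is the kind of cost the benchmarks would need to weigh against faster GC.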

Something else?

We are open to suggestions, but all implementations should be benchmarked.

github-actions[bot] commented 1 year ago

:tada: This issue has been resolved in version @helia/interface-v1.0.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

github-actions[bot] commented 1 year ago

:tada: This issue has been resolved in version helia-v1.0.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: