ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Tracking issue for UnixFS automatic sharding #8106

Closed aschmahmann closed 2 years ago

aschmahmann commented 3 years ago

Update 10/21: All major work has been finished. Full review pending.

The sharding PR has landed and we're working on the reverse transition of "unsharding". We are holding off on merging it into go-ipfs (https://github.com/ipfs/go-ipfs/pull/8114) until the unsharding is finished, so that the whole functionality lands together (and CIDs are deterministic for any directory size).

Main unsharding work happening in https://github.com/ipfs/go-unixfs/pull/94.

Minor issues left for last are compiled in https://github.com/ipfs/go-unixfs/issues/105 and can be addressed after the main PRs land.

A potential issue with the merkledag.Walk function doing a BFS is tracked in https://github.com/ipfs/boxo/issues/392, but it can be addressed after the major work lands, as it is only an optimization.


Tracking implementation of https://github.com/ipfs/go-ipfs/issues/7022

Per the linked issue, in order to enable sharded directories by default, we want to shard directories only when required.

After some discussion, the heuristic we are going to use for now to determine when to use sharded vs regular directories is option three listed here. That is, we will sum the sizes of all the names + CIDs in the map: if the total is >256KiB we will use sharded directories, and if it is <=256KiB we will use regular directories.
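
A minimal sketch of that heuristic, assuming each directory entry can be reduced to a name and the raw bytes of its CID. The `entry` type and the function names are hypothetical and are not the go-unixfs API; only the 256KiB threshold and the "names + CIDs" sum come from the description above.

```go
package main

import "fmt"

// shardingThreshold is the 256KiB limit from the heuristic.
const shardingThreshold = 256 * 1024

// entry is a hypothetical stand-in for a directory link:
// a name plus the raw bytes of the CID it points to.
type entry struct {
	name string
	cid  []byte
}

// estimatedSize sums len(name) + len(CID) over all entries.
func estimatedSize(entries []entry) int {
	total := 0
	for _, e := range entries {
		total += len(e.name) + len(e.cid)
	}
	return total
}

// shouldShard reports whether the estimate is strictly above the
// threshold; a directory of exactly 256KiB stays a regular directory.
func shouldShard(entries []entry) bool {
	return estimatedSize(entries) > shardingThreshold
}

func main() {
	cid := make([]byte, 34) // assumed CID length, for illustration only
	small := []entry{{"a.txt", cid}, {"b.txt", cid}}
	fmt.Println(estimatedSize(small), shouldShard(small)) // 78 false
}
```

Note that the comparison is strict, matching the ">256KiB" vs "<=256KiB" split, so the boundary case resolves to a regular directory.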

Places where we'll likely have to look at changes include:

Hopefully, if we keep the interfaces the same here, we won't have to make any changes in go-ipfs itself. Just in case, we may need to scan for any place where we type cast to a UnixFS basic or sharded directory.

Once this is done we should be able to drop the global boolean that enables "use sharded directories" from go-unixfs, go-ipfs, and go-ipfs-config.

schomatis commented 3 years ago

Scoped issue with technical details in https://github.com/ipfs/go-mfs/issues/87 regarding the MFS/UnixFS enhancement to support this. Once that is done go-ipfs should only need to update the dependencies and set the new option with the desired 256KiB value.

Stebalien commented 3 years ago

Now that auto-sharding is implemented, let's try to implement auto-unsharding. As far as I can tell, this shouldn't be too difficult and shouldn't be a massive performance problem (TBD). The tricky parts are:

  1. We don't want to do any size estimation unless we actually delete something.
  2. Ideally, we'd only do size estimations when we serialize.

Potential solution:

  1. When making changes in a sharded directory, keep track of the net size change.
  2. On serialization, if the net size change is negative, enumerate until we hit the limit.
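
The potential solution above could look roughly like the following sketch. It assumes a hypothetical in-memory `shardedDir` model (a plain map rather than a real HAMT), a configurable `limit`, and a net-change counter measured from the state the directory was loaded in; none of these names come from go-unixfs.

```go
package main

import "fmt"

// shardedDir is a hypothetical model of a sharded directory.
// netChange tracks the net size change (names + CIDs) since construction.
type shardedDir struct {
	limit     int
	entries   map[string][]byte // name -> raw CID bytes
	netChange int
}

// newShardedDir copies the initial entries and starts with no net change.
func newShardedDir(limit int, initial map[string][]byte) *shardedDir {
	entries := make(map[string][]byte, len(initial))
	for name, cid := range initial {
		entries[name] = cid
	}
	return &shardedDir{limit: limit, entries: entries}
}

// add inserts or replaces an entry and updates the net size change.
func (d *shardedDir) add(name string, cid []byte) {
	if old, ok := d.entries[name]; ok {
		d.netChange -= len(name) + len(old)
	}
	d.entries[name] = cid
	d.netChange += len(name) + len(cid)
}

// remove deletes an entry, if present, and updates the net size change.
func (d *shardedDir) remove(name string) {
	if old, ok := d.entries[name]; ok {
		d.netChange -= len(name) + len(old)
		delete(d.entries, name)
	}
}

// needsUnshard is meant to run at serialization time. It skips the
// enumeration unless the directory shrank, and stops enumerating as soon
// as the running total exceeds the limit.
func (d *shardedDir) needsUnshard() bool {
	if d.netChange >= 0 {
		return false // nothing shrank, so no size estimation at all
	}
	total := 0
	for name, cid := range d.entries {
		total += len(name) + len(cid)
		if total > d.limit {
			return false // still above the limit: stay sharded
		}
	}
	return true
}

func main() {
	cid := make([]byte, 34) // assumed CID length, for illustration only
	d := newShardedDir(100, map[string][]byte{
		"file-00001": cid, "file-00002": cid, "file-00003": cid,
	})
	fmt.Println(d.needsUnshard()) // false: nothing removed, no enumeration
	d.remove("file-00001")
	fmt.Println(d.needsUnshard()) // true: 2 * (10 + 34) = 88 <= 100
}
```

The early exit bounds the enumeration cost by the limit rather than by the directory size, which is what makes the check cheap for directories that remain large.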

Additional notes (possible future extensions):

schomatis commented 3 years ago

We still need to integrate this into go-ipfs (draft in https://github.com/ipfs/go-ipfs/pull/8114), namely fixing sharness and interop tests.

BigLep commented 3 years ago

2021-05-10 discussion:

  1. @schomatis is going to do the unsharding work week of 2021-05-17
  2. @mburns is going to do a test fix here. @schomatis or @aschmahmann can provide the details.

schomatis commented 3 years ago

More details in the TODO list of the ongoing PR https://github.com/ipfs/go-ipfs/pull/8114.

schomatis commented 3 years ago

Update 8/13 (@schomatis): I'm leading this effort currently in progress. Not blocked.

schomatis commented 3 years ago

Update 8/17 (@schomatis, DRI):

Brief:

schomatis commented 3 years ago

Update 8/17 (corrected) (@schomatis, DRI):

Brief:

schomatis commented 3 years ago

Update 8/23 (@schomatis, DRI):

Brief:

(See full status in the OP.)

schomatis commented 3 years ago

Update 8/27 (@schomatis, DRI):

Brief:

(See full status in PR description.)

schomatis commented 3 years ago

Update 9/3 (@schomatis, DRI):

Brief:

(See full status in PR description.)

schomatis commented 3 years ago

Update 10/21: All major work has been finished. Full review pending.

The sharding PR has landed and we're working on the reverse transition of "unsharding". We are holding off on merging it into go-ipfs (https://github.com/ipfs/go-ipfs/pull/8114) until the unsharding is finished, so that the whole functionality lands together (and CIDs are deterministic for any directory size).

Main unsharding work happening in https://github.com/ipfs/go-unixfs/pull/94. Tests for this are in a separate PR: https://github.com/ipfs/go-unixfs/pull/99

Minor issues left for last are compiled in https://github.com/ipfs/go-unixfs/issues/105 and can be addressed after the main PRs land.

A potential issue with the merkledag.Walk function doing a BFS is tracked in https://github.com/ipfs/boxo/issues/392, but it can be addressed after the major work lands, as it is only an optimization.

aschmahmann commented 2 years ago

Closed by #8563