Closed aschmahmann closed 2 years ago
Scoped issue with technical details in https://github.com/ipfs/go-mfs/issues/87 regarding the MFS/UnixFS enhancement to support this. Once that is done, go-ipfs should only need to update the dependencies and set the new option with the desired 256KiB value.
Now that auto-sharding is implemented, let's try to implement auto-unsharding. As far as I can tell, this shouldn't be too difficult and shouldn't be a massive performance problem (TBD). The tricky parts are:
Potential solution:
Additional notes (possible future extensions):
EnumLinksAsync to enumerate in parallel, but this may actually have worse performance because we might try sampling different parts of the graph each time (non-deterministic). It may actually be better to just enumerate links sequentially, especially because we likely only need to fetch maybe ~10 leaves (would need to be computed). Alternatively, it may be worth trying to make EnumLinksAsync more depth-first instead of breadth-first.
We still need to integrate this into go-ipfs (draft in https://github.com/ipfs/go-ipfs/pull/8114), namely fixing sharness and interop tests.
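To make the sequential-enumeration idea from the notes above concrete, here is a minimal sketch in Go. It assumes the unsharding check only needs to know whether the accumulated size of names + CIDs crosses a threshold, so it can stop as soon as that happens. The names `entry`, `forEachEntry`, `exceedsThreshold` and `errStop` are hypothetical and are not part of go-unixfs; a plain slice stands in for walking the HAMT shards.

```go
package main

import (
	"errors"
	"fmt"
)

// errStop is a sentinel (hypothetical) used by the visitor to halt enumeration early.
var errStop = errors.New("size threshold crossed")

// entry is a hypothetical stand-in for a directory link: a name plus the
// byte length of its CID.
type entry struct {
	Name   string
	CidLen int
}

// forEachEntry visits entries one at a time, in order, and stops at the first
// error returned by visit. In a real directory this would walk the shards;
// here a slice is used purely for illustration.
func forEachEntry(entries []entry, visit func(entry) error) error {
	for _, e := range entries {
		if err := visit(e); err != nil {
			return err
		}
	}
	return nil
}

// exceedsThreshold reports whether the summed size of names + CIDs is strictly
// greater than threshold, and how many entries had to be visited to decide.
func exceedsThreshold(entries []entry, threshold int) (bool, int) {
	total, visited := 0, 0
	err := forEachEntry(entries, func(e entry) error {
		visited++
		total += len(e.Name) + e.CidLen
		if total > threshold {
			return errStop // no need to look at the remaining entries
		}
		return nil
	})
	return errors.Is(err, errStop), visited
}

func main() {
	entries := make([]entry, 1000)
	for i := range entries {
		entries[i] = entry{Name: fmt.Sprintf("file-%04d", i), CidLen: 34}
	}
	over, visited := exceedsThreshold(entries, 1024)
	fmt.Println(over, visited)
}
```

Because the walk is sequential and stops at the first crossing, the number of entries fetched is deterministic for a given directory, which is the property the parallel enumeration might lose.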
2021-05-10 discussion:
More details in the TODO list of the ongoing PR https://github.com/ipfs/go-ipfs/pull/8114.
Update 8/13 (@schomatis): I'm leading this effort currently in progress. Not blocked.
The sharding PR has landed and we're working on the reverse transition of "unsharding". We won't merge it in go-ipfs (https://github.com/ipfs/go-ipfs/pull/8114) until we finish the unsharding, so that the whole functionality lands together (and we ensure deterministic CIDs for any directory size).
Main unsharding work is happening in https://github.com/ipfs/go-unixfs/pull/94. There are several reviews that I need to address next week. The PR is constantly being updated with the main TODO items.
The tests have been progressing in a separate PR, https://github.com/ipfs/go-unixfs/pull/99, which now has preliminary code for review. After https://github.com/ipfs/go-unixfs/pull/94 lands I will re-focus here.
Update 8/17 (@schomatis, DRI):
Brief:
Update 8/17 (corrected) (@schomatis, DRI):
Brief:
Update 8/23 (@schomatis, DRI):
Brief:
(See full status in the OP.)
Update 8/27 (@schomatis, DRI):
Brief:
io/directory_test.go as explained in PR description.)
(See full status in PR description.)
Update 9/3 (@schomatis, DRI):
Brief:
merkledag.Walk function doing a BFS: https://github.com/ipfs/boxo/issues/392.
io/directory_test.go as explained in PR description.)
(See full status in PR description.)
Update 10/21: All major work has been finished. Full review pending.
The sharding PR has landed and we're working on the reverse transition of "unsharding". We won't merge it in go-ipfs (https://github.com/ipfs/go-ipfs/pull/8114) until we finish the unsharding, so that the whole functionality lands together (and we ensure deterministic CIDs for any directory size).
Main unsharding work is happening in https://github.com/ipfs/go-unixfs/pull/94. Tests for this are in a separate PR: https://github.com/ipfs/go-unixfs/pull/99
Minor issues left for last are compiled in https://github.com/ipfs/go-unixfs/issues/105 and can be addressed after the main PRs land.
Potential issue with the merkledag.Walk function doing a BFS in https://github.com/ipfs/boxo/issues/392, but this can be addressed after landing the major work as it's only an optimization.
Closed by #8563
Tracking implementation of https://github.com/ipfs/go-ipfs/issues/7022
Per the linked issue, in order to enable sharded directories by default we want to shard the directories when required.
After some discussion, the heuristic we are going to use for now to determine when to use sharded vs regular directories is option three listed here. That is, we will sum the sizes of all the names + CIDs in the map: if the total is >256KiB we will use sharded directories, and if it is <=256KiB we will use regular directories.
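As a rough illustration of the heuristic just described, the following Go sketch sums the name and CID byte lengths of a directory's entries and compares the total against 256KiB. The `link` type and the `estimatedSize` and `shouldShard` functions are hypothetical names chosen for this example and do not reflect the actual go-unixfs API.

```go
package main

import "fmt"

// shardThreshold is the 256KiB cutoff from the heuristic above.
const shardThreshold = 256 * 1024

// link is a hypothetical stand-in for a directory entry: a name and the raw
// bytes of the CID it points to.
type link struct {
	Name string
	Cid  []byte
}

// estimatedSize sums len(name) + len(CID bytes) over all entries.
func estimatedSize(links []link) int {
	total := 0
	for _, l := range links {
		total += len(l.Name) + len(l.Cid)
	}
	return total
}

// shouldShard applies the rule: strictly greater than 256KiB means sharded,
// less than or equal means a regular directory.
func shouldShard(links []link) bool {
	return estimatedSize(links) > shardThreshold
}

func main() {
	// 34 bytes is the binary length of a CIDv0 (sha2-256 multihash); it is
	// used here only as a plausible CID size.
	cid := make([]byte, 34)
	small := []link{{Name: "a.txt", Cid: cid}, {Name: "b.txt", Cid: cid}}
	fmt.Println(estimatedSize(small), shouldShard(small))
}
```

Note that the comparison is strict, so a directory whose estimate is exactly 256KiB stays a regular directory, matching the ">256KiB" / "<=256KiB" wording.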
Places where we'll likely have to look at changes include:
Hopefully if we keep the interfaces the same here we won't have to make any changes in go-ipfs itself. We may need to do a scan to see if we do any type casting to a UnixFS basic or sharded directory though just in case.
Once this is done, we should be able to drop the global boolean that enables "use sharded directories" from go-unixfs, go-ipfs, and go-ipfs-config.