ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Gateway directory listings should be paginated #8455

Open guseggert opened 3 years ago

guseggert commented 3 years ago

Description

To render the directory listing page, go-ipfs sequentially fetches blocks for every directory entry. For large directories, this takes a very long time (see e.g. https://github.com/ipfs/go-ipfs/issues/7588). The gateway should paginate this listing so that there's a reasonable upper bound on the time it takes to return a response, and to allow more even load distribution across gateway fleets. I'd suggest some query args to control page size and offset, with an upper bound of 20 on the page size (this upper bound could be configurable).

lidel commented 3 years ago

This would require a serious refactor of how https://github.com/ipfs/dir-index-html works (which we already want to do, but is a bigger adventure).

Given that we want to improve IPLD support on gateways (https://github.com/ipfs/in-web-browsers/issues/182), we should do this type of thing in a generic way that works for all DAG types, and lazy-load additional Size and Type information when the DAG is unixfs.

I envision replacing unixfs-specific dir-index-html with "IPLD Explorer v2" that shows generic DAG view by default, but has specially-crafted variants for most popular codecs like dag-pb (unixfs) and leverages something like ?format=unixfs-info (https://github.com/ipfs/go-ipfs/issues/8234) for lazy-loading additional metadata about Size and Type only for items visible on the page.

guseggert commented 3 years ago

I agree with the bigger picture, but this also seems relatively low effort and addresses an availability risk. Assuming the "quick fix" is straightforward, I think it makes sense to do both (quick fix now, generic fix later).

We should also add some metrics around this, because from what I can tell, we don't have good visibility into how much this contributes to aggregate metrics like latency, TTFB, etc.

Stebalien commented 3 years ago

If we're just trying to make non-sharded directories faster, https://github.com/ipfs/go-ipfs/issues/8178 is probably a simpler short-term solution.

Eventually, we'll likely need pagination for sharded directories. But we'll need to add the ability to "seek" which will require some design work.

RangerMauve commented 2 years ago

Some stuff discussed with Lidel that might be useful for consideration:

The pagination should try to use regular traversal and account for whatever ADLs exist at that point, including HAMTs.

lidel commented 2 years ago

I also wrote some notes about an alternative approach, which essentially removes the need for pagination: https://github.com/ipfs/go-ipfs/issues/9058. I included both in the HTML Gateway specs under the best practices section (https://github.com/ipfs/specs/commit/9fc9a9c72fe538ab90b039da5c4025c368e300ba).