ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Get some info about UnixFS objects on public IPFS HTTP API #8528

Open d70-t opened 2 years ago

d70-t commented 2 years ago

Description

I am implementing a backend to access IPFS via the Python library fsspec at ipfsspec. To do so (and to save me from implementing the IPFS protocol in Python), the plan is to access UnixFS files and directories on IPFS via an HTTP gateway. An fsspec backend needs to implement a function info(path) which must return

To me, this seems to be a reasonable requirement for other generic filesystem abstractions as well, thus I assume that this feature request could be of broader interest.

While /v0/files/stat provides this kind of information, the endpoint is often not reachable on public gateways.

Another option is to send a HEAD request to http://gateway/ipfs/CID, which for a file provides the size in the content-length header and which (seemingly) lets me discriminate between file and directory using the etag header. This method works on some public gateways, but it worries me as well, as it doesn't seem to be the intended use of these observable API features.
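The HEAD-based size lookup described above could be sketched roughly as follows. This is only an illustration using the Python standard library, not how ipfsspec actually does it; the helper names (build_head_request, content_length, head_size) and the gateway URL in the usage note are hypothetical.

```python
import urllib.request
from typing import Mapping, Optional


def build_head_request(gateway: str, cid: str) -> urllib.request.Request:
    """Build (but do not send) a HEAD request for <gateway>/ipfs/<cid>."""
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    return urllib.request.Request(url, method="HEAD")


def content_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if absent or malformed."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("content-length")
    if value is None:
        return None
    try:
        size = int(value)
    except ValueError:
        return None
    return size if size >= 0 else None


def head_size(gateway: str, cid: str, timeout: float = 10.0) -> Optional[int]:
    """Send the HEAD request and report the size the gateway advertises."""
    request = build_head_request(gateway, cid)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return content_length(dict(response.headers.items()))
```

Calling head_size("http://gateway", "CID") would perform the actual network request; a gateway that does not send content-length (for example for a directory listing) simply yields None here.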

I see three possible ways to obtain the desired functionality:

Tagging @whyrusleeping as I've been talking to him about this already on slack.

lidel commented 2 years ago

I understand you want to build something future-proof and robust.

The long term direction is that we will be removing /api/v0 (subset of go-ipfs' RPC over HTTP, never designed to be exposed on the web) from public gateways and enhancing content paths at /ipfs/{cid} with necessary APIs.

Detecting a directory today (go-ipfs 0.10)

If you want to implement something against how go-ipfs gateways work today, your best option for detecting a directory is to send an HTTP HEAD request. If content-type is text/html and Etag starts with DirIndex-, then it is a directory listing. While it feels awkward, it is a robust and future-proof check: directory listings will always be returned as HTML by default, and the response requires this custom Etag for cache control, so that potentially mutable HTML is not cached forever the way immutable files under /ipfs/ are.
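A minimal sketch of that check, applied to the headers of a HEAD response, might look like this. The function name is_directory_listing is hypothetical, and the stripping of surrounding quotes and an optional W/ weak-validator prefix is an assumption based on general HTTP ETag syntax rather than something stated in this thread.

```python
from typing import Mapping


def is_directory_listing(headers: Mapping[str, str]) -> bool:
    """Apply the rule: content-type is text/html AND Etag starts with DirIndex-."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Ignore parameters such as "; charset=utf-8" when comparing the media type.
    media_type = lowered.get("content-type", "").split(";", 1)[0].strip().lower()

    # Assumption: the ETag may be quoted and may carry a weak prefix (W/).
    etag = lowered.get("etag", "").strip()
    if etag.startswith("W/"):
        etag = etag[2:]
    etag = etag.strip('"')

    return media_type == "text/html" and etag.startswith("DirIndex-")
```

Both conditions are required: an HTML file stored on IPFS also has content-type text/html, but its Etag would not carry the DirIndex- prefix.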

Future

In the future, in addition to the Etag way, we most likely will have /ipfs/{cid}?format=dag-json which will return the dag-pb root block serialized into a deterministic JSON format that could be cached forever, and/or /ipfs/{cid}?format=unixfs-stats parameter which will have Type (dir/file).

We are already tracking ?format= in https://github.com/ipfs/go-ipfs/issues/8234, but let's keep this one open to ensure it includes the ability to get unixfs directories in a more efficient manner.

Feature scope

The MVP is to make it possible to send a request to /ipfs/{cid}[?format] where CID is dag-pb (unixfs) and get:

lidel commented 2 years ago

Related proposal: add Ipfs-DagSize and Ipfs-DataSize to gateway responses. If someone needs this, please voice support in the linked issue, or propose an IPIP against the ipfs/specs repo.