IPIP: Data Onboarding via HTTP POST (and future ipfs:// POST|PUT)

lidel commented 2 years ago

Problem statement

HTTP Gateways are the most successful way of retrieving content-addressed data. Successful use of HTTP for retrieval use cases proves that IPFS does not replace HTTP, but augments it by providing variability and resiliency. IPFS over HTTP brings more value than the sum of its parts.

Removing the need for implementation-specific RPC APIs (like the one in Kubo) allowed not only faster adoption of CIDs on the web, but also enabled alternative implementations of IPFS (like Iroh in Rust) to test compliance and benchmark themselves against each other.

While we have HTTP Gateways as a standard HTTP-based answer to the retrieval of data stored with IPFS (including verifiable application/vnd.ipld.raw and application/vnd.ipld.car responses), data onboarding over HTTP is currently done with vendor-specific APIs.

The status quo at 2023 Q1 is pretty bad from the end user/developer’s perspective: every IPFS implementation, including online services providing storage and pinning services, exposes its own custom, opinionated HTTP API for onboarding data to IPFS.

Why we need IPIP for HTTP Data Onboarding

To illustrate, some prominent examples (2022 Q4):


- Implementations
  - Kubo RPC (AKA legacy /api/v0/..)
    - Is often used as a “standard HTTP API upload template” because it has commands for all onboarding needs:
      - [https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-add](https://web.archive.org/web/20221201011916/https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-add) – files and directories
        - FLAG: it uses custom form-data handling that requires special library for directory upload, which is an awful papercut for someone expecting simple upload with “curl” ([http://web.archive.org/web/20221201011916/https://docs.ipfs.tech/reference/kubo/rpc/#request-body](http://web.archive.org/web/20221201011916/https://docs.ipfs.tech/reference/kubo/rpc/#request-body))
        - FLAG: Kubo RPC was never designed to be used in browser context, and there are known bugs around the way it handles uploads (example: [https://github.com/ipfs/kubo/issues/5168](https://github.com/ipfs/kubo/issues/5168))
      - [https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-block-put](https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-block-put) – raw block
      - [https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-dag-put](https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-dag-put) – JSON-like documents and custom DAGs (DAG-JSON and DAG-CBOR)
      - [https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-dag-import](https://docs.ipfs.tech/reference/kubo/rpc/#api-v0-dag-import) – arbitrary bags of blocks in CAR format
  - JS-IPFS
    - Reimplements most of the Kubo RPC and exposes it over HTTP, but diverged long time ago and is not 1:1
    - FLAG: In addition to HTTP, JS-IPFS exposes selected commands over gRPC-over-WebSockets, to work-around browser issues caused by Kubo RPC ([https://web.archive.org/web/20220528152743/https://github.com/ipfs/js-ipfs/tree/master/packages/ipfs-grpc-server#why](https://web.archive.org/web/20220528152743/https://github.com/ipfs/js-ipfs/tree/master/packages/ipfs-grpc-server#why))
  - IPFS Cluster
    - Acts as a reverse proxy for Kubo RPC, but has own commands too and provides special behavior on top of what Kubo RPC does:
      - [https://web.archive.org/web/20220911053755/https://ipfscluster.io/documentation/reference/api/](http://web.archive.org/web/20220911053755/https://ipfscluster.io/documentation/reference/api/) – `/add` endpoint uses unixfs by default, but also accepts CARs when HTTP POST request is made with `?format=car` and it only accepts CARs with single root.
- Online services
  - Pinata
    - [https://web.archive.org/web/20220930091452/https://docs.pinata.cloud/pinata-api/pinning/pin-file-or-directory](https://web.archive.org/web/20220930091452/https://docs.pinata.cloud/pinata-api/pinning/pin-file-or-directory) – onboarding file or directory
    - [https://web.archive.org/web/20220817122725/https://docs.pinata.cloud/pinata-api/pinning/pin-json](https://web.archive.org/web/20220817122725/https://docs.pinata.cloud/pinata-api/pinning/pin-json) – onboarding JSON document
  - web3storage
    - [http://web.archive.org/web/20220914153854/https://web3.storage/docs/reference/http-api/](http://web.archive.org/web/20220914153854/https://web3.storage/docs/reference/http-api/) – file and CAR uploads
      - note: no block API (impossible to import DAG-CBOR without the overhead of single-block-CAR for every CID)
  - Infura
    - [http://web.archive.org/web/20220429202905/https://docs.infura.io/infura/networks/ipfs/http-api-methods/add](http://web.archive.org/web/20220429202905/https://docs.infura.io/infura/networks/ipfs/http-api-methods/add) – file and directory import API that is carbon-copy of Kubo’s internal RPC API
    - [http://web.archive.org/web/20220429203039/https://docs.infura.io/infura/networks/ipfs/http-api-methods/block_put](http://web.archive.org/web/20220429203039/https://docs.infura.io/infura/networks/ipfs/http-api-methods/block_put) – raw block import that is carbon-copy of Kubo’s internal RPC API
    - note: no CAR import
- TODO: source more examples

And the CAR upload API insanity circa 2024 Q1:

https://discuss.ipfs.tech/t/uploading-cars-and-user-generated-cids/17592

This state of things introduces an artificial barrier to adoption: the user needs to learn what APIs are available, and then “pick winners” – decide which implementations and services are the most future-proof. And even then, many choices are burdened by the legacy of Kubo RPC and its degraded performance and DX/UX in web browsers.

Goal: create data onboarding protocol for both HTTP and native IPFS

The intention here is to create IPIP with a vendor-agnostic protocol for onboarding data that:

- is easy to use and implement in HTTP (POST https://)
  - does not require any libraries or documentation,
  - and is as easy to work with from JS with fetch API as it is in the command-line with curl
- follows the retrieval story, where ipfs:// behavior is analogous to subdomain gateways
  - 👉 what we want is a protocol that can be represented as both POST https:// AND POST ipfs:// APIs

IPIP scope

We want two IPIPs: one for onboarding data with HTTP POST, and one for authoring (modifying/patching) it with HTTP PUT. This allows us to ship the most useful part, onboarding, first, and then do authoring as an optional add-on, which services may support, but don't have to (if they are only onboarding to Filecoin etc).

For now, the focus is on POST.

POST Requests (Onboarding)

👉 This is the minimal scope we need to cover from the day one, ensuring every use case has a vendor-agnostic spec.

Delegated
- Single File (UnixFS) or single (DAG-)CBOR/JSON document
- Arbitrary Directory tree (UnixFS)
  - Option A: TAR stream
    - open question: how does this handle interrupted upload? can server tell some data is missing?
  - Option B: custom form-data? (think twice, we have lessons learned around RPC at /api/v0/add in Kubo)
Native
- Raw block
- CAR stream
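As a rough illustration of the scope above, the four payload kinds could be told apart by `Content-Type` alone. The sketch below only builds the request object and sends nothing. The `/ipfs` path, the `build_onboarding_request` helper, and the choice of media types for requests are assumptions made for illustration: `application/vnd.ipld.raw` and `application/vnd.ipld.car` are the existing gateway response types, and nothing in this issue decides yet that they are reused for uploads.

```python
from urllib.request import Request

# Hypothetical mapping of payload kind to Content-Type (an assumption, not a spec).
CONTENT_TYPES = {
    "file": "application/octet-stream",  # delegated: server chunks into UnixFS
    "tar": "application/x-tar",          # delegated: directory tree as TAR stream
    "raw": "application/vnd.ipld.raw",   # native: single raw block
    "car": "application/vnd.ipld.car",   # native: CAR stream
}


def build_onboarding_request(gateway_url: str, kind: str, body: bytes) -> Request:
    """Build (but do not send) a POST request for the given payload kind."""
    if kind not in CONTENT_TYPES:
        raise ValueError(f"unsupported payload kind: {kind!r}")
    return Request(
        url=gateway_url.rstrip("/") + "/ipfs",  # hypothetical endpoint path
        data=body,
        method="POST",
        headers={"Content-Type": CONTENT_TYPES[kind]},
    )


# Usage: inspect the request that would be sent for a CAR upload.
req = build_onboarding_request("https://gateway.example/", "car", b"\x00")
print(req.get_method(), req.full_url, req.get_header("Content-type"))
```

Keeping the distinction in a single header is what would make the same call trivial from `curl` or `fetch`, which is the usability goal stated earlier.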

The working code for this will be a reference implementation that replaces/updates the legacy Gateway.Writable feature in Kubo with the above feature set.

PUT/PATCH/DELETE Requests (Authoring)

This will be a separate IPIP, but flagging this as long term plans that should feel idiomatic too.

TBD: Delegated vs Native
Critical: ensure no surprises, UX/DX is paramount. Needs research and analysis.
- One idea is to keep it limited to patching UnixFS paths and DAG-JSON/CBOR documents.
- Other idea is to have syntax parity with JSON-based IPLD Path and have the same JSON syntax as dag diff and dag patch commands.

References

Revisit the concept of Writable Gateways
https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#gatewaywritable
https://discuss.ipfs.io/t/writeable-http-gateways/210
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Location#pointing_to_a_new_document_http_201_created
WIP private IPIP draft: https://www.notion.so/protocollabs/wip-IPIP-Data-Onboarding-with-HTTP-POST-4c394b8ebb774f2d87d34466019257fc
Alex prototyped some REST APIs in https://github.com/ipfs/specs/pull/224/files (while this was intended to be an update to Kubo RPC, the document includes some ideas around patching files and directories)
https://docs.api.video/vod/delegated-upload-tokens as prior art where opaque token can be used with standard tools like curl

lidel commented 2 years ago

@RangerMauve found some additional notes. These are a bit old and may no longer hold true, but they are food for thought:

The difference between PUT and POST

Open question, but the usual distinction is that PUT is idempotent. What is the standard here?

Should calling PUT once or several times in succession have the same effect (ipfs dag import places the same blocks in the datastore), whereas successive identical POST requests may have additional effects (e.g. default parameters of ipfs add may change and produce a different CID, akin to placing an order several times)? This distinction is based on MDN docs. Needs analysis.
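A toy model can make the distinction concrete. In the sketch below, `BlockStore`, `put`, and `import_file` are hypothetical names, and a SHA-256 hex digest stands in for a real CID. Storing pre-made blocks is idempotent because the key is derived from the bytes, while a delegated import depends on a server-side default (here the chunk size), so the same bytes can yield a different root once that default changes.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store; the SHA-256 hex digest stands in for a CID."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        # PUT-like: the key is derived from the bytes, so repeating is a no-op.
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key


def import_file(store: BlockStore, data: bytes, chunk_size: int) -> str:
    """POST-like: the server chunks the file, so the root depends on chunk_size."""
    leaf_keys = [
        store.put(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)
    ]
    # The root block simply lists its leaves (a simplification of a real DAG).
    return store.put("".join(leaf_keys).encode())


store = BlockStore()
first = store.put(b"hello")
snapshot = dict(store.blocks)
second = store.put(b"hello")
print(first == second, store.blocks == snapshot)

# Same bytes, different server default: the resulting root differs.
print(import_file(store, b"abcdefgh", 4) == import_file(store, b"abcdefgh", 8))
```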

What could the `fetch` interaction look like?

Below are loose musings; use them only for inspiration.

Imports

- HTTP POST /ipfs → imports and returns CID of posted file or CAR archive
- HTTP PUT /ipfs → imports CAR archive
- HTTP PUT /ipfs/{cid} → imports DAG archive and validates the dag behind CID is fully present in local datastore

DAG mutations

- DELETE /ipfs/{cid}/foo/file → return CID of same tree as {cid} but without the file (or CBOR field)
- PUT /ipfs/{cid}/new-file → return CID of same tree as {cid} but with a new or replaced file (or CBOR reference) sent as CAR archive
- POST /ipfs/{cid}/new-file → return CID of same tree as {cid} but with a new or replaced file (or CBOR value/reference) sent as bytes / multiform?
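The "return CID of same tree but with a change" semantics can be sketched with immutable nested dictionaries. Everything here is illustrative: `node_id`, `put_path`, and `delete_path` are hypothetical helpers, directories are plain dicts, files are bytes, and a SHA-256 digest stands in for a real CID. The point is only that a mutation never changes the tree behind the original identifier; it produces a new root.

```python
import hashlib


def node_id(node) -> str:
    """Stand-in for a CID: hash a file's bytes, or a directory's sorted entries."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"file:" + node).hexdigest()
    h = hashlib.sha256(b"dir:")
    for name in sorted(node):
        h.update(name.encode() + b"\0" + node_id(node[name]).encode())
    return h.hexdigest()


def put_path(tree: dict, path: str, value) -> dict:
    """Return a new tree with `value` at `path`; the input tree is not modified."""
    head, _, rest = path.strip("/").partition("/")
    if not head:
        raise ValueError("empty path")
    new_tree = dict(tree)  # copy only the directories along the path
    if rest:
        child = tree.get(head, {})
        if not isinstance(child, dict):
            raise ValueError(f"{head!r} is a file, not a directory")
        new_tree[head] = put_path(child, rest, value)
    else:
        new_tree[head] = value
    return new_tree


def delete_path(tree: dict, path: str) -> dict:
    """Return a new tree without `path`; the input tree is not modified."""
    head, _, rest = path.strip("/").partition("/")
    if head not in tree:
        raise KeyError(path)
    new_tree = dict(tree)
    if rest:
        if not isinstance(tree[head], dict):
            raise ValueError(f"{head!r} is a file, not a directory")
        new_tree[head] = delete_path(tree[head], rest)
    else:
        del new_tree[head]
    return new_tree


tree = {"foo": {"file": b"a"}, "readme": b"r"}
without_file = delete_path(tree, "foo/file")      # like DELETE /ipfs/{cid}/foo/file
restored = put_path(without_file, "foo/file", b"a")  # like PUT /ipfs/{cid}/foo/file
print(node_id(tree) != node_id(without_file), node_id(tree) == node_id(restored))
```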

Open questions

- How should gateway operations be mapped to ipfs:// in Brave? We want to reuse as much as possible.
- (Should we?) How to expose MFS on gateway (/mfs/local?) and in Brave (ipns://local/?). This does not need to be in the MVP, but it is worth considering while coming up with conventions.

BigLep commented 1 year ago

Pasting in some notes that Kubo maintainers have had on this topic:

IPFS Ecosystem is lacking generic HTTP API for data ingestion.

This means every service invents its own, re-inventing the same thing (HTTP upload endpoint) over and over again, but with small changes which mean we are unable to write clients that are compatible with multiple services or IPFS implementations. This creates artificial barriers for adoption and vendor lock-in (people need to pick 1-3 “winners” and have to live with them as the switching cost is too high).

This is a real problem – Brave is currently forced to look at proprietary CAR upload APIs from nft.storage, because we have no generic one.

We have a vendor-agnostic API for data retrieval (HTTP gateway and block / CAR responses, or deserialized version for UnixFS). We need similar flexibility for ingestion: the ability to upload a file or TAR stream and let the gateway chunk it and produce UnixFS, OR accept pre-chunked content-addressed data as blocks and CARs.

RangerMauve commented 1 year ago

I've had great luck with the HTTP POST/PUT APIs I added into Agregore via extensions to Kubo's existing Writable API.

https://github.com/AgregoreWeb/agregore-ipfs-daemon/blob/main/spec.md

cc @fabrice who's been doing something similar in Iroh. https://github.com/n0-computer/iroh/pull/499

RangerMauve commented 1 year ago

Regarding form data, I've found it useful to be able to say "Upload these files to a single directory" and let the application deal with nested directories itself. It's not as powerful as the subdirectory hack in Kubo, but it's straightforward and integrates with existing tooling.

Winterhuman commented 1 year ago

Just to throw an idea in, any opinions on the viability of adding https://github.com/nwtgck/piping-server like functionality for cacheless gateways? The idea would be a provider keeps an HTTP connection with the gateway after an initial PUT/POST request, and can then stream the content over the gateway to clients without the gateway needing to store it

fizzl commented 1 year ago

I am evaluating different options for implementing an IPFS gateway for my own use. I noticed that Gateway.Writable is deprecated. Can I still expect the functionality to be there, until this new system is usable?

My needs are quite modest: I just need to be able to POST a single file at a time and get the new CID back.

My infra looks like this: [User] <-- 1 -- > [AWS API Gateway] <--2--> [NGINX] <-- 3 --> [IPFS Gateway]

User submits the file as POST request to API Gateway with an identity token.
API Gateway exchanges the token to a secret token, if valid
NGINX only accepts GET/POST from requests with the secret token
IPFS Gateway listens to localhost for NGINX.

So, my actual question would be: Should I implement my own hack for POST until such time that #375 is resolved, or can I trust the deprecated POST gateway be there in the meantime?

lidel commented 1 year ago

@Winterhuman this is offtopic / bigger scope than the API discussed here, but see "libp2p over HTTP" discussions linked from https://github.com/libp2p/specs/pull/477. The same service could expose bitswap or other data transfer protocol over that, in addition to regular Gateway, but these would be separate things and specs.

@fizzl you should not build things based on the legacy implementation: timelines changed and we will most likely remove the old "Writable Gateway" before the new one is ready (https://github.com/ipfs/kubo/issues/9738). For new projects, write your own onboarding code, or use Kubo's /api/v0/add RPC instead.

Klexx commented 1 year ago

I was asked to write my use case for a writable gateway in here. I like the idea of applications like IPFessay that allow you to publish a Markdown document directly in the browser.

RangerMauve commented 1 year ago

I find put and post to be really useful in developing small applications in Agregore and we've got some example apps put together which folks can look at on our website. https://agregore.mauve.moe/

tionis commented 1 year ago

Since no one seems to have mentioned it so far: https://hardbin.com makes use of writable gateways heavily for its core functionality
