ipfs / helia

An implementation of IPFS in JavaScript
https://helia.io
Other
916 stars 95 forks source link

fix: content-type response header hints how to process response #426

Closed achingbrain closed 7 months ago

achingbrain commented 7 months ago

Description

DAG-JSON and DAG-CBOR can have embeded CIDs which are deserialized to CID objects by the @ipld/dag-json and @ipld/dag-cbor modules.

This fixes a bug whereby we were JSON.stringifying them which lead to rubbish being returned from .json().

Notes & open questions

This overwrites the .json method on the returned Response object. Is that bad?

Change checklist

achingbrain commented 7 months ago

I strongly believe we should not leak custom types onto object returned by .json()

I agree, though we now have a weird DX niggle where these return incompatible types even though you are interacting with the same ecosystem of modules:

import { verifiedFetch } from '@helia/verified-fetch'
import { dagJson } from '@helia/dag-json'

const cid = CID.parse('QmDAG-JSONwithCID')

// obj has CID as `{ "/": "QmFoo" }`
const res = await verifiedFetch(`ipfs://${cid}`)
const obj = await res.json()

// obj has CID as object
const d = dagJson(helia)
const obj = await d.get(CID.parse('QmDAG-JSONwithCID')

DAG-CBOR is more problematic when trying to treat is as plain-old-JSON because it supports several unserializable types:

const obj = {
  hello: 'world',
  bigInt: 12345n,
  bytes: Uint8Array.from([0, 1, 2, 3, 4])
}

const d = dagCbor(helia)
const cid = await d.add(obj)

const res = await verifiedFetch(`ipfs://${cid}`)
const output = await res.json() // 💥

I guess the thing to do here is to convert CIDs to { "/": "Qmfoo"}, BigInts to strings, Uint8Array to plain arrays, and so on - then .json() would work.

We could add a .object() method to the response type that returns rich object types?

It wouldn't survive caching, but we can't cache ipfs://QmFoo URLs without browser support anyway.

It'd be nice to have arrayBuffer() return the raw CBOR, then the user could pass it to @ipld/dag-cbor to get a rich object back, but we can't have that and have .json() return an object as for that to work the response body has to be set as a JSON string so arrayBuffer() would return a array buffer containing that JSON string and not CBOR - at that point all the CBOR codepoints that delimit types like Uint8Array etc will have been lost so there's no way to convert it back to the original object.

SgtPooki commented 7 months ago

It wouldn't survive caching, but we can't cache ipfs://QmFoo URLs without browser support anyway.

With verified-fetch in a SW (helia-service-worker-gateway), the returned responses are cached because its the Response object itself that's cached by the service-worker, based on requested URL. I'm not 100% sure of the intricacies there, but the returned object will be whatever is in the body i'm sure, so we need to make sure the stream/arrayBuffer given to the Response constructor is what we're depending on being cached.

This also brings up a few targeted use cases:

  1. applications that expect the browser to cache the response object of verified-fetch (SW & other gateway-esque solutions)
  2. applications that expect an easy fetch interface and subsequent consumption of the returned data.

It'd be nice to have arrayBuffer() return the raw CBOR, then the user could pass it to @ipld/dag-cbor to get a rich object back, but we can't have that and have .json() return an object as for that to work the response body has to be set as a JSON string so arrayBuffer() would return a array buffer containing that JSON string and not CBOR - at that point all the CBOR codepoints that delimit types like Uint8Array etc will have been lost so there's no way to convert it back to the original object.

I think having .arrayBuffer() for dagCbor be the priority makes the most sense, given that the data in the response should be unique per requested URL, and consistent. If someone calls .json on dagCbor, it should fail, right? Afterall, it's not actual JSON â„¢. If we need a codec that is actual JSON, not some psuedo-json-like-object, we should have a different codec, shouldn't we?

Maybe we could provide some interface for verifiedFetch similar to hashers for Helia, where we can extend how we process returned content? I imagine handling all of the possible scenarios could easily become painful.

lidel commented 7 months ago

I think getting DAG-CBOR experience with .json() and .arrayBuffer() is extra important, because it is a common pattern for people. We've been telling devs to build apps that use a functional equivalent of ipfs dag put for a long time. By default, the command turns JSON into CBOR: accepts json string, parses it as dag-json, and stores it as dag-cbor:

$ echo -n '{"hello":"world"}' | ipfs dag put # [--store-codec=dag-cbor] [--input-codec=dag-json]
bafyreidykglsfhoixmivffc5uwhcgshx4j465xwqntbmu43nb2dzqwfvae

So if someone builds an app, and wants to read that dag-cbor back with verified-fetch, they would want .json() to return JSON similar to what was passed to ipfs dag put:

> resp.json()
{"hello":"world"}

For practical purposes the roundtrip (json→cbor→json) works fine, as long it is limited to the safe DAG- subset of CBOR and JSON, listed in

I think if we have to choose between .arrayBuffer() and .json() for a DAG-CBOR CID, having JSON result will be more useful to devs.

@achingbrain Is my assumption correct that if someone wants to get raw dag-cbor bytes to parse with @ipld/dag-cbor, we could implement verified-fetch so they can make an explicit request with Accept header set to application/vnd.ipld.dag-cbor (or application/vnd.ipld.raw), and then .arrayBuffer() would work fine and return CBOR bytes, ano not JSON string, right?

achingbrain commented 7 months ago

Yes, they can request the raw block and decode it themselves, then they can use the rich types and we don't have to worry about breaking .json().


Notes from the dApps working group call:

We can try to deserialize the CBOR as JSON and error if it fails.

The user can then re-request with the header Accept: application/vnd.ipld.raw to get raw bytes and use @ipld/dag-cbor to decode the .arrayBuffer() output.

Presumably if they request with Accept: application/vnd.ipld.dag-cbor, application/vnd.ipld.raw (e.g. both types) we'll try to decode and if it fails, fall back to letting them use .arrayBuffer().

achingbrain commented 7 months ago

That said, what if we just try to decode and just let the (familiar) content type of the response dictate how it can be consumed, to save the user making two requests and so they don't have to care about application/vnd.ipld.whatnow?

If the block decodes as JSON-compliant DAG-CBOR it's application/json, otherwise it's application/octet-stream.

import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'

const response = await verifiedFetch('ipfs://bafyDagCbor')

let obj

if (response.headers.get('Content-Type')?.includes('application/json')) {
  obj = await response.json()
} else {
  obj = dagCbor.decode(await response.arrayBuffer())
}

vs

import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'

let response = await verifiedFetch('ipfs://bafyDagCbor')
let obj

try {
  obj = await response.json() // 💥
} catch (err) {
  if (!err.message.includes('some JSON error') {
    throw err
  }

  // try again
  response = await verifiedFetch('ipfs://bafyDagCbor', {
    headers: {
      accept: 'application/vnd.ipld.raw'
    }
  })

  try {
    obj = dagCbor.decode(await response.arrayBuffer())
  }  catch (err) {
    console.error('giving up', err)
    throw err
  }
}

I think I prefer the version without exceptions-as-control-flow.

achingbrain commented 7 months ago

Ok, I think we're in a good place now.

JSON codec

import { verifiedFetch } from '@helia/verified-fetch'
import * as json from 'multiformats/codecs/json'

const res = await verifiedFetch('ipfs://bafyJSONCodec')

// get object representation of JSON body
console.info(await res.json()) // { ... }

// alternative way to do the same thing
const obj = json.decode(new Uint8Array(await res.arrayBuffer()))
console.info(obj) // { ... }

DAG-JSON codec

import { verifiedFetch } from '@helia/verified-fetch'
import * as dagJson from '@ipld/dag-json'

const res = await verifiedFetch('ipfs://bafyDAGJSONCodec')

// get spec-compliant plain-old-JSON object that happens to contain a CID
console.info(await res.json()) // { cid: { '/': 'Qmfoo' } }

// ...or use @ipld/dag-json to get rich-object version
const obj = dagJson.decode(new Uint8Array(await res.arrayBuffer()))
console.info(obj) // { cid: CID('Qmfoo') }

DAG-CBOR codec

import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'

const res = await verifiedFetch('ipfs://bafyDAGCBORCodec')

if (res.headers.get('content-type') === 'application/json') {
  // CBOR was JSON-friendly
  console.info(await res.json()) // { ... }
} else {
  // CBOR was not JSON-friendly, must use @ipld/dag-cbor to decode
  const obj = dagCbor.decode(new Uint8Array(await res.arrayBuffer()))
  console.info(obj) // { ... }
}

If the DAG-CBOR block contains anything that cannot round-trip to JSON (e.g. CIDs, Uint8Arrays, BigInts, etc), the content type will be application/octet-stream - .json() will throw and so @ipld/dag-cbor must be used to decode the resolved value of .arrayBuffer().


I've added README docs explaining the above, please comment if they muddy the waters.

2color commented 7 months ago

I opened a separate PR with some small docs refinements https://github.com/ipfs/helia/pull/436

The one challenge here, which isn't a blocker to me, is that for projects that use dag-cbor extensively, they might have some CIDs that can roundtrip to json and some that don't (with BigInts and Uint8Arrays). In its current state, there's no way to opt out of the automatic JSON parsing, so depending on their data, they either have to pay the cost of attempting to parse as json or deal with the difference cases (json/binary) in their code.

achingbrain commented 7 months ago

In its current state, there's no way to opt out of the automatic JSON parsing

If they know it's going to fail they can add an Accept: application/vnd.ipld.raw to the request and it'll just be a raw block with the content type application/octet-stream which they can await res.arrayBuffer() into @ipld/dag-cbor?

E.g.:

import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'

const res = await verifiedFetch('ipfs://bafyDAGCBORCodec', {
  headers: {
    accept: 'application/vnd.ipld.raw'
  }
})

const obj = dagCbor.decode(new Uint8Array(await res.arrayBuffer()))
console.info(obj) // { ... }
2color commented 7 months ago

If they know it's going to fail they can add an Accept: application/vnd.ipld.raw to the request and it'll just be a raw block with the content type application/octet-stream which they can await res.arrayBuffer() into @ipld/dag-cbor?

From a look at the code, this isn't implemented and the example you provided returns a response object with status 501.

https://github.com/ipfs/helia/blob/f2c0f35ad32f6705a19b82479d4dbc72c9465858/packages/verified-fetch/test/verified-fetch.spec.ts#L98-L113

Another thing which might be a bug is that we pass the options object to a method which expects the body as the first parameter:

https://github.com/ipfs/helia/blob/f2c0f35ad32f6705a19b82479d4dbc72c9465858/packages/verified-fetch/src/verified-fetch.ts#L68-L73

https://github.com/ipfs/helia/blob/f2c0f35ad32f6705a19b82479d4dbc72c9465858/packages/verified-fetch/src/verified-fetch.ts#L296

I also tried the example and it doesn't work

achingbrain commented 7 months ago

I'm going to merge this so we can proceed with the tests for the different data types in the examples.