Closed achingbrain closed 7 months ago
I strongly believe we should not leak custom types onto object returned by .json()
I agree, though we now have a weird DX niggle where these return incompatible types even though you are interacting with the same ecosystem of modules:
import { verifiedFetch } from '@helia/verified-fetch'
import { dagJson } from '@helia/dag-json'
const cid = CID.parse('QmDAG-JSONwithCID')
// obj has CID as `{ "/": "QmFoo" }`
const res = await verifiedFetch(`ipfs://${cid}`)
const obj = await res.json()
// obj has CID as object
const d = dagJson(helia)
const obj = await d.get(CID.parse('QmDAG-JSONwithCID')
DAG-CBOR
is more problematic when trying to treat is as plain-old-JSON because it supports several unserializable types:
const obj = {
hello: 'world',
bigInt: 12345n,
bytes: Uint8Array.from([0, 1, 2, 3, 4])
}
const d = dagCbor(helia)
const cid = await d.add(obj)
const res = await verifiedFetch(`ipfs://${cid}`)
const output = await res.json() // 💥
I guess the thing to do here is to convert CID
s to { "/": "Qmfoo"}
, BigInt
s to strings, Uint8Array
to plain arrays, and so on - then .json()
would work.
We could add a .object()
method to the response type that returns rich object types?
It wouldn't survive caching, but we can't cache ipfs://QmFoo
URLs without browser support anyway.
It'd be nice to have arrayBuffer()
return the raw CBOR, then the user could pass it to @ipld/dag-cbor
to get a rich object back, but we can't have that and have .json()
return an object as for that to work the response body has to be set as a JSON string so arrayBuffer()
would return a array buffer containing that JSON string and not CBOR - at that point all the CBOR codepoints that delimit types like Uint8Array
etc will have been lost so there's no way to convert it back to the original object.
It wouldn't survive caching, but we can't cache ipfs://QmFoo URLs without browser support anyway.
With verified-fetch in a SW (helia-service-worker-gateway), the returned responses are cached because its the Response object itself that's cached by the service-worker, based on requested URL. I'm not 100% sure of the intricacies there, but the returned object will be whatever is in the body i'm sure, so we need to make sure the stream/arrayBuffer given to the Response
constructor is what we're depending on being cached.
This also brings up a few targeted use cases:
It'd be nice to have arrayBuffer() return the raw CBOR, then the user could pass it to @ipld/dag-cbor to get a rich object back, but we can't have that and have .json() return an object as for that to work the response body has to be set as a JSON string so arrayBuffer() would return a array buffer containing that JSON string and not CBOR - at that point all the CBOR codepoints that delimit types like Uint8Array etc will have been lost so there's no way to convert it back to the original object.
I think having .arrayBuffer()
for dagCbor be the priority makes the most sense, given that the data in the response should be unique per requested URL, and consistent. If someone calls .json
on dagCbor, it should fail, right? Afterall, it's not actual JSON â„¢. If we need a codec that is actual JSON, not some psuedo-json-like-object, we should have a different codec, shouldn't we?
Maybe we could provide some interface for verifiedFetch similar to hashers
for Helia, where we can extend how we process returned content? I imagine handling all of the possible scenarios could easily become painful.
I think getting DAG-CBOR experience with .json()
and .arrayBuffer()
is extra important, because it is a common pattern for people. We've been telling devs to build apps that use a functional equivalent of ipfs dag put
for a long time. By default, the command turns JSON into CBOR: accepts json string, parses it as dag-json, and stores it as dag-cbor:
$ echo -n '{"hello":"world"}' | ipfs dag put # [--store-codec=dag-cbor] [--input-codec=dag-json]
bafyreidykglsfhoixmivffc5uwhcgshx4j465xwqntbmu43nb2dzqwfvae
So if someone builds an app, and wants to read that dag-cbor back with verified-fetch, they would want .json()
to return JSON similar to what was passed to ipfs dag put
:
> resp.json()
{"hello":"world"}
For practical purposes the roundtrip (json→cbor→json) works fine, as long it is limited to the safe DAG- subset of CBOR and JSON, listed in
{"/": { "bytes": String /* Base64 encoded binary */ }}
I think if we have to choose between .arrayBuffer()
and .json()
for a DAG-CBOR CID, having JSON result will be more useful to devs.
@achingbrain Is my assumption correct that if someone wants to get raw dag-cbor
bytes to parse with @ipld/dag-cbor
, we could implement verified-fetch so they can make an explicit request with Accept
header set to application/vnd.ipld.dag-cbor
(or application/vnd.ipld.raw
), and then .arrayBuffer()
would work fine and return CBOR bytes, ano not JSON string, right?
Yes, they can request the raw block and decode it themselves, then they can use the rich types and we don't have to worry about breaking .json()
.
Notes from the dApps working group call:
We can try to deserialize the CBOR as JSON and error if it fails.
The user can then re-request with the header Accept: application/vnd.ipld.raw
to get raw bytes and use @ipld/dag-cbor
to decode the .arrayBuffer()
output.
Presumably if they request with Accept: application/vnd.ipld.dag-cbor, application/vnd.ipld.raw
(e.g. both types) we'll try to decode and if it fails, fall back to letting them use .arrayBuffer()
.
That said, what if we just try to decode and just let the (familiar) content type of the response dictate how it can be consumed, to save the user making two requests and so they don't have to care about application/vnd.ipld.whatnow
?
If the block decodes as JSON-compliant DAG-CBOR it's application/json
, otherwise it's application/octet-stream
.
import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'
const response = await verifiedFetch('ipfs://bafyDagCbor')
let obj
if (response.headers.get('Content-Type')?.includes('application/json')) {
obj = await response.json()
} else {
obj = dagCbor.decode(await response.arrayBuffer())
}
vs
import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'
let response = await verifiedFetch('ipfs://bafyDagCbor')
let obj
try {
obj = await response.json() // 💥
} catch (err) {
if (!err.message.includes('some JSON error') {
throw err
}
// try again
response = await verifiedFetch('ipfs://bafyDagCbor', {
headers: {
accept: 'application/vnd.ipld.raw'
}
})
try {
obj = dagCbor.decode(await response.arrayBuffer())
} catch (err) {
console.error('giving up', err)
throw err
}
}
I think I prefer the version without exceptions-as-control-flow.
Ok, I think we're in a good place now.
import { verifiedFetch } from '@helia/verified-fetch'
import * as json from 'multiformats/codecs/json'
const res = await verifiedFetch('ipfs://bafyJSONCodec')
// get object representation of JSON body
console.info(await res.json()) // { ... }
// alternative way to do the same thing
const obj = json.decode(new Uint8Array(await res.arrayBuffer()))
console.info(obj) // { ... }
import { verifiedFetch } from '@helia/verified-fetch'
import * as dagJson from '@ipld/dag-json'
const res = await verifiedFetch('ipfs://bafyDAGJSONCodec')
// get spec-compliant plain-old-JSON object that happens to contain a CID
console.info(await res.json()) // { cid: { '/': 'Qmfoo' } }
// ...or use @ipld/dag-json to get rich-object version
const obj = dagJson.decode(new Uint8Array(await res.arrayBuffer()))
console.info(obj) // { cid: CID('Qmfoo') }
import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'
const res = await verifiedFetch('ipfs://bafyDAGCBORCodec')
if (res.headers.get('content-type') === 'application/json') {
// CBOR was JSON-friendly
console.info(await res.json()) // { ... }
} else {
// CBOR was not JSON-friendly, must use @ipld/dag-cbor to decode
const obj = dagCbor.decode(new Uint8Array(await res.arrayBuffer()))
console.info(obj) // { ... }
}
If the DAG-CBOR
block contains anything that cannot round-trip to JSON (e.g. CID
s, Uint8Array
s, BigInt
s, etc), the content type will be application/octet-stream
- .json()
will throw and so @ipld/dag-cbor
must be used to decode the resolved value of .arrayBuffer()
.
I've added README docs explaining the above, please comment if they muddy the waters.
I opened a separate PR with some small docs refinements https://github.com/ipfs/helia/pull/436
The one challenge here, which isn't a blocker to me, is that for projects that use dag-cbor extensively, they might have some CIDs that can roundtrip to json and some that don't (with BigInts and Uint8Arrays). In its current state, there's no way to opt out of the automatic JSON parsing, so depending on their data, they either have to pay the cost of attempting to parse as json or deal with the difference cases (json/binary) in their code.
In its current state, there's no way to opt out of the automatic JSON parsing
If they know it's going to fail they can add an Accept: application/vnd.ipld.raw
to the request and it'll just be a raw block with the content type application/octet-stream
which they can await res.arrayBuffer()
into @ipld/dag-cbor
?
E.g.:
import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'
const res = await verifiedFetch('ipfs://bafyDAGCBORCodec', {
headers: {
accept: 'application/vnd.ipld.raw'
}
})
const obj = dagCbor.decode(new Uint8Array(await res.arrayBuffer()))
console.info(obj) // { ... }
If they know it's going to fail they can add an Accept: application/vnd.ipld.raw to the request and it'll just be a raw block with the content type application/octet-stream which they can await res.arrayBuffer() into @ipld/dag-cbor?
From a look at the code, this isn't implemented and the example you provided returns a response object with status 501.
Another thing which might be a bug is that we pass the options object to a method which expects the body as the first parameter:
I also tried the example and it doesn't work
I'm going to merge this so we can proceed with the tests for the different data types in the examples.
Description
DAG-JSON and DAG-CBOR can have embeded CIDs which are deserialized to CID objects by the
@ipld/dag-json
and@ipld/dag-cbor
modules.This fixes a bug whereby we were JSON.stringifying them which lead to rubbish being returned from
.json()
.Notes & open questions
This overwrites the
.json
method on the returnedResponse
object. Is that bad?Change checklist