ipld / js-dag-cbor

dag-cbor codec for IPLD
Other
27 stars 17 forks source link

can we add option to strip undefined instead of erroring #57

Open Gozala opened 2 years ago

Gozala commented 2 years ago

I've been working on UCAN based RPC system that encodes request / responses into set of CBOR blocks in a CAR. Unfortunately it proved challenging to not encounter errors from CBOR codec due to some field set to undefined or due to some built-in data type like Error been referenced.

Could we add an option to make CBOR codec behave more or less like JSON.stringify does, specifically

  1. Just leave out undefined values
  2. Don't complain about Error or other built-in instances, instead just include own enumerable properties
rvagg commented 2 years ago

The problem with being flexible with undefined is that the codecs are attempting to be as symmetrical as possible, and that goes for silently ignoring fields. What goes in should be what comes out. The more scope for mismatching input:output the more scope for bugs appearing when round-trips get involved; we keep on seeing these problems in practice.

But, that's obviously not as clean when we hit the real world, and we have a couple of instances of sloppy behaviour:

So there's two approaches we could take to alleviate the pain:

  1. Provide a utility to "clean" or prepare() an object, to get the symmetry, that would be a nice ideal solution; but it's going to have perf costs of course.
  2. Provide an opt-in encode somehow - either a separate exported method, or even a full exported BlockCodec, or some kind of configuration option that would let you change the way the encoder works. Then we could push the object walking right down into the encoder and switch according to options. We already have a special option for undefined and could extend that to do something else with it. The own properties thing will be a bit more difficult but I can imagine it also being achieved with a config option.

How important is this? Is this simply a quality of life feature, or is it getting in the way and being a problem for downstream users? I could do some work on the CBOR encoder to expose options to do this but I don't know how to prioritise this and I have other things on my plate vying for attention.

Gozala commented 2 years ago

How important is this? Is this simply a quality of life feature, or is it getting in the way and being a problem for downstream users?

It is pretty problematic at least in the context of UCAN based RPC system because service implementation:

  1. Can provide endpoints that return arbitrary things that may have undefined somewhere and/or Error instance. It would be really difficult to enforce that handlers never include such values and I fear it is something that would show up in the production as opposed to during development.
    • I would much rather e.g. log warning on server when such values are sent as opposed to crash a service or even respond with error.
    • RPC runtime is similar to GraphQL in a sense that it takes arbitrary batch of invocations and produces batch result. What that implies that e.g. if 100 invocations were passed and only one of them produces value with undefined somewhere encoding fails and general error occurs as opposed to 99 successes and one error. That is really bad, if only one invocation failed and rest succeeded that would have been not as bad.
  2. Error on encode is particularly bad in RPC system, as request may have successfully mutated some state and failing to encode response could lead to client retrying which in turn may cause some other problems due to changed state.

At the moment I'm working around this by going over the returned values and:

  1. Removing undefineds.
  2. Extracting fields from Error instance into a plain object.

This is pretty unfortunate, because same data structure gets traversed once before handing it off to CBOR encoder which then traverses it again to actually encode.

Gozala commented 2 years ago

In terms of ideal solution it would something along the lines of:

  1. Ideally CBOR.encode would take optional replacer argument equivalent to one you can pass to JSON.stringify
    • It would allow normalization / encode to occur via single traverse
    • Would rid me off the code that needs to do traversal.
  2. As you have also pointed out replacer would affect performance, but if not present current code path could be used.
  3. Would be nice to have possibly another CBOR.loose.encode or possibly other module all together that behaves similar to JSON.stringify, that is has optimized code path (without replacer) and is able to leave out undefineds and avoid throwing on Error instances.
Gozala commented 2 years ago

The problem with being flexible with undefined is that the codecs are attempting to be as symmetrical as possible, and that goes for silently ignoring fields. What goes in should be what comes out. The more scope for mismatching input:output the more scope for bugs appearing when round-trips get involved; we keep on seeing these problems in practice.

One could argue that leaving out undefined meets that criteria CBOR.decode(CBOR.encode({ omit: undefined })).omit === undefined is true even if Object.keys(input) would be different.

While I generally prefer early errors, I'm still finding that JSON.stringify strikes a better balance here. It is hard to force no undefineds if you're interfacing with user space code.

This input:output argument makes no sense to me in regards to Error instances however because it is not enforced on any other class instances. So new class Point { constructor(x, y) { this.x = x; this.y = y }(0, 0) would produce {x, y} but new Error('boom') would fail to encode.

You could probably start rejecting any class instances too, but I'm not sure if that would be an improvement.

RangerMauve commented 1 year ago

+1 to a "cbor.prepare" or similar, or maybe a flag to do the sanitization.