Ability to filter out incorrect items during array decoding

lusarz commented 1 year ago

I am currently using your library for type-safe decoding of API responses, which sometimes contain arrays of items. In its current implementation, the D.array decoder will throw an error if any of the items in the array fail to decode against the provided schema, like so:

export const item: D.Decoder<Item> = D.object({
  id: D.uuidv4,
  title: D.string,
});

export const paginatedItems: D.Decoder<PaginatedResponse<Item>> = D.object({
  results: D.array(item),
  next_page: D.optional(D.nullable(D.string)),
});

I'd like to propose a feature where the array decoder has a "resilient" or "filter" mode that continues decoding despite individual errors, collects these errors separately, and omits the failing items from the resulting array. This could provide a more user-friendly, error-resilient approach while still being within the bounds of the library's design philosophy.

nvie commented 1 year ago

Hi @lusarz — thanks for reaching out and opening this issue!

The decoder standard library is supposed to be very generic, and I'm not sure this behavior is generic enough to warrant a new API (or a configuration option). As soon as you enter that territory, another, slightly different use case can and will pop up that needs more configuration.

Instead, I would recommend solving this generically within your own application, by writing a generic "forgiving array" helper decoder, like so:

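// Put this in your own library of decoders
function forgivingArray<T>(decoder: D.Decoder<T>): D.Decoder<T[]> {
  // Unique marker that stands in for any item that fails to decode
  const sentinel = Symbol();
  return D.array(D.either(decoder, D.always(sentinel)))
    // Drop the markers so only successfully decoded items remain
    .transform((arr) => arr.filter((x): x is T => x !== sentinel));
}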

Then, you can use it as a drop-in replacement for D.array, like so:

export const paginatedItems: D.Decoder<PaginatedResponse<Item>> = D.object({
  results: forgivingArray(item),  // 👈
  next_page: D.optional(D.nullable(D.string)),
});

Hope this helps!

lusarz commented 1 year ago

Hi @nvie - thank you for your quick response and for the suggestion of a "forgiving array" helper decoder. I have implemented it and it does help in handling the decoding failures.

However, I have a requirement to log the original value of an item whenever it fails to decode. With the current suggestion, I'm unable to access the original failing value, as the sentinel doesn't carry this information.

Would you have any suggestions or recommendations on how I can modify the "forgiving array" decoder to both filter out decoding errors and also provide access to the original, undecoded value of the items that failed decoding? This is necessary for my error logging and debugging process.

Thanks again for your guidance and assistance.

nvie commented 1 year ago

The simplest solution probably is to call a callback if a value cannot be decoded and gets replaced by the sentinel:

const sentinel = Symbol();

// Tries `decoder` first; on failure, reports the raw value and yields the sentinel
function tryDecoder<T>(decoder: D.Decoder<T>, callback: (value: unknown) => void) {
  return D.either(decoder, D.unknown.transform((rejectedValue) => {
    callback(rejectedValue);
    return sentinel;
  }));
}

function forgivingArray<T>(decoder: D.Decoder<T>, callback: (value: unknown) => void): D.Decoder<T[]> {
  return D.array(tryDecoder(decoder, callback))
    .transform((arr) => arr.filter((x): x is T => x !== sentinel));
}

You can then use forgivingArray(item, (huh) => console.log(`Huh? ${huh}`)) to see all the values that are getting rejected (or do with them whatever you wish). Does that approach work?
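
One caveat for the logging use case: interpolating a rejected value directly into a template string turns plain objects into "[object Object]", which hides exactly the payload you want to inspect. A minimal sketch of a formatting helper follows; describeRejected is a hypothetical name and is not part of the decoders library.

```typescript
// Hypothetical helper (not part of the decoders library): renders any
// rejected value as a readable, single-line string for logging.
function describeRejected(value: unknown): string {
  try {
    // JSON.stringify returns undefined for undefined, functions, and symbols,
    // so fall back to String() in those cases.
    const json = JSON.stringify(value);
    return json === undefined ? String(value) : json;
  } catch {
    // Circular structures and BigInt values make JSON.stringify throw.
    return String(value);
  }
}

const rejectedObject = describeRejected({ id: 42, title: null });
const rejectedUndefined = describeRejected(undefined);
```

It could then be used inside the callback, for example forgivingArray(item, (huh) => console.log(`Huh? ${describeRejected(huh)}`)).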

lusarz commented 1 year ago

Thank you for this - it works for me! :pray:

I still encourage you to consider extending the library API.

nvie commented 1 year ago

I think I do want to offer a decoder for this in the standard library after all. Just need to think of the right API and name for it.

Roughly, it could look something like this:

forgivingArray(
  itemDecoder,
  (skipped: unknown[]) => void,
)

It’s like array(), but makes a best-effort attempt at decoding the items. This means the array itself will never reject. Rejected items will be collected and reported together in a single callback (so slightly different from what I suggested before).

A few open questions:

🤔 The name for this new decoder. It should be short and generic, but not too generic.
🤔 Is it important to retain the position/index of each failure in the input? So should the callback look like:
1. (skippedItems: unknown[]) => void, or
2. (skippedItems: [unknown, number][]) => void, or
3. (skippedItems: unknown[], positions: number[]) => void

I think the last one is most pragmatic for the majority of use cases, and still offers all the rich positional info for people that need to retain all of it.
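
To make the third callback shape concrete, here is a library-free sketch of the collection logic. MiniDecoder, Result, str, and tolerantArraySketch are stand-ins invented for illustration and do not reflect the real decoders API. Two assumptions are baked in: a non-array input still rejects, and the callback fires only when at least one item was skipped.

```typescript
// Stand-in types for illustration only; the real library's Decoder differs.
type Result<T> = { ok: true; value: T } | { ok: false };
type MiniDecoder<T> = (input: unknown) => Result<T>;

// A tiny string decoder to exercise the sketch.
const str: MiniDecoder<string> = (input) =>
  typeof input === "string" ? { ok: true, value: input } : { ok: false };

function tolerantArraySketch<T>(
  item: MiniDecoder<T>,
  onSkipped: (skippedItems: unknown[], positions: number[]) => void,
): MiniDecoder<T[]> {
  return (input) => {
    // Assumption: only item failures are tolerated, not a non-array input.
    if (!Array.isArray(input)) return { ok: false };

    const accepted: T[] = [];
    const skipped: unknown[] = [];
    const positions: number[] = [];
    input.forEach((raw: unknown, index: number) => {
      const result = item(raw);
      if (result.ok) {
        accepted.push(result.value);
      } else {
        // Keep the raw value and its index in two parallel arrays.
        skipped.push(raw);
        positions.push(index);
      }
    });

    // Assumption: report once, and only if something was actually skipped.
    if (skipped.length > 0) onSkipped(skipped, positions);
    return { ok: true, value: accepted };
  };
}

const seen: unknown[] = [];
const seenAt: number[] = [];
const decodeTitles = tolerantArraySketch(str, (items, at) => {
  seen.push(...items);
  seenAt.push(...at);
});
const outcome = decodeTitles(["a", 1, "b", null]);
// outcome holds ["a", "b"]; seen holds [1, null]; seenAt holds [1, 3]
```

Callers that do not care about positions can simply ignore the second parameter, which is what makes this shape convenient for the common case.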

lusarz commented 1 year ago

In my project's implementation, I used tolerantArray as the name, with a callback invoked for every rejected item instead of a single skippedItems array:

tolerantArray(
  itemDecoder,
  (rejectedItem: unknown) => void,
)

In my case position/index wasn't relevant.
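
The two callback styles are not mutually exclusive. As a rough sketch, a per-item callback can be layered on top of the batched (skippedItems, positions) shape proposed earlier in this thread; perItem is a hypothetical adapter name and not an existing API.

```typescript
// Hypothetical adapter: turns a per-item callback into a batched one that
// matches the (skippedItems, positions) signature and ignores positions.
function perItem(
  onRejected: (rejectedItem: unknown) => void,
): (skippedItems: unknown[], positions: number[]) => void {
  return (skippedItems, _positions) => {
    skippedItems.forEach((rejectedItem) => onRejected(rejectedItem));
  };
}

const logged: unknown[] = [];
const batched = perItem((rejectedItem) => {
  logged.push(rejectedItem);
});
// Simulate the library reporting two skipped items at indexes 1 and 3.
batched([1, null], [1, 3]);
```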

nvie / decoders

Ability to filter out incorrect items during array decoding #976