jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License

Capture validation errors without failing container schema #665

Open bentheiii opened 2 months ago

bentheiii commented 2 months ago

It would be useful to be able to parse a list of inputs while knowing that some of them will not conform to a schema. For example, given a schema:

class S(Struct):
    a: int
    b: str

and the input [{"a": 1, "b": "foo"}, {"c": 2}, {"a": 3, "b": "bar"}], we want to be able to extract the first and third elements, even though the second is invalid. A bonus would be the ability to extract the errors that occurred. My current solution is as follows:

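from abc import ABC
from dataclasses import dataclass
from typing import Any, Generic, TypeVar

from msgspec import Struct, convert
from msgspec.json import Decoder

T = TypeVar("T")

class S(Struct):
    a: int
    b: str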

@dataclass
class Failure:
    error: Exception
    builtins: Any

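# Marker type: Maybe[X] decodes to either an X or a Failure.
class Maybe(ABC, Generic[T]):
    # With ABC's metaclass, isinstance() falls back to this check, so any
    # object (including a Failure returned by dec_hook) counts as a Maybe.
    @classmethod
    def __subclasscheck__(cls, subclass):
        return True

# Runtime type of a parametrized generic such as Maybe[int]
gen_alias = type(Maybe[int])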

def dec_hook(type_, obj):
    if isinstance(type_, gen_alias) and type_.__origin__ is Maybe:
        try:
            return convert(obj, type_.__args__[0])
        except Exception as e:
            return Failure(error=e, builtins=obj)
    else:
        raise TypeError(f"Unknown type: {type_}")

d_invalid = Decoder(list[Maybe[S]], dec_hook=dec_hook)

raw_invalid = b'[{"a": 1, "b": "foo"}, {"c": 2}, {"a": 3, "b": "bar"}]'

lst = d_invalid.decode(raw_invalid)

for i, x in enumerate(lst):
    print(f"{i}: {x!r}")

# 0: S(a=1, b='foo')
# 1: Failure(error=ValidationError('Object missing required field `a`'), builtins={'c': 2})
# 2: S(a=3, b='bar')

However, this solution is cumbersome and between 4x and 10x slower than parsing a list[S]. Maybe a lower-level solution would be faster.