jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License
2.45k stars, 76 forks

Convert builtin types to numpy #655

Open i-newton opened 8 months ago

i-newton commented 8 months ago

Question

I would like to convert some built-in types to numpy dtype instances to save memory. The numpy fields are part of a msgspec Struct. Is this possible in msgspec? Should I use dec_hook here? I could not find an example for this.

jcrist commented 8 months ago

Hi, thanks for opening this. I'm not sure I understand the question - can you provide an example (pseudocode) showing what you would like to happen? I can then (hopefully) show you how to make that example work with msgspec.

makarr commented 4 months ago

This might help:

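import io
from typing import Any

import msgspec
import numpy as np

# Application-chosen MessagePack extension type code for numpy arrays
NP_NDARRAY_CODE = 1

class NumpyStruct(msgspec.Struct):
    arr: np.ndarray

def enc_hook(obj: Any) -> Any:
    # Called by the encoder for types msgspec does not support natively
    if isinstance(obj, np.ndarray):
        # Serialize the array in .npy format and wrap it in an extension type
        f = io.BytesIO()
        np.save(f, obj)
        data = f.getvalue()
        return msgspec.msgpack.Ext(NP_NDARRAY_CODE, data)
    else:
        raise NotImplementedError(f"Objects of type {type(obj)} are not supported")

def ext_hook(code: int, data: memoryview) -> Any:
    # Called by the decoder for every MessagePack extension value
    if code == NP_NDARRAY_CODE:
        return np.load(io.BytesIO(data))
    else:
        raise NotImplementedError(f"Extension type code {code} is not supported")

enc = msgspec.msgpack.Encoder(enc_hook=enc_hook)
dec = msgspec.msgpack.Decoder(NumpyStruct, ext_hook=ext_hook)

s1 = NumpyStruct(arr=np.random.rand(8))

# Round-trip the struct through MessagePack
msg = enc.encode(s1)
s2 = dec.decode(msg)

np.allclose(s1.arr, s2.arr) # True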