radical-cybertools / radical.utils

Utility classes and tools for various radical projects
Other
8 stars 6 forks source link

Serialization using `msgpack` for `TypedDict` #336

Closed mtitov closed 2 years ago

mtitov commented 2 years ago

msgpack.packb can serialize object if it is of a known type OR if a corresponding function to convert object to the known type is provided, thus we have two options (I would like to ask to consider the 2nd one)

  1. Make TypedDict inherited from dict, because msgpack package uses CPython, which is explicit with dict as a base class, and hook with __class__ doesn't work there (link, PyDict_Check checks object type and sub-types, and PyDict_CheckExact allows object of dict type only)
  2. Use option default in msgpack.packb call (link), which tells how to convert our object into the known type (that function will be called only if object's type wouldn't recognized)
    from ..typeddict import to_dict
    ...
    bmsg = msgpack.packb(msg, default=to_dict)

    That change is needed in ru.Publisher.put (not sure about ru.Putter)

p.s. current issue affects RP run (CI analytics-check run), see RP PR #2523

andre-merzky commented 2 years ago

Ha, that was probably the problem I stumbled over back then when using __class__ :-P Well, I am glad it popped up rather sooner than later...

But really, we should not bend over backward to avoid using the inheritance of dict. It's the easy way and it's what we know works, and it keeps the code simpler - so please let's just go for it. So far the only reason I can see for avoiding the inheritance is that we can avoid it - which is not a sufficient reason in itself...

BTW: using the default= approach to msgpack.packb has about 10% performance overhead. Not much, but still, no point in paying it:

$ ./typed_dict_perf.py 
create :    1048576:       68.7sec  2842998392 bytes
default:    1048576:       15.2sec  2842998392 bytes -  588559992 bytes
inherit:    1048576:       14.9sec  2842998392 bytes -  588559992 bytes