Speed up results serialization

Describe a requested feature

I was running some performance tests and noticed that checking whether an object is picklable (https://github.com/tunib-ai/parallelformers/blob/ccaea515ee2e4d7540f2a275f6cdb0c33a7780f0/parallelformers/parallel/process.py#L209) takes a lot of time when the output is big (for example, when a model returns a large logits tensor), because the whole object is serialized into memory and then deserialized. In which cases does check_pickable actually help? Dataclasses and ModelOutput should be as picklable as their dictionary representations.
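To illustrate the cost, here is a rough sketch that times a pickle round trip for a small and a large payload. It does not use parallelformers; the dict with a `logits` key and the `roundtrip_seconds` helper are hypothetical stand-ins for a model output and for the check, and the actual numbers depend on the machine.

```python
import pickle
import time
from typing import Any


def roundtrip_seconds(obj: Any) -> float:
    """Time one full serialize + deserialize pass, as a round-trip check would do."""
    start = time.perf_counter()
    pickle.loads(pickle.dumps(obj))
    return time.perf_counter() - start


# Hypothetical stand-ins for model outputs: raw bytes instead of tensors.
small = {"logits": bytes(1_000)}
large = {"logits": bytes(50_000_000)}

print(f"small payload: {roundtrip_seconds(small):.6f} s")
print(f"large payload: {roundtrip_seconds(large):.6f} s")
```

The round trip copies the whole payload at least twice, and that is on top of the serialization the queue performs anyway when the result is sent.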

If the check is still needed, I think the code could still be sped up by modifying an object only when pickling fails. That would require some workarounds (perhaps overriding https://github.com/python/cpython/blob/9dc787ea96916552695e79397588fdfa68f22024/Lib/multiprocessing/queues.py#L275), so I want to make sure the check is still necessary before giving it a shot. Another option is to always check for https://github.com/tunib-ai/parallelformers/blob/ccaea515ee2e4d7540f2a275f6cdb0c33a7780f0/parallelformers/parallel/process.py#L236-L239 and modify the object even if it is picklable, but that would drop custom fields added outside the definition of a given class.
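A minimal sketch of the first idea, assuming the conversion only needs to happen on failure: pickle the object once, and fall back to a plain dict only if that raises. The names `to_plain` and `serialize_once` are hypothetical and not part of parallelformers, and this version sidesteps the queue override by producing the bytes up front rather than hooking into `multiprocessing.queues`.

```python
import pickle
from dataclasses import dataclass, fields, is_dataclass
from typing import Any


def to_plain(obj: Any) -> Any:
    """Hypothetical fallback: shallow-convert a dataclass instance to a dict
    of its declared fields. Other objects are returned unchanged."""
    if is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: getattr(obj, f.name) for f in fields(obj)}
    return obj


def serialize_once(obj: Any) -> bytes:
    """Pickle obj a single time; convert it only if pickling fails."""
    try:
        # Happy path: one serialization pass and no deserialization.
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Slow path, taken only for objects that cannot be pickled as-is.
        return pickle.dumps(to_plain(obj))
```

The resulting bytes could then be put on the queue directly, so a large output is serialized once on the common path. Custom attributes added outside the class definition survive whenever the object pickles as-is; they are lost only on the fallback path, since `to_plain` reads declared fields only.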

tunib-ai / parallelformers

Speed up results serialization #46
