pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

[checkpoint] Switch away from pickle-based serialization #76441

Open kumpera opened 2 years ago

kumpera commented 2 years ago

🚀 The feature, motivation and pitch

We currently use pickle for serializing everything. Beyond efficiency issues, it makes backwards compatibility quite hard.

While having a pickle-based default implementation is reasonable (it has worked for torch.save), we should make the interface pluggable so that other implementations can be swapped in.
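One way to picture a pluggable interface is sketched below. This is only an illustration under stated assumptions: the names Serializer, PickleSerializer, JsonSerializer, save_checkpoint and load_checkpoint are hypothetical and are not part of any existing PyTorch API. The sketch keeps pickle as the default and lets a caller pass a different backend.

```python
import io
import json
import pickle
from typing import Any, BinaryIO, Optional, Protocol


class Serializer(Protocol):
    """Hypothetical plug-in point: any object with dump/load qualifies."""

    def dump(self, obj: Any, stream: BinaryIO) -> None: ...

    def load(self, stream: BinaryIO) -> Any: ...


class PickleSerializer:
    """Default backend, mirroring the current pickle-based behaviour."""

    def dump(self, obj: Any, stream: BinaryIO) -> None:
        pickle.dump(obj, stream)

    def load(self, stream: BinaryIO) -> Any:
        return pickle.load(stream)


class JsonSerializer:
    """Alternative backend; only handles JSON-compatible state."""

    def dump(self, obj: Any, stream: BinaryIO) -> None:
        stream.write(json.dumps(obj).encode("utf-8"))

    def load(self, stream: BinaryIO) -> Any:
        return json.loads(stream.read().decode("utf-8"))


def save_checkpoint(state: Any, stream: BinaryIO,
                    serializer: Optional[Serializer] = None) -> None:
    # Fall back to pickle when the caller does not choose a backend.
    (serializer or PickleSerializer()).dump(state, stream)


def load_checkpoint(stream: BinaryIO,
                    serializer: Optional[Serializer] = None) -> Any:
    return (serializer or PickleSerializer()).load(stream)


# Round-trip the same state through both backends.
state = {"step": 10, "lr": 0.5}
for backend in (PickleSerializer(), JsonSerializer()):
    buf = io.BytesIO()
    save_checkpoint(state, buf, backend)
    buf.seek(0)
    assert load_checkpoint(buf, backend) == state
```

Because the checkpoint code only depends on dump and load, a format with better backwards-compatibility guarantees could replace pickle without touching the callers.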


vadimkantorov commented 2 years ago

Related? https://github.com/pytorch/pytorch/issues/52181

kumpera commented 2 years ago

Related? #52181

Yes, somewhat. There are a few issues we face with distributed checkpointing and using torch.load / torch.save beyond those inherent to pickle. In no particular order:

Split tensor metadata from data and save them separately.

Have an efficient implementation of partial loading of a tensor, i.e. torch.load(...).view(...)
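The two points above can be illustrated with a minimal, standard-library-only sketch. It is not how torch.save or torch.load work: plain nested lists stand in for a 2-D float32 tensor, the helper names save_split and load_rows are hypothetical, and "partial loading" is assumed here to mean reading a contiguous range of rows. The metadata goes into a small JSON file, the raw values into a separate binary file, and a row range is read by seeking rather than loading everything.

```python
import json
import os
import struct
import tempfile

ITEM_SIZE = struct.calcsize("<f")  # 4 bytes per little-endian float32


def save_split(rows, meta_path, data_path):
    """Write shape/dtype to a JSON file and raw values to a separate file."""
    n_rows, n_cols = len(rows), len(rows[0])
    with open(data_path, "wb") as f:
        for row in rows:
            f.write(struct.pack(f"<{n_cols}f", *row))
    with open(meta_path, "w") as f:
        json.dump({"shape": [n_rows, n_cols], "dtype": "float32"}, f)


def load_rows(meta_path, data_path, start, stop):
    """Read only rows [start, stop) by seeking into the data file."""
    with open(meta_path) as f:
        n_rows, n_cols = json.load(f)["shape"]
    if not 0 <= start <= stop <= n_rows:
        raise IndexError("row range out of bounds")
    row_bytes = n_cols * ITEM_SIZE
    with open(data_path, "rb") as f:
        f.seek(start * row_bytes)  # skip the rows that were not requested
        raw = f.read((stop - start) * row_bytes)
    return [
        list(struct.unpack_from(f"<{n_cols}f", raw, i * row_bytes))
        for i in range(stop - start)
    ]


with tempfile.TemporaryDirectory() as tmp:
    meta = os.path.join(tmp, "tensor.json")
    data = os.path.join(tmp, "tensor.bin")
    # A 4x3 "tensor" holding 0.0 .. 11.0 in row-major order.
    save_split([[float(r * 3 + c) for c in range(3)] for r in range(4)],
               meta, data)
    print(load_rows(meta, data, 1, 3))  # [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
```

Since the shape lives outside the data file, a reader can compute the byte offset of any row range from the metadata alone, which is what makes the partial read cheap.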

vadimkantorov commented 2 years ago

Have an efficient implementation of partial loading a tensor.

I wonder if HDF5 allows that... Do you mean torch.load(...).view(...), or torch.load(...)[some contiguous slicing expression]?

vadimkantorov commented 2 years ago

Maybe also related: https://github.com/pytorch/pytorch/issues/76924