radiantearth / geo-ml-model-catalog

Geospatial ML Model Catalog Spec
Apache License 2.0

Support for PyTorch checkpoints as serialization format #26

Open duckontheweb opened 3 years ago

duckontheweb commented 3 years ago

Problem Description

The spec currently supports only ONNX as a serialization format, but many model developers may already be saving their models as PyTorch checkpoint files. Supporting PyTorch checkpoints could give consumers who also use PyTorch an easier path to running the model to generate predictions, or to loading it for re-training.

Proposal

This issue exists to discuss whether we should add PyTorch checkpoint files as a supported serialization format in the Runtime fragment, and what changes would need to be made to that section to accommodate this addition.

duckontheweb commented 3 years ago

cc: @sfoucher

@calebrob6 I believe you had voiced a concern early on that PyTorch checkpoints might not capture enough information to be universally useful; maybe you can weigh in on this issue as well.

ymoisan commented 3 years ago

Regarding the "concern early on that PyTorch checkpoints might not capture enough information to be universally useful": we may be interested in supporting Model ARchives (.mar) instead of PyTorch checkpoints. TorchServe's first step is to "... convert model data from PyTorch data format (.pth) file to model archive format (.mar) file." Looks like a lot of stuff can be crammed into a MAR file.
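
To make "a lot of stuff" a bit more concrete, here is a rough sketch of peeking inside a MAR file using only the Python standard library. It assumes the archive is in the default zip-based layout with a `MAR-INF/MANIFEST.json` entry. The helper names (`read_mar_manifest`, `list_mar_contents`, `build_demo_mar`) are hypothetical, and the demo archive is a stand-in built in memory with placeholder bytes and illustrative manifest values, not output from the real archiver tool.

```python
import io
import json
import zipfile

# Assumption: the .mar file uses the default zip-based layout with a manifest
# stored at this path inside the archive.
MANIFEST_PATH = "MAR-INF/MANIFEST.json"


def read_mar_manifest(mar_file):
    """Return the parsed manifest from a .mar archive (path or file object)."""
    with zipfile.ZipFile(mar_file) as archive:
        with archive.open(MANIFEST_PATH) as fh:
            return json.load(fh)


def list_mar_contents(mar_file):
    """Return the sorted names of every entry packed into the archive."""
    with zipfile.ZipFile(mar_file) as archive:
        return sorted(archive.namelist())


def build_demo_mar():
    """Build a tiny stand-in archive in memory (hypothetical, placeholder data)."""
    manifest = {
        "runtime": "python",
        "model": {
            "modelName": "demo_segmenter",   # illustrative value
            "serializedFile": "model.pth",   # the checkpoint travels inside
            "handler": "handler.py",         # pre/post-processing code
            "modelVersion": "1.0",
        },
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(MANIFEST_PATH, json.dumps(manifest))
        archive.writestr("model.pth", b"placeholder weights")
        archive.writestr("handler.py", "# placeholder handler\n")
    buffer.seek(0)
    return buffer


demo = build_demo_mar()
print(list_mar_contents(demo))
print(read_mar_manifest(demo)["model"])
```

If that layout holds, the checkpoint would be just one entry in the archive, sitting next to the handler code and metadata needed to run it, which might address the "not enough information" concern.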