When sharing models, we currently need to share both the dataset and the checkpoint. For prediction, however, the dataset is used solely to map entity and relation indexes to their IDs or mentions.
A better approach may be to support "packaged models": a package would contain the checkpoint plus only the relevant part of the dataset (the index-to-ID and index-to-mention mappings), which is much smaller than the entire dataset. Models could then be deployed right away, without the dataset having to be around.
I think the package should not contain the full checkpoint, though. For example, the optimizer state can be dropped, which may make packages significantly smaller than checkpoints.
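A minimal sketch of what building such a package could look like, assuming the checkpoint is a plain dictionary and that training-only state lives under the hypothetical keys `optimizer_state_dict` and `lr_scheduler_state_dict` (real checkpoints may name these differently). The functions `package_model` and `ids_for` are illustrative names, not an existing API; mentions could be stored the same way as IDs.

```python
import pickle
from typing import Any, Dict, List

# Hypothetical names for state that is only needed to resume training.
TRAINING_ONLY_KEYS = ("optimizer_state_dict", "lr_scheduler_state_dict")


def package_model(
    checkpoint: Dict[str, Any], entity_ids: List[str], relation_ids: List[str]
) -> Dict[str, Any]:
    """Return the checkpoint without training-only state, plus index->ID maps."""
    package = {k: v for k, v in checkpoint.items() if k not in TRAINING_ONLY_KEYS}
    # Position i in each list holds the ID of the entity/relation with index i.
    package["entity_ids"] = list(entity_ids)
    package["relation_ids"] = list(relation_ids)
    return package


def ids_for(package: Dict[str, Any], entity_indexes: List[int]) -> List[str]:
    """Map predicted entity indexes back to their IDs using only the package."""
    return [package["entity_ids"][i] for i in entity_indexes]


# Toy checkpoint standing in for a real one (hypothetical contents).
checkpoint = {
    "model": {"entity_embeddings": [[0.1, 0.2], [0.3, 0.4]]},
    "optimizer_state_dict": {"step": 1000},
    "epoch": 20,
}
package = package_model(checkpoint, ["Q1", "Q2"], ["P31"])
blob = pickle.dumps(package)  # what would be shipped instead of dataset + checkpoint
print(sorted(package))        # ['entity_ids', 'epoch', 'model', 'relation_ids']
print(ids_for(package, [1, 0]))  # ['Q2', 'Q1']
```

Keeping the mappings as plain lists indexed by position means prediction needs no dataset code at all: a ranked list of entity indexes from the model translates directly into IDs.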