Right now, if you train a model on GPU, save it with MLEM, and then try to load/serve it on CPU, it simply breaks.
The only workaround that exists now is to convert the model to CPU before saving it.
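A minimal sketch of that workaround (the model and sample input here are illustrative, not part of the report):

```python
import torch
from mlem.api import save

# Illustrative model and sample input.
model = torch.nn.Linear(4, 2)
sample = torch.randn(1, 4)

# Current workaround: move the model to CPU before saving it with MLEM.
model = model.cpu()
save(model, "model", sample_data=sample)
```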
We need to make this work:
- load the model to CPU if GPU is not available
- add an option to specify the device the model should be loaded to
We can check how this is handled in other generic tools that save and serve models.
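For comparison, plain PyTorch already covers the fallback case via `map_location`; a minimal sketch of the behaviour MLEM could mirror (the path is illustrative, and this assumes the whole `nn.Module` was pickled):

```python
import torch

# Fall back to CPU when no GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.load("model.pt", map_location=device)
model.to(device)
model.eval()
```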
This applies not only to serving the model locally, but also to deploying it - for example, fly doesn't have GPUs, so even if you managed to deploy the model, it would break there.
Vice versa, if the model was trained on CPU but you want to serve it on GPU, MLEM should provide a way to do this. A special case is when you `load_meta` your model (along with pre/post-processors): you then work with an `MlemModel` object (not the PyTorch model you get from `load`) and need a way to specify the device to run it on.
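One possible shape for this, where the `device=` argument and the `to_device` method are hypothetical additions rather than existing MLEM API:

```python
from mlem.api import load, load_meta

# Hypothetical: let the user pick the device at load time.
model = load("model", device="cpu")         # plain PyTorch object

# Special case: working with the MlemModel wrapper instead.
meta = load_meta("model", load_value=True)  # MlemModel with pre/post-processors
meta.to_device("cuda")                      # hypothetical, not an existing method
```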