aldrinc opened this issue 5 years ago
You also need to turn the flag `model._use_cuda` off; otherwise the input will be converted to CUDA tensors: `sequence_var = gpu(sequences, self._use_cuda)`.
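For concreteness, a minimal sketch of that workaround, assuming the checkpoint was saved with `torch.save(model, path)` on the whole Spotlight model object (the filename is a placeholder):

```python
import torch

# map_location remaps the saved parameters onto the CPU...
model = torch.load('./my_model_v0.13.pt', map_location='cpu')

# ...but Spotlight's _use_cuda flag still routes inputs through
# gpu(sequences, self._use_cuda), so it must be switched off as well
# before calling predict().
model._use_cuda = False
```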
That's correct. There really should be a better way of doing this, but I'm short on time and GPU testing runs.
Hi - I trained an ImplicitSequenceModel and loaded it in my Flask API for serving locally on my machine, but I cannot get CPU inference working.
The model works correctly when a GPU is available.
Steps to recreate:

1. Run the Flask server locally, loading the model with `model = torch.load('./my_model_v0.13.pt', map_location='cpu')`.
2. Post a JSON payload with sequence values. I've already verified that the server correctly parses the payload.
3. The server errors when the model attempts to predict (a sketch of the full setup follows at the end of this post):
```python
preds = model.predict(arr)
```

```
RuntimeError: torch.cuda.LongTensor is not enabled.
```
More trace below.
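For reference, a minimal sketch of the serving setup described above, with the `model._use_cuda = False` workaround from this thread applied. The route name, the JSON key `"sequence"`, and the checkpoint path are illustrative placeholders, not part of the original report:

```python
import numpy as np
import torch
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load the Spotlight model on CPU and disable its CUDA flag so that
# predict() builds CPU tensors instead of torch.cuda.LongTensor inputs.
model = torch.load('./my_model_v0.13.pt', map_location='cpu')
model._use_cuda = False

@app.route('/predict', methods=['POST'])
def predict():
    # Expect a body like {"sequence": [item_id, item_id, ...]}.
    arr = np.array(request.get_json()['sequence'], dtype=np.int64)
    preds = model.predict(arr)  # scores for items, computed on CPU
    return jsonify(scores=preds.tolist())

if __name__ == '__main__':
    app.run()
```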