I encountered an error when trying to use a trained model for prediction. The model was trained on a GPU, but running predictions with the same model on a CPU fails: the error message says that Torch was not compiled with CUDA enabled, even though the prediction is being run on a CPU.
I am using Machamp for a multilabel classification task.
Traceback (most recent call last):
  File "/Users/redacted/Code/machamp/predict.py", line 55, in <module>
    predict_with_paths(model, input_path, output_path, args.dataset, args.batch_size, args.raw_text, device, args.conn, args.sep, args.threshold)
  File "/Users/redacted/Code/machamp/machamp/predictor/predict.py", line 228, in predict_with_paths
    write_pred(out_file, batch, device, dev_dataset, model, data_config[dataset], raw_text, conn, sep)
  File "/Users/redacted/Code/machamp/machamp/predictor/predict.py", line 194, in write_pred
    out_dict = model.get_output_labels(enc_batch['token_ids'], enc_batch['golds'], enc_batch['seg_ids'],
  File "/Users/redacted/Code/machamp/machamp/model/machamp.py", line 488, in get_output_labels
    out_dict[task] = self.decoders[task].get_output_labels(mlm_out_task, task_word_mask, golds_task)
  File "/Users/redacted/Code/machamp/machamp/model/multiclas_decoder.py", line 49, in get_output_labels
    logits = self.forward(mlm_out, mask, gold)['logits']
  File "/Users/redacted/Code/machamp/machamp/model/multiclas_decoder.py", line 39, in forward
    self.metric.score(preds[:,1:], gold.eq(torch.tensor(1.0, device=self.device))[:, 1:], None)
  File "/usr/local/Caskroom/miniconda/base/envs/machamp/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Environment
Training Environment
Google Colab Pro, Python 3.10.12
Prediction Environment
macOS Monterey 12.6, Python 3.10.12
Possible Cause
The MachampModel class has an MLM and multiple decoders. During prediction, while the MLM itself has been moved to the CPU, the decoders' device attribute has not been updated. Thus, while trying to predict, the decoders assume that they are still running on the GPU, causing the error above.
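The suspected behavior can be reproduced in a few lines of PyTorch. The class and attribute names below are toy stand-ins, not the real MaChAmp code; the point is that nn.Module.to() moves parameters and buffers but never touches plain Python attributes such as self.device:

```python
import torch.nn as nn

class ToyDecoder(nn.Module):
    def __init__(self, device):
        super().__init__()
        self.device = device          # plain attribute, invisible to .to()
        self.out = nn.Linear(4, 2)

class ToyModel(nn.Module):
    def __init__(self, device):
        super().__init__()
        self.device = device
        self.decoders = nn.ModuleDict({"multiclas": ToyDecoder(device)})

model = ToyModel("cuda:0")   # pretend the model was trained on a GPU
model.to("cpu")              # parameters move; plain attributes do not
print(model.decoders["multiclas"].device)   # still "cuda:0"
```

When the decoder later runs torch.tensor(1.0, device=self.device) with that stale value on a CPU-only Torch build, the AssertionError from the traceback is raised.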
Currently, I fixed this issue locally by modifying the get_output_labels function on MachampModel:
...
# sync the decoder's device attribute with the model's device before prediction
self.decoders[task].device = self.device
out_dict[task] = self.decoders[task].get_output_labels(mlm_out_task, task_word_mask, golds_task)
...
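A more general variant of the same workaround would be to sync every decoder's device attribute once, at the moment the model is moved. This is only a sketch, assuming self.decoders is a dict-like module registry (as the traceback suggests) and using the toy classes rather than the real MaChAmp ones:

```python
import torch.nn as nn

class ToyDecoder(nn.Module):
    def __init__(self, device):
        super().__init__()
        self.device = device
        self.out = nn.Linear(4, 2)

class ToyModel(nn.Module):
    def __init__(self, device):
        super().__init__()
        self.device = device
        self.decoders = nn.ModuleDict({"multiclas": ToyDecoder(device)})

def move_to_device(model, device):
    """Move parameters AND keep every plain `device` attribute in sync."""
    model.to(device)
    model.device = device
    for decoder in model.decoders.values():
        decoder.device = device
    return model

model = move_to_device(ToyModel("cuda:0"), "cpu")
```

This way no per-task patching is needed inside get_output_labels.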
Thanks for the detailed report and fix! The fix seems accurate and in the right location. Could you submit a pull request? Otherwise I will merge it into the next update.