machamp-nlp / machamp

Repository with code for MaChAmp: https://aclanthology.org/2021.eacl-demos.22/
Apache License 2.0

Model trained on GPU cannot be used to predict on CPU #37

Open ShandyDrm opened 1 month ago

ShandyDrm commented 1 month ago

Description

I encountered an error when trying to use a trained model for prediction. The model was trained on a GPU, but running predictions with the same model on a CPU fails. The error message states that Torch was not compiled with CUDA enabled, even though the prediction is being run on the CPU.

Additionally, I used MaChAmp for a multi-label classification task.

Steps to Reproduce

  1. Train a model on a GPU
  2. Predict with the same model on a CPU (see the command sketch below)
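
For reference, a sketch of the two invocations. The predict.py call is the one from this report; the train.py flags and config path are my assumptions based on the MaChAmp README and may differ for your setup:

# 1. train on a GPU (assumed flags; point the config at your multi-label task)
python3 train.py --dataset_configs configs/multilabel.json --device 0

# 2. predict on a CPU with the resulting model (command from this report)
python3 predict.py models/model_name.pt data/test.tsv predictions/prediction.out --device -1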

Expected Behavior

The model should be able to predict as normal.

Actual Behavior

Command I used to predict:

python3 predict.py models/model_name.pt data/test.tsv predictions/prediction.out --device -1

MaChAmp returns the following error:

Traceback (most recent call last):
  File "/Users/redacted/Code/machamp/predict.py", line 55, in <module>
    predict_with_paths(model, input_path, output_path, args.dataset, args.batch_size, args.raw_text, device, args.conn, args.sep, args.threshold)
  File "/Users/redacted/Code/machamp/machamp/predictor/predict.py", line 228, in predict_with_paths
    write_pred(out_file, batch, device, dev_dataset, model, data_config[dataset], raw_text, conn, sep)
  File "/Users/redacted/Code/machamp/machamp/predictor/predict.py", line 194, in write_pred
    out_dict = model.get_output_labels(enc_batch['token_ids'], enc_batch['golds'], enc_batch['seg_ids'],
  File "/Users/redacted/Code/machamp/machamp/model/machamp.py", line 488, in get_output_labels
    out_dict[task] = self.decoders[task].get_output_labels(mlm_out_task, task_word_mask, golds_task)
  File "/Users/redacted/Code/machamp/machamp/model/multiclas_decoder.py", line 49, in get_output_labels
    logits = self.forward(mlm_out, mask, gold)['logits']
  File "/Users/redacted/Code/machamp/machamp/model/multiclas_decoder.py", line 39, in forward
    self.metric.score(preds[:,1:], gold.eq(torch.tensor(1.0, device=self.device))[:, 1:], None)
  File "/usr/local/Caskroom/miniconda/base/envs/machamp/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
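
The failing call is the torch.tensor(1.0, device=self.device) in multiclas_decoder.py, where self.device still holds the training-time CUDA device. On a CPU-only PyTorch build, constructing any CUDA tensor is enough to reproduce the assertion:

import torch

# On a CPU-only PyTorch build this raises the same error as above:
# AssertionError: Torch not compiled with CUDA enabled
torch.tensor(1.0, device="cuda")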

Environment

Training Environment

Google Colab Pro, Python 3.10.12

Prediction Environment

macOS Monterey 12.6, Python 3.10.12

Possible Cause

The MachampModel class has an MLM encoder and multiple decoders. During prediction, the MLM itself is moved to the CPU, but the decoders' stored device attribute is not updated. Thus, while trying to predict, the decoders assume they are still running on the GPU, causing the error above.
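
A minimal sketch of the mechanism (ToyDecoder is hypothetical, not MaChAmp's actual class): nn.Module.to() moves parameters and buffers, but it does not touch a plain Python attribute such as self.device, which keeps its training-time value.

import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    def __init__(self, device):
        super().__init__()
        self.device = device          # plain attribute, ignored by .to()
        self.linear = nn.Linear(4, 2)

    def forward(self, x, gold):
        # mirrors the failing call in multiclas_decoder.py
        gold.eq(torch.tensor(1.0, device=self.device))
        return self.linear(x)

decoder = ToyDecoder(device="cuda:0")  # created during GPU training
decoder.to("cpu")                      # parameters move to the CPU...
print(decoder.device)                  # ...but this still prints "cuda:0"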

I fixed this issue locally by modifying the get_output_labels function of MachampModel:

...
# sync the decoder's stored device attribute with the model's
# current (CPU) device before prediction
self.decoders[task].device = self.device

out_dict[task] = self.decoders[task].get_output_labels(mlm_out_task, task_word_mask, golds_task)
...
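
A variant of the same fix (a sketch; it assumes model.decoders is the dict-like ModuleDict keyed by task name that machamp.py uses, and that the model carries its current device in self.device) would be to sync all decoders once right after the model is loaded in predict.py, instead of on every get_output_labels call:

# hypothetical one-time sync after the model has been loaded and moved
for task in model.decoders:
    model.decoders[task].device = model.device
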
robvanderg commented 1 month ago

Thanks for the detailed report and fix! The fix seems accurate and in the right location. Could you submit a pull request? Otherwise, I will merge it into the next update.