microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #2918

Closed LeeVinteuil closed 3 years ago

LeeVinteuil commented 3 years ago

input_variable and model have both been mapped to cuda, but the RuntimeError still occurred:

File "/lib/python3.6/site-packages/nni/compression/torch/quantization/quantizers.py", line 59, in update_ema
    biased_ema = biased_ema * decay + (1 - decay) * value
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

this happens in QAT Quantizer

linbinskn commented 3 years ago

Thank you for asking. Can you post your code here?

linbinskn commented 3 years ago

The correct way to call the QAT quantizer is shown in the example below.

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
quantizer = QAT_Quantizer(model, configure_list, optimizer)
quantizer.compress()
model.to(device)

If you map your model to cuda before calling quantizer.compress(), this error may occur. Also, when you train your model or run inference, all of your data should be mapped to cuda first. You can check whether your implementation has the problems mentioned above. If not, please describe your implementation in more detail so that I can help you find the problem.
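To make the ordering concrete, here is a minimal plain-PyTorch sketch of the recommended sequence. The toy model is hypothetical, and the quantizer calls are left as comments (they assume nni is installed and a configure_list is defined, as in the snippet above); the point is only the order of the steps and that every batch is moved to the same device as the model.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, just for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# 1) Build the quantizer and call compress() while the model is still on CPU:
# quantizer = QAT_Quantizer(model, configure_list, optimizer)
# quantizer.compress()

# 2) Only then move the (now wrapped) model to the target device:
model.to(device)

# 3) During training or inference, move every batch to that same device
#    before the forward pass, otherwise cuda:0 / cpu tensors get mixed:
x = torch.randn(16, 4).to(device)
y = model(x)
```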

LeeVinteuil commented 3 years ago

Thank you very much for both your contribution and your answer. My implementation had the problems you mentioned, and the issue is now solved.

scarlett2018 commented 3 years ago

Closing the issue as the problem has been solved.

un-knight commented 3 years ago

Hi~ @linbinskn, I met the same problem while using DataParallel, and I found that QAT_Quantizer.compress() is inherited from Compressor.compress(), which does nothing but return self.bound_model. So what is the point of calling quantizer.compress() before model.to('cuda')? And how can I solve the RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!? Thanks a lot for any advice.

https://github.com/microsoft/nni/blob/02eab99b9bb280e385a7af71b4c6c4a73bae1f04/nni/compression/pytorch/compressor.py#L118-L130
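(For context on why the ordering matters even though compress() itself looks like a no-op: the quantizer attaches extra state, such as the EMA statistics from the traceback, to the wrapped modules, and module.to(device) moves registered buffers together with the parameters. A minimal pure-PyTorch sketch of that mechanism, with the buffer name borrowed from the traceback purely for illustration:)

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)

# During wrapping, extra state (e.g. EMA statistics) is attached to the
# module as a registered buffer; 'biased_ema' here mirrors the traceback.
layer.register_buffer('biased_ema', torch.zeros(1))

# .to(device) moves parameters *and* registered buffers together, so state
# registered before the move ends up on the same device as the weights.
# If the model had been moved first, later-created tensors could stay on cpu.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
layer.to(device)
assert layer.biased_ema.device == layer.weight.device
```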

trilever commented 2 months ago

Maybe, when you are building the model, you can set device_map like this:

def build_model(pretrained_model_name_or_path: str, task_name: str):
    is_regression = task_name == 'stsb'
    num_labels = 1 if is_regression else (3 if task_name == 'mnli' else 2)
    model = BertForSequenceClassification.from_pretrained(
        pretrained_model_name_or_path, num_labels=num_labels, device_map="cuda:1")
    return model