Closed: frankdarkluo closed this issue 7 months ago.
Hi! Thanks for reporting this, indeed it seems that the usage with multi-gpu is not working if quantization is not specified due to a bad device casting. Could you try to checkout the branch of PR #264 to see if that works for you?
Thanks! I tried out the branch of PR #264 using the same code above, and I get this error now:
```
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
⠋ Loading model with attention method...
WARNING:accelerate.big_modeling:You shouldn't move a model that is dispatched using accelerate hooks.
Traceback (most recent call last):
  File "/home/gluo/inseq/examples/main.py", line 90, in <module>
    qa_model = inseq.load_model(model, "attention", tokenizer=model_name, tokenizer_kwargs={"legacy": False})
  File "/opt/anaconda3/envs/tuned-lens/lib/python3.9/site-packages/inseq/models/__init__.py", line 47, in load_model
    return FRAMEWORKS_MAP[framework].load(model, attribution_method, **kwargs)
  File "/opt/anaconda3/envs/tuned-lens/lib/python3.9/site-packages/inseq/models/huggingface_model.py", line 154, in load
    return HuggingfaceDecoderOnlyModel(
  File "/opt/anaconda3/envs/tuned-lens/lib/python3.9/site-packages/inseq/models/huggingface_model.py", line 482, in __init__
    super().__init__(model, attribution_method, tokenizer, device, model_kwargs, tokenizer_kwargs, **kwargs)
  File "/opt/anaconda3/envs/tuned-lens/lib/python3.9/site-packages/inseq/models/huggingface_model.py", line 132, in __init__
    self.setup(device, attribution_method, **kwargs)
  File "/opt/anaconda3/envs/tuned-lens/lib/python3.9/site-packages/inseq/models/attribution_model.py", line 241, in setup
    self.device = device if device is not None else get_default_device()
  File "/opt/anaconda3/envs/tuned-lens/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in __setattr__
    super().__setattr__(name, value)
  File "/opt/anaconda3/envs/tuned-lens/lib/python3.9/site-packages/inseq/models/huggingface_model.py", line 169, in device
    self.model.to(self._device)
  File "/opt/anaconda3/envs/tuned-lens/lib/python3.9/site-packages/accelerate/big_modeling.py", line 453, in wrapper
    raise RuntimeError("You can't move a model that has some modules offloaded to cpu or disk.")
```
I think the problem is caused by `self.model.to(self._device)`. Is it possible that something should be added here? https://github.com/inseq-team/inseq/blob/fix-device-map-multigpu/inseq/models/huggingface_model.py#L170--L172
Hey @frankdarkluo, you're right, the setter is actually the one in the `HuggingfaceModel` class. I applied a fix that should prevent the move-to-GPU operation when a device map is specified. Could you try it again?
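To illustrate the idea behind such a fix, here is a minimal sketch (illustrative structure only, not the actual inseq source): a model dispatched with a `device_map` carries an `hf_device_map` attribute, and the device setter skips the manual `.to()` call in that case, since accelerate-hooked models must not be moved.

```python
class FakeModel:
    """Stand-in for a transformers model; `hf_device_map` marks accelerate dispatch."""

    def __init__(self, hf_device_map=None):
        self.hf_device_map = hf_device_map  # e.g. {"layer.0": 0, "layer.1": 1}
        self.moved_to = None

    def to(self, device):
        # accelerate raises if you try to move a dispatched/offloaded model.
        if self.hf_device_map is not None:
            raise RuntimeError(
                "You can't move a model that has some modules offloaded to cpu or disk."
            )
        self.moved_to = device
        return self


class AttributionModelSketch:
    """Sketch of a wrapper whose `device` setter guards the `.to()` call."""

    def __init__(self, model):
        self.model = model
        self._device = None

    @property
    def device(self):
        return self._device

    @device.setter
    def device(self, new_device):
        self._device = new_device
        # The fix: only move the model if it was NOT dispatched across devices.
        if getattr(self.model, "hf_device_map", None) is None:
            self.model.to(self._device)


# A single-device model gets moved as before.
single = AttributionModelSketch(FakeModel())
single.device = "cuda:0"
assert single.model.moved_to == "cuda:0"

# A dispatched (multi-GPU) model is left where accelerate placed it.
multi = AttributionModelSketch(FakeModel({"layer.0": 0, "layer.1": 1}))
multi.device = "cuda:0"  # no RuntimeError: .to() is skipped
assert multi.model.moved_to is None
```

With this guard, setting `device` on a dispatched model records the requested device without triggering the accelerate `RuntimeError` seen in the traceback above.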
Thanks! I think it is working now!
Question

When I load inseq onto an LLM that is split across two GPUs as shown below, I get the OOM error shown. How can I solve this problem? Is there a tutorial or link describing a possible solution?
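One common mitigation for OOM with sharded models (a hedged sketch, not a confirmed fix for this issue) is to cap per-device memory via the `max_memory` argument that transformers/accelerate accept alongside `device_map="auto"`, leaving headroom on each GPU for the extra activations attribution needs. The model name and memory values below are placeholders, not taken from the original report.

```python
def build_max_memory(n_gpus: int, per_gpu: str = "20GiB", cpu: str = "48GiB"):
    """Build the `max_memory` mapping accepted by `from_pretrained`:
    integer keys are GPU indices, "cpu" caps CPU offload."""
    mapping = {i: per_gpu for i in range(n_gpus)}
    mapping["cpu"] = cpu
    return mapping


max_memory = build_max_memory(2)
print(max_memory)  # {0: '20GiB', 1: '20GiB', 'cpu': '48GiB'}

# Hypothetical usage (model name is a placeholder):
# model = AutoModelForCausalLM.from_pretrained(
#     "some-org/some-large-model",
#     device_map="auto",
#     max_memory=max_memory,
#     torch_dtype=torch.float16,
# )
```

Lowering the per-GPU cap forces the dispatcher to offload more layers, trading speed for memory; reducing batch size or generation length during attribution helps for the same reason.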