rmihaylov / mpttune

Tune MPTs
Apache License 2.0

Update model.py for enabling multi-GPU training #8

Closed FykAikawa closed 1 year ago

FykAikawa commented 1 year ago

With multiple GPUs (in my case 2x A6000), the library raises the error shown below.

  File "/usr/local/lib/python3.10/site-packages/mpttune-0.1.0-py3.10.egg/mpttune/model/mpt/model.py", line 877, in forward
    logits = F.linear(hidden_states, self.transformer.wte.weight)
  RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

To avoid this error, I changed a few lines of the code, following LLM-Foundry's commit (https://github.com/mosaicml/llm-foundry/commit/9c89ab263e72fb9610f28c8ab9cde5d2205b6bff). Please check, test, and merge.
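For context, here is a minimal sketch of the kind of change the referenced LLM-Foundry commit makes, applied to the `forward` method in `mpttune/model/mpt/model.py` where the error occurs. The idea is that when the model is sharded across GPUs, the final hidden states and the tied embedding weight used for un-embedding can end up on different devices, so the activations are moved first. This is an illustration of the approach, not the exact diff in this PR:

```python
# Inside MPTForCausalLM.forward (sketch, assuming the variables shown in the traceback):
# move the hidden states onto the device of the tied embedding weight before the
# un-embedding projection, so F.linear sees both tensors on the same GPU.
hidden_states = hidden_states.to(self.transformer.wte.weight.device)
logits = F.linear(hidden_states, self.transformer.wte.weight)
```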

UmarJawad commented 1 year ago

@FykAikawa I tried your solution and it works; however, I switched the backend to "cuda" because Triton still throws an error.
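For readers hitting the same Triton error, a hypothetical sketch of what selecting the CUDA quantized-kernel backend instead of Triton might look like. The `load_model` helper and its `backend` argument here are assumptions based on this comment (mirroring the author's related projects), not a confirmed mpttune API; check the repo's README/CLI for the actual option:

```python
# Hypothetical sketch: fall back to the CUDA matmul kernels instead of Triton.
# `load_model`, `llm_config`, `weights_path`, and `backend` are illustrative assumptions.
from mpttune.model import load_model  # assumption: loader location

model, tokenizer = load_model(llm_config, weights_path, backend='cuda')  # 'cuda' instead of 'triton'
```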

rmihaylov commented 1 year ago

Looks good.